The video critiques GPT-4.5 as underwhelming: despite being marketed as an improvement over previous models, benchmarks show it lagging behind even older ones. The presenter raises concerns about its high cost, slow response times, and outdated features compared to newer models, ultimately questioning its value and inviting viewer feedback on their experiences.
In the video, the presenter expresses disappointment with the release of GPT-4.5, suggesting that it feels underwhelming compared to expectations. They speculate that GPT-4.5 was originally intended to be GPT-5, which would have marked a significant improvement over GPT-4. The discussion begins with an analysis of OpenAI’s blog post, which attempts to clarify the scaling methods used in large language models (LLMs). The presenter notes that while GPT-4.5 is claimed to be a larger model, there seems to be an effort to avoid direct comparisons with previous models like GPT-3 and GPT-4.
The video highlights that despite claims of improved performance, benchmarks indicate that GPT-4.5 lags behind several other models, including older ones like DeepSeek V3. The presenter points out that while GPT-4.5 may be better than GPT-4, it still falls short in various tasks, particularly in reasoning and mathematical capabilities. They suggest that the model's performance does not justify its cost, which is significantly higher than that of other models, raising questions about its practical utility.
The presenter also discusses the model's knowledge cut-off date of October 2023, contrasting it with newer models like Claude 3.7, whose cut-off is roughly a year later. They express concern that GPT-4.5 appears outdated and lacks features available in newer reasoning models, such as a longer maximum output token limit. This leads them to conclude that GPT-4.5 may have been in development for an extended period, resulting in a product that feels less innovative.
Despite some positive aspects, such as improved conversational abilities and reduced verbosity, the presenter notes that the model's slow response times are a significant drawback. They compare the generation times of GPT-4.5 with those of GPT-4 and GPT-4 Mini, showing that GPT-4.5 takes considerably longer to produce results. This raises doubts about its practicality for everyday use, especially when faster and more efficient models are available.
In conclusion, the presenter invites viewers to share their thoughts on GPT-4.5 and whether they find its pricing and performance justifiable. They express skepticism about the model’s value compared to other options on the market and question whether users will be willing to pay a premium for what seems to be only marginal improvements. The video ends with a call to action for viewers to engage in the discussion and provide feedback on their experiences with the new model.