The video offers a thoughtful first impression of GPT-5, highlighting significant advances in reasoning, context capacity, and hallucination reduction, while noting that multimodality and agentic behavior received little emphasis at launch. Despite a somewhat underwhelming live stream focused mainly on coding, GPT-5's improved autonomous task performance and strategic thinking demonstrate meaningful practical gains, though the presenter advises managing expectations for revolutionary features.
The video provides a detailed first reaction to the release of OpenAI’s GPT-5, accessed via the API but not yet available on the main interface. The presenter begins by sharing some initial criticisms, notably the lack of emphasis on multimodality during the live stream. Despite improvements in voice capabilities, there was no mention of video or image processing, which was unexpected. The live stream heavily focused on coding improvements and autonomous task performance but did not highlight agentic behavior as a primary feature, which the presenter had anticipated.
GPT-5 is described as a significant leap forward in reasoning, context size, and hallucination reduction, with greatly improved success rates on longer autonomous tasks. The model now supports a massive context window of 400,000 tokens with enhanced attention mechanisms, allowing it to handle much larger and more complex tasks than previous versions. The presenter emphasizes that lower hallucination rates and reduced sycophancy, combined with better instruction following, are key factors that enhance the model's usability and practical application.
The live stream itself is characterized as a somewhat underwhelming PR event aimed at a general audience, lacking excitement and depth. Much of the content focused on coding capabilities, which, while impressive, felt repetitive and could have been condensed. The presenter notes that the improvements in one-shot coding tasks and autonomous task length are impressive but not revolutionary compared to expectations. The exponential growth in the length of autonomous tasks GPT models can handle is highlighted as a significant trend, with GPT-5 capable of tasks lasting over 26 minutes at a high success rate.
The presenter shares results from their own “PLE bench” test, where GPT-5 was asked to think abstractly and strategically about a complex economic framework. GPT-5 provided insightful and nuanced feedback, identifying missing elements such as a single normative anchor, transition designs, and credible incentive structures. This contrasted with other models like Gemini and Grok, which gave more generic responses. The suggestions from GPT-5 sparked new ideas and reflections for the presenter, particularly the recommendation to clarify a normative core to stabilize and guide the framework.
In conclusion, while GPT-5 represents a clear advance in AI capabilities, especially in reasoning, context handling, and error reduction, it falls short of some of the presenter's expectations regarding multimodality and agentic behavior. The live stream came across as somewhat bland and aimed at a broad audience rather than AI specialists. Nonetheless, the model's improvements in autonomous task performance and strategic thinking provide valuable tools and insights. The presenter encourages viewers to focus on GPT-5's practical benefits rather than the hype, and to wait for further developments before expecting revolutionary changes.