The video reviews OpenAI’s new GPT 5.4 model, highlighting its major advancements in coding, multimodal tasks, and professional document handling, as well as its top-tier benchmark performance and large context window. However, it notes drawbacks such as a higher hallucination rate, slower speed, and less polished design compared to some competitors, making the model best suited for complex tasks rather than for work that demands consistently factual output.
OpenAI has released its latest model, GPT 5.4, which demonstrates significant advancements in AI capabilities, particularly in coding, multimodal tasks, and professional document handling. The video showcases challenging demos, such as building a fully interactive 3D digital twin of Earth, composing complex classical music, and generating detailed 3D animated scenes from images. These tasks, which previously required extensive prompting or were not possible with other top models, are now achievable with just a few prompts using GPT 5.4, highlighting its efficiency and power.
The model excels both in coding environments like OpenAI’s Codex and in the native ChatGPT interface. It can create sophisticated interactive web applications, such as a ray-traced 3D scene with adjustable parameters and a 2D platformer video game, all from concise prompts. GPT 5.4 also demonstrates strong multimodal abilities, analyzing medical images to identify lesions and consolidating multiple financial reports into comprehensive PDF documents and interactive presentations. However, the video notes that while the model is highly capable, its design and front-end aesthetics still lag behind some competitors.
Benchmarking and independent evaluations place GPT 5.4 among the top-performing models in the industry. It offers a massive 1 million token context window in Codex, far surpassing most competitors, and achieves leading scores in knowledge work, coding, mathematics, and reasoning benchmarks. In knowledge work tasks, GPT 5.4 even outperforms human experts 70% of the time. However, the model is somewhat slower than rivals like Gemini 3.1 Pro and more expensive, though still cheaper than some alternatives such as Opus 4.6.
Despite its strengths, GPT 5.4 has a higher hallucination rate, especially in its “extra high” reasoning mode, meaning it produces incorrect answers more frequently on certain benchmarks. While it is the best choice for complex coding, math, and research tasks, users who require consistently factual responses might prefer other models like GLM5. The video also points out that GPT 5.4’s performance varies across independent leaderboards, sometimes ranking below previous versions or competitors depending on the evaluation criteria.
In conclusion, GPT 5.4 stands out as one of the most intelligent and capable AI models currently available, with particular strengths in agentic coding, deep reasoning, and handling large, complex tasks. However, its design output and hallucination rate leave room for improvement, and its overall “vibe” may not suit every user or application. The presenter encourages viewers to experiment with the model themselves and stay updated on AI developments through the channel’s newsletter, emphasizing the rapid pace of innovation in the field.