GPT 5.4 vs Claude Opus 4.6 - Test on real code (UI Design)

artesia · 6 March 2026 08:56

The video compares ChatGPT 5.4 and Claude Opus 4.6 on real-world coding tasks, specifically UI redesign and building a flight simulator app, using intentionally vague prompts to test their ability to handle imperfect instructions. The creator finds Claude produces cleaner, more user-friendly designs, while ChatGPT tends to over-explain and create cluttered UIs, concluding that both have unique strengths and are best used together for coding and UI design tasks.

artesia · 6 March 2026 09:16

The video compares the newly released ChatGPT 5.4 with Claude Opus 4.6, focusing on their real-world coding abilities, particularly in UI design. The creator sets up a practical test by asking both models to redesign the landing page for his SaaS product using a deliberately vague prompt: “I hate the UI. It looks like AI made it. Redesign it.” He argues that advanced AI should be able to handle such imprecise instructions, as real users often provide imperfect prompts. The video also includes a secondary challenge: generating a flight simulator app in the browser with a similarly brief prompt.

The creator uses the Claude desktop app and ChatGPT’s Code Interpreter (Codex) side by side, noting differences in their interfaces and workflow. He appreciates that Codex is more transparent about its process, explaining its steps the way a helpful employee would, while Claude’s interface says less about what is happening behind the scenes. He also points out that both apps use a surprising amount of RAM, which he finds odd given that most of the processing should occur on the AI providers’ servers.

During the flight simulator challenge, both models generate working applications, but with notable differences. The ChatGPT version is functional but visually overwhelming and confusing, with too many numbers and a cluttered UI. Claude’s version, while similar to its previous outputs, is visually cleaner and more in line with what the creator expected from a flight simulator. He emphasizes the importance of AI understanding vague instructions, as real-world users won’t always provide perfect prompts.

For the landing page redesign, both models deliver improved UIs, but with distinct styles. Claude’s redesign is described as slick and modern, fitting well with the intended use as a writing app, though it unexpectedly changes the pricing structure. The creator notes that many websites are starting to look similar, possibly due to the influence of AI-generated designs. ChatGPT’s redesign, on the other hand, is wordy and visually overwhelming, requiring the creator to zoom out to view it comfortably. It also misses one of the original pricing points but offers some marketing suggestions, which the creator finds interesting.

In conclusion, the creator finds both models capable but with different strengths. Claude excels at producing visually appealing, user-friendly designs that closely match the intended function, while ChatGPT tends to over-explain and generate more content than necessary, sometimes at the expense of clarity. He suggests that having access to both models is beneficial, as each can offer unique insights and solutions. The video ends with an invitation for viewers to share their opinions on which model they prefer for coding and UI design tasks.