ChatGPT 5.4 vs Claude Opus 4.6 - Test on real code

The creator compares ChatGPT Codex 5.4 and Claude Code Opus 4.7 on real coding tasks. Claude was faster and better at bug detection, but it produced a poorer design and struggled with branch management. Codex delivered a more polished redesign but was slower and less effective at finding bugs. Overall, the creator judged Claude Code Opus 4.7 superior, though both models showed limitations, especially in project management. A new Codex version is anticipated soon.

In this video, the creator tests and compares two AI coding assistants, ChatGPT Codex 5.4 Extra High and Claude Code Opus 4.7 Extra High Effort, using real code from his SaaS website. The evaluation focuses on three main tasks: redesigning the landing page to improve user experience, identifying bugs in the codebase, and fixing a small but annoying feature related to page load timing. The creator emphasizes that these AI models should ideally work effectively without requiring complex prompts, as the goal is to have them operate autonomously.

For the first test, both AI models were tasked with creatively redesigning the homepage. Claude Code completed its task faster but produced a visually unappealing and cluttered design that the creator described as “dreadful.” In contrast, Codex took longer but delivered a more coherent and aesthetically pleasing layout that better aligned with the existing brand kit. Despite some imperfections, Codex’s output was considered closer to the desired result, while Claude’s version was criticized for ruining the website’s look.

The second test involved bug detection, where both models analyzed the codebase and created separate branches to report their findings. Claude Code found five bugs, mostly related to payment processing, and also identified and rejected six false claims. Codex found four bugs focusing on backend infrastructure and testing issues. Interestingly, there was no overlap between the bugs each AI found, highlighting their different approaches and areas of focus. Claude Code was notably faster in generating its bug report, but the creator remained uncertain about which model was more accurate overall.

In the third test, the AI models were asked to fix a timing issue where page elements appeared too early during scrolling. Both models struggled with this seemingly simple fix, often interfering with each other by working on the same main branch and causing conflicts. The creator noted that neither AI handled branching and port management well, leading to confusion and errors. This highlighted a broader issue with AI coding assistants lacking built-in project management capabilities, which are essential for smooth collaborative development.
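One way to avoid the collisions described above is to give each assistant its own branch, working directory, and dev-server port before it starts. The following is a minimal sketch of that idea, not the setup used in the video: the `Workspace` class, the `plan_workspaces` helper, the branch naming scheme, the directory layout, and the base port of 3000 are all assumptions made for illustration. It only builds the `git worktree add` command for each agent and does not run it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Workspace:
    """Hypothetical per-agent workspace: one branch, one directory, one port."""
    agent: str
    branch: str
    path: str
    port: int

    def worktree_command(self, base: str = "main") -> List[str]:
        # `git worktree add -b <branch> <path> <commit-ish>` creates a new
        # branch from `base` and checks it out in its own directory, so the
        # agents never share a working tree.
        return ["git", "worktree", "add", "-b", self.branch, self.path, base]


def plan_workspaces(agents: List[str], task: str, base_port: int = 3000) -> List[Workspace]:
    """Assign each agent a distinct branch, directory, and port.

    Assumption: `base_port` is left free for the developer's own server on
    the main branch, so agents start at base_port + 1.
    """
    return [
        Workspace(
            agent=agent,
            branch=f"{agent}/{task}",
            path=f"../{agent}-{task}",
            port=base_port + index + 1,
        )
        for index, agent in enumerate(agents)
    ]


if __name__ == "__main__":
    for ws in plan_workspaces(["claude", "codex"], "scroll-timing"):
        print(" ".join(ws.worktree_command()), f"# dev server on port {ws.port}")
```

With a plan like this, each assistant would be told which directory and port are its own, which keeps the project management decision out of the model's hands.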

Ultimately, the creator concluded that Claude Code Opus 4.7 outperformed Codex 5.4 by a significant margin, especially in speed and bug detection, despite some flaws in design output and branch management. He also mentioned that a new Codex model is expected soon, encouraging viewers to subscribe for updates. The video ends with a call for viewer feedback on which AI they prefer and an invitation to follow the channel for future comparisons and improvements in AI-assisted coding.