The video compares ChatGPT 5.4 (Codex) and Claude Opus 4.7 on real coding tasks, finding Claude superior in website design, bug-detection speed, and thoroughness, while both struggled with a minor feature fix because neither manages branches or environments well. Overall, Claude was favored, though both models need improvement, especially in collaborative coding and environment management.
The video compares the performance of ChatGPT 5.4 (referred to as Codex) and Claude Opus 4.7 on real coding tasks across three tests: creating a visually appealing website, identifying bugs, and fixing a simple but annoying feature issue. The presenter stresses that AI models should work well without users having to craft complex prompts; ideally, the prompt-engineering layer would be built into the models themselves.
In the first test, both models redesigned the presenter’s homepage. Claude Opus 4.7 produced a more visually appealing, well-structured landing page with a consistent brand kit and user-friendly layout. Codex’s output, by contrast, was cluttered and visually overwhelming, with design choices that made the site hard to navigate. Claude also delivered results noticeably faster, and its interface surfaced token usage more clearly, though the reported figures were not always accurate.
The second test was bug detection in the codebase. The two models found entirely different sets of bugs, with no overlap, and Claude flagged some of its own candidate findings as false positives and rejected them, showing a degree of self-verification. Claude generated its bug report faster and focused on revenue-critical issues, while Codex surfaced bugs in backend infrastructure and testing. The presenter found the divergence intriguing and did not declare a clear winner, though he leaned slightly toward Claude for its speed and thoroughness.
The third test asked the models to fix a minor but annoying issue with the timing of content appearing on the website. Both struggled, repeatedly clobbering each other’s work through poor branch management and port conflicts. The presenter pointed to the lack of built-in version control and environment management in both AI systems, which led to confusion and errors. Neither model fixed the issue, and the presenter scored the task a fail for both.
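The branch and port collisions described above can be worked around at the workflow level rather than inside the models. A minimal sketch, assuming a Git repository and one agent per worktree (the directory names, branch names, and ports here are hypothetical, not from the video):

```shell
set -e
# Hypothetical demo repo; in practice this would be the project's existing repo.
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo && cd repo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

# Give each agent its own worktree and branch, so their edits
# land in separate directories and separate branches.
git worktree add -q ../agent-a -b agent/a
git worktree add -q ../agent-b -b agent/b

# Each agent's dev server would then run on its own port, e.g.:
#   (cd ../agent-a && PORT=3001 npm run dev)
#   (cd ../agent-b && PORT=3002 npm run dev)
```

With this layout the agents cannot overwrite each other’s files, and merging their work back is an ordinary Git merge rather than untangling a shared working directory.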
In conclusion, the presenter favored Claude Opus 4.7 as the better AI coding assistant overall, citing its speed, bug detection, and superior website design output. However, he acknowledged that both models have significant room for improvement, especially in collaborative coding environments and minor feature fixes. He also mentioned that a new version of Codex is expected soon, encouraging viewers to stay tuned for future updates and comparisons.