ChatGPT 5.5 vs Claude Opus 4.7 - Test with one-shot prompts

artesia · 30 April 2026 22:14

The video compares Claude Code (Claude Opus 4.7) and Codex (ChatGPT 5.5) using one-shot prompts to create a 3D racing game and a landing page. Codex comes out ahead on gameplay mechanics and creative problem-solving, while Claude Code delivers stronger visual design and produces more detailed design output faster. Prompt length affects output quality differently for each model. Codex is slower but more organized and reliable; Claude Code is faster but sometimes less functional. The presenter suggests that combining their strengths could yield the best results.

artesia · 30 April 2026 22:34

The video compares two AI coding tools, Claude Code (Claude Opus 4.7) and Codex (ChatGPT 5.5), using one-shot prompts of different lengths to create a 3D racing game and a landing page. The main goal is to determine whether prompt length affects output quality and which model performs better overall. For each task, the presenter tests both a detailed, verbose prompt and a concise one-liner, running the two models side by side and evaluating the results on functionality, design, and speed.
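To make the test setup concrete, here is a minimal sketch of the comparison matrix described above: two models, two tasks, and two prompt styles, for eight runs in total. The labels come from the summary; the function name, the dictionary keys, and the run order are assumptions for illustration, not something the video specifies.

```python
from itertools import product

# Labels taken from the summary; the ordering of runs is an assumption.
MODELS = ["Claude Code (Claude Opus 4.7)", "Codex (ChatGPT 5.5)"]
TASKS = ["3D racing game", "landing page"]
PROMPT_STYLES = ["verbose", "one-liner"]


def build_test_matrix():
    """Return every (task, prompt style, model) combination as a dict."""
    return [
        {"task": task, "prompt_style": style, "model": model}
        for task, style, model in product(TASKS, PROMPT_STYLES, MODELS)
    ]


if __name__ == "__main__":
    # Print one line per run so the side-by-side pairs are easy to see.
    for run in build_test_matrix():
        print(f"{run['task']:<15} {run['prompt_style']:<10} {run['model']}")
```

Each consecutive pair of rows shares a task and a prompt style and differs only in the model, which mirrors the side-by-side comparison the presenter runs.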

For the 3D racing game, Claude Code’s output had a visually appealing synthwave aesthetic but suffered from poor game mechanics, such as unresponsive controls and confusing behavior. In contrast, Codex produced a game with much smoother mechanics, including proper acceleration, deceleration, and a bending road, although its automatic turning was somewhat odd. With the concise one-liner prompt, Claude Code surprisingly produced better gameplay mechanics than it did with the verbose prompt, while Codex’s version was less polished but showed some creative ideas. Overall, Codex won on gameplay quality, while Claude Code won on visual style.

The video also highlights differences in user experience between the two platforms. Codex offers better organization with collapsible folders and a cleaner chat interface, whereas Claude Code’s interface is cluttered and harder to navigate. Additionally, Codex tends to take longer to complete tasks but performs internal checks using tools like headless Chrome, which can improve output reliability. Claude Code, on the other hand, is faster but sometimes produces incomplete or less refined results.

For the landing page, Claude Code outperformed Codex with the verbose prompt, producing a more visually appealing and smoothly animated site that better matched the requested magazine-like style for a focus app. Codex’s landing page had layout problems and overflowing content, making it less polished. With the concise one-liner prompt, however, Codex created a landing page that looked more like a real website, with a strong hero image, while Claude Code’s output resembled many of its previous, somewhat generic designs. The presenter suggests that combining the best elements from both models could yield an ideal result.

In conclusion, the video suggests that prompt length does affect output quality, especially for Claude Code, where the concise one-liner sometimes produced better results than the verbose prompt. Codex generally excels at game mechanics and creative problem-solving and has the more organized interface, but it is slower. Claude Code is faster and better at following detailed instructions for design tasks but may produce less functional code. The choice between the two depends on the user’s priorities, and the presenter invites viewers to share their preferences and experiences with these AI tools.