The video compares Anthropic’s Claude Opus 4.6 and OpenAI’s GPT-5.3 Codex on real-world coding tasks, finding that Opus 4.6 produces more reliable and guideline-compliant code, especially in larger projects, while Codex 5.3 is faster, cheaper, and excels in TypeScript but sometimes over-engineers solutions. Both models show notable improvements over their predecessors, and the creator recommends choosing based on specific project needs and coding environments.
The video reviews and compares two newly released AI coding models: Anthropic’s Claude Opus 4.6 and OpenAI’s GPT-5.3 Codex. The creator tests both models in real-world coding scenarios across TypeScript, Rust, and Java codebases, focusing on practical coding tasks rather than relying solely on benchmarks. While both companies claim industry-leading performance based on benchmarks, the creator expresses skepticism about such metrics, preferring hands-on evaluation. Notably, both models were released within minutes of each other, and each brings incremental improvements over its predecessor.
In the TypeScript test, the creator tasks both models with fixing a persistent bug in a React-based project. Opus 4.6 successfully resolves the issue using a more imperative approach, while Codex 5.3 produces cleaner, more idiomatic React code that ultimately fails due to a misunderstanding of browser behavior. When asked to implement password-protected links, both models deliver functional solutions, but Codex 5.3 adds unnecessary complexity by defending against password formats that don’t exist in the codebase, while Opus 4.6 opts for a simpler, more direct approach.
Moving to Rust, the creator uses the Zed code editor’s agent protocol to test both models on a real, multi-million-line codebase. Both models devise similar plans to address an indentation issue, but Opus 4.6 produces safer, guideline-compliant code that compiles on the first try, while Codex 5.3 is faster but introduces unrelated changes and ignores coding guidelines. In a second Rust task involving configurable letter spacing, both models produce working solutions, but Opus 4.6’s implementation is more comprehensive, while Codex 5.3 prioritizes non-breaking changes and clean pull requests.
The creator also briefly mentions testing the models on Java codebases, noting similar trends: Opus 4.6 generally produces better code, especially in larger or more complex projects, while Codex 5.3 excels in TypeScript environments and is notably faster and more cost-effective. Both models require significantly fewer back-and-forth interactions to complete tasks than their previous versions, though neither represents a revolutionary leap.
In conclusion, the creator finds both Claude Opus 4.6 and GPT-5.3 Codex to be excellent, incrementally improved coding assistants. Opus 4.6 is preferred for its reliability and code quality, especially in larger or more complex codebases, though it is slower and less conversational than its predecessor. Codex 5.3 is faster, cheaper, and particularly strong in TypeScript, but sometimes over-engineers solutions or ignores project guidelines. Ultimately, the choice between the two depends on the user’s specific needs and coding environment, and both are recommended for developers seeking advanced AI coding assistance.