Best AI coding Agents with some crazy upsets | GPT 5, Grok Code Fast, Claude, Qwen 3 Coder

The video reviews the latest AI coding agents, including GPT-5, Grok Code Fast, Claude, and Qwen 3 Coder, highlighting their strengths, weaknesses, and performance on complex coding tasks; GPT-5, Qwen 3 Coder, and Claude Sonnet 4 emerge as the top contenders. It also discusses challenges in testing, the importance of model knowledge and configuration, and anticipates continued advances in speed, cost-efficiency, and coding accuracy.

The video provides an extensive overview of the latest developments and evaluations of AI coding agents in August, highlighting major players like GPT-5, Grok Code Fast, Claude, and Qwen 3 Coder. The presenter conducted a large number of tests focusing on instruction following, unit testing, linting, and static code analysis across complex multi-file projects. They emphasize that model knowledge and configuration are crucial for performance, noting that while GPT-5 excels at following instructions, it can struggle with unfamiliar frameworks or languages. Grok Code Fast, despite its speed and affordability, often falters in error handling and tool calling, leading to less reliable code fixes.
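The video describes these automated checks only at a high level, but a harness of that general shape is easy to sketch. The following is a minimal, assumption-laden illustration in Python: the pytest/ruff/mypy toolchain, the command flags, and the pass/fail scoring are all placeholders, not the presenter's actual suite.

```python
import subprocess
from pathlib import Path

def run(cmd: list[str], cwd: Path) -> subprocess.CompletedProcess:
    """Run a command inside the project directory, capturing output."""
    return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)

def score_project(repo: Path) -> dict[str, bool]:
    """Score an agent's patched repo on three automated checks.

    The specific tools and the simple pass/fail scoring are illustrative
    assumptions, not the presenter's real evaluation harness.
    """
    return {
        # Unit tests: the suite must pass after the agent's changes.
        "tests": run(["pytest", "-q"], repo).returncode == 0,
        # Linting: the agent must not introduce style/correctness warnings.
        "lint": run(["ruff", "check", "."], repo).returncode == 0,
        # Static analysis: the type checker must report no errors.
        "types": run(["mypy", "."], repo).returncode == 0,
    }

if __name__ == "__main__":
    results = score_project(Path("agent-output/repo"))  # hypothetical path
    passed = sum(results.values())
    print(f"{passed}/{len(results)} checks passed: {results}")
```

Note that a sketch like this only covers the mechanical checks; the instruction-following scores the presenter reports are much harder to automate and presumably involve task-specific assertions or manual review.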

Several new agents were introduced, including Kiro, Qoder, and Augment CLI. Kiro, a VS Code fork with per-request pricing, showed reasonable performance but nothing groundbreaking. Qoder, likely powered by Qwen 3 Coder, performed decently but lacked transparency in pricing and model selection. Augment CLI, the surprise of the batch, topped the newcomer chart and posted solid results, especially with Sonnet 4. Claude Code Router, once a top contender, has slipped in the rankings, possibly due to token-conservation strategies or other optimizations, and now sits lower than in previous months.

Among the tested models, Qwen 3 Coder, GPT-5, and Claude Sonnet 4 emerged as the leading contenders, with Opus 4.1 also showing strong planning and debugging capabilities despite its high cost. Warp was a surprising standout, improving its performance significantly and even achieving the highest scores in some tests, particularly with Opus 4.1. The presenter noted that with many agents now performing similarly well, differences in cost, speed, and model knowledge are becoming the deciding factors for users.

The presenter also discussed challenges faced during testing, such as environment-related issues that caused some agents to fail or quit prematurely, and the complexity of configuring models in various agents. Roo Code and OpenCode were praised for their ease of configuration and flexibility, with OpenCode being especially useful for testing multiple providers. GitHub Copilot surprisingly performed well with Grok Code Fast, despite initial doubts about compatibility. The presenter expressed a preference for Codex and Roo Code for complex bug fixing and daily coding tasks.

In conclusion, the video highlights the rapid evolution and convergence of AI coding agents, with a few standout models dominating the landscape. The presenter plans to keep refining their evaluation suite, aiming to open-source parts of it and focus on more complex, automated testing, and they encourage community feedback and discussion to improve the testing methodology. Looking ahead, the main models to watch are GPT-5, Claude Sonnet, and Qwen 3 Coder, with ongoing experimentation with local models and anticipation of further improvements in speed, cost-efficiency, and coding accuracy.