Anthropic just dropped Opus 4.5

artesia · 24 November 2025 22:52

Anthropic’s Claude Opus 4.5 is a leading AI model for coding and agentic tasks, outperforming competitors such as Gemini 3 Pro and GPT-5.1 with improved token efficiency, advanced tool use, and strong real-world problem-solving. Despite its higher price, its performance, including outscoring human candidates on Anthropic’s notoriously difficult engineering take-home exam, makes it a top choice for developers. The video also highlights Warp, an AI coding agent built to enhance terminal workflows.

artesia · 24 November 2025 23:16

Anthropic has just released Claude Opus 4.5, a new frontier model that is currently the top performer on coding, agent, and computer-use benchmarks. Compared with Sonnet 4.5, Opus 4.5 shows a notable improvement on the SWE-bench Verified coding benchmark, scoring 80.9% versus 77.2%. It outperforms recent competitors such as Gemini 3 Pro, GPT-5.1, and GPT-5.1-Codex-Max on several key coding and agent benchmarks, although it does not lead in some areas, such as graduate-level reasoning (GPQA Diamond), visual reasoning (MMMU), and multilingual Q&A (MMMLU). Despite not topping every benchmark, Opus 4.5 demonstrates strong overall performance, especially on practical coding and agent tasks.

One of the standout features of Opus 4.5 is its performance on real-world agentic tasks, such as those in τ2-bench. In an airline customer-service scenario, the benchmark expected the model to refuse a change to a basic-economy booking, because that fare class cannot be modified. Opus 4.5 instead found a creative and legitimate workaround: upgrade the cabin first, then modify the flights. This highlights the model’s ability to reason flexibly and solve problems in ways that standard benchmarks may not fully capture.

Anthropic has also introduced advanced tool use capabilities that let the model search for and invoke tools from libraries of thousands on demand, without overloading its context window. This is a significant improvement over the traditional approach, in which loading every tool definition up front can consume a large share of the context before the model does any real work.
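As a rough illustration of why on-demand tool search saves context, here is a minimal Python sketch. It is not Anthropic’s API: `ToolDef`, `ToolRegistry`, and `build_context` are hypothetical names, and the naive keyword matching stands in for whatever search the real feature uses. The full definitions stay in a registry, and only the few that match a query are serialized into the prompt.

```python
from dataclasses import dataclass


@dataclass
class ToolDef:
    """Hypothetical tool definition: name, description, and JSON-style schema."""
    name: str
    description: str
    schema: dict


class ToolRegistry:
    """Hypothetical registry: full definitions live here, not in the prompt."""

    def __init__(self, tools):
        self._tools = {t.name: t for t in tools}

    def search(self, query: str, limit: int = 3) -> list:
        # Naive scoring: count query words that appear as substrings
        # of the tool's name plus description.
        words = query.lower().split()
        scored = []
        for tool in self._tools.values():
            text = f"{tool.name} {tool.description}".lower()
            score = sum(1 for w in words if w in text)
            if score > 0:
                scored.append((score, tool.name))
        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [self._tools[name] for _, name in scored[:limit]]


def build_context(registry: ToolRegistry, query: str, limit: int = 3) -> list:
    """Serialize only the matched definitions for the model's context."""
    return [
        {"name": t.name, "description": t.description, "input_schema": t.schema}
        for t in registry.search(query, limit)
    ]


# Usage: 1,000 filler tools plus two relevant ones; only matches are loaded.
tools = [ToolDef(f"util_{i}", f"internal utility number {i}", {}) for i in range(1000)]
tools.append(ToolDef("book_flight", "book a flight for a passenger", {"type": "object"}))
tools.append(ToolDef("cancel_flight", "cancel an existing flight booking", {"type": "object"}))
registry = ToolRegistry(tools)

for entry in build_context(registry, "cancel flight"):
    print(entry["name"])
```

The design point is that the prompt grows with the number of matched tools, not the size of the whole library.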

Efficiency is another major improvement with Opus 4.5. It achieves higher accuracy than Sonnet 4.5 while using significantly fewer tokens, which the video characterizes as effectively doubling intelligence per token. This matters in practice, because context-window usage and token counts directly affect both performance and cost. Speaking of cost, Opus 4.5 is priced at $5 per million input tokens and $25 per million output tokens, notably more than Gemini 3 Pro. However, its performance and efficiency may justify the higher price for users who prioritize cutting-edge coding and agent capabilities.
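The pricing arithmetic is simple enough to sketch. The rates below are Anthropic’s published list prices for Opus 4.5 ($5 per million input tokens, $25 per million output tokens); the token counts are made-up figures chosen to show how trimming output tokens lowers the bill, not measurements from the video. The sketch ignores any discounts, such as prompt caching.

```python
# Opus 4.5 list prices in USD per million tokens.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the list prices above."""
    return (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# Hypothetical workload: same 100k-token prompt, two different output lengths.
verbose = request_cost(100_000, 20_000)
concise = request_cost(100_000, 10_000)
print(f"verbose: ${verbose:.2f}, concise: ${concise:.2f}")
# prints "verbose: $1.00, concise: $0.75"
```

Because output tokens cost five times as much as input tokens at these rates, a model that reaches the same answer with fewer output tokens saves disproportionately.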

Anthropic also shared a striking statistic about Opus 4.5’s coding ability: on the notoriously difficult take-home exam Anthropic uses to hire performance engineers, Opus 4.5 scored higher than any human candidate ever had within the two-hour time limit. The result underscores the model’s reasoning and coding skills and positions it as a powerful tool for developers and engineers. Early users and industry experts have praised Opus 4.5 as the best coding model they have used, citing its practical utility and frontier-level capabilities.

Finally, the video introduced its sponsor, Warp, an AI coding agent built around terminal-based workflows and multi-agent control. According to the video, Warp scores highly on benchmarks such as Terminal-Bench and SWE-bench Verified and offers a streamlined coding experience with a modern UX and support for multiple large language models. Overall, Claude Opus 4.5 represents a significant leap forward in AI coding and agent technology, combining top-tier performance, innovative tool use, and improved efficiency to set a new standard in the field.