The video explains that the perceived decline in Anthropic’s Claude AI performance, especially after the Opus 4.7 update, is largely due to engineering flaws in the surrounding infrastructure—such as a poorly designed harness, inefficient tokenizer changes, and problematic hardware routing—rather than the model itself. These issues have led to increased inefficiency, poorer reasoning, and user frustration, prompting the presenter to recommend considering alternative AI models until Anthropic resolves these fundamental problems.
The video discusses noticeable regressions in the performance of Anthropic’s Claude AI models, particularly with the Opus 4.7 update, which many users, including AMD’s AI director, have criticized for making the model “dumber” and less effective. The presenter highlights several types of regression, such as task refusals, weaker problem-solving, and context loss, all of which degrade the user experience. These issues are compounded by the complex architecture behind Claude: multiple layers (the prompt, the API, the harness, and differing GPU hardware) interact, and a change in any one of them can hurt performance.
A significant factor contributing to the regressions is the poorly engineered Claude Code harness, which adds unnecessary complexity and inefficiency. The harness often causes redundant API calls and pollutes the model’s context with irrelevant information, leading to wasted compute resources and degraded output quality. Benchmarks show Claude Code performing substantially worse than other harnesses, indicating that much of the perceived decline in model intelligence is actually due to the surrounding infrastructure rather than the model itself. The presenter argues that Anthropic’s engineering shortcomings are a major root cause of the problem.
Another major change affecting performance is the updated tokenizer in Opus 4.7, which generates roughly 1.35 to 1.47 times as many tokens for the same input. This bloats the context size, forcing the model to process more data and hit token limits sooner, which can lead to more context rot and poorer responses. Additionally, Anthropic’s decision to deploy the 1 million token context window model by default has introduced further regressions, as this version of the model is confirmed to perform worse than the smaller context window versions. This routing change likely shifts traffic away from Nvidia GPUs to other hardware such as AWS Trainium and Google TPUs, which may also contribute to inconsistent behavior.
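The impact of that multiplier can be made concrete with a back-of-the-envelope sketch. The only figures taken from the video are the 1.35x–1.47x token inflation; the context window size below is a hypothetical round number for illustration, not a confirmed limit.

```python
# Sketch: a tokenizer that emits ~1.35x-1.47x more tokens per input
# (the range cited in the video) effectively shrinks how much original
# content fits in a fixed context window.

def effective_context(limit_tokens: int, multiplier: float) -> int:
    """Content that used to cost N tokens now costs N * multiplier,
    so the same window holds roughly 1/multiplier as much of it."""
    return int(limit_tokens / multiplier)

limit = 200_000  # hypothetical context window size, for illustration only
for m in (1.35, 1.47):
    fit = effective_context(limit, m)
    print(f"{m}x tokenizer -> only ~{fit:,} tokens' worth of "
          "old-tokenizer content fits")
```

In other words, even without any change to the model weights, a ~1.4x token multiplier behaves like cutting the usable context window by roughly 30%, which is consistent with the summary's claim that users hit token limits faster and see more context rot.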
The video also covers Anthropic’s recent decision to redact the model’s internal “thinking” process from API responses to prevent distillation, which has led to a significant drop in the amount of thinking data available to the model during inference. This change correlates with measurable declines in model reasoning depth, increased refusals, and more user frustration. The presenter suggests that Anthropic’s reliance on database lookups to compensate for this missing thinking data is likely flawed due to their poor engineering practices, further degrading model quality. The overall increase in API requests and token usage despite declining output quality underscores the inefficiency and regression in the system.
In conclusion, the presenter emphasizes that these regressions are not merely subjective impressions but are supported by quantitative evidence and user reports. The problems stem from a combination of rising user expectations, poor harness design, API and tokenizer changes, hardware routing complexities, and Anthropic’s engineering failures. Other AI providers, such as OpenAI, have by comparison maintained more stable model performance. The video ends with a strong recommendation to consider alternatives to Claude until Anthropic addresses these fundamental issues, criticizing the company for delivering a worse product while charging premium prices.