The video covers Sam Altman’s public admission that OpenAI’s latest GPT-5.2 model has declined in writing quality due to a focus on coding and reasoning, leading many users to switch to alternatives like Google Gemini and Anthropic Claude. It highlights community feedback, comparative model performance, and the broader challenge of balancing AI capabilities across different domains.
The video discusses a recent OpenAI town hall event where CEO Sam Altman publicly admitted that the latest version of ChatGPT, specifically GPT-5.2, is worse at writing than its predecessor. Altman acknowledged that the company made a mistake by focusing too heavily on improving the model’s intelligence, reasoning, and coding abilities, while neglecting its writing skills. He explained that with limited resources, OpenAI prioritized coding and engineering performance, which led to a decline in the model’s writing quality and overall user experience for non-coding tasks.
The creator shares their personal experience, stating that GPT-5.2 has been disappointing, especially for tasks involving instruction following and human-like writing. This dissatisfaction led the creator, and, according to a poll, many other users, to switch to Google’s Gemini model, which they found more reliable for general writing and communication. The video also notes that the AI community, particularly on platforms like Twitter, has been vocal about these issues, and that for once their criticisms have been validated by Altman’s admission.
A significant portion of the discussion centers on the current competition in the AI space, especially regarding coding capabilities. The creator notes that Anthropic’s Claude 4.5 Opus model is currently outperforming OpenAI’s GPT-5.2 in coding benchmarks, such as the SWE-bench. This shift in focus towards coding has made models like Claude more popular among software engineers and those needing advanced coding assistance, while OpenAI’s model has lost ground in general writing and communication tasks.
The video also explores the broader question of whether improving a model’s proficiency in one domain, like coding, inevitably leads to a decline in other areas, such as writing or translation. The creator points out that Anthropic’s approach, which uses “Constitutional AI” to train its models to be helpful, honest, and harmless, may contribute to their models’ well-rounded performance. In contrast, OpenAI’s reinforcement learning from human feedback (RLHF) approach might be less effective at maintaining a balance across different skills.
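The RLHF approach mentioned above typically begins by training a reward model on human preference data. The video does not go into mechanics, but the core idea can be illustrated with the standard pairwise (Bradley-Terry) preference loss: the reward model is penalized unless it scores the human-preferred response above the rejected one. This is a minimal sketch of that loss, not anything specific to OpenAI's or Anthropic's actual training code; the function name and example scores are illustrative.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss used when training an RLHF reward model.

    The loss is -log(sigmoid(r_chosen - r_rejected)): it approaches zero
    when the reward model rates the human-preferred response well above
    the rejected one, and grows when the ranking is reversed.
    (Illustrative sketch only; real systems compute this over batches of
    model-generated response pairs.)
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Correct ranking (chosen response scored higher) gives a small loss...
low = preference_loss(2.0, -1.0)
# ...while a reversed ranking gives a large loss, pushing the reward
# model to fix its scores.
high = preference_loss(-1.0, 2.0)
```

One consequence relevant to the video's question: a reward model trained mostly on coding comparisons will shape the policy toward coding skill, which is one plausible mechanism for the kind of domain imbalance being discussed.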
Finally, the creator references a review by data scientist and tech blogger Mahal Gupta, who observed that GPT-5.2 exhibits a flatter tone, worse translation, inconsistent behavior, and increased hallucinations, especially in instant mode. These issues have caused practical problems, such as users sending emails containing hallucinated content. While GPT-5.2 excels at math and coding, its decline in writing and general communication has led many users to reconsider their choice of AI model. The video concludes by emphasizing the importance of balanced AI capabilities and invites viewers to share their own experiences and opinions.