OpenAI just solved math

OpenAI has developed a general-purpose large language model that achieved gold medal-level performance on the 2025 International Mathematical Olympiad, solving complex problems directly from their natural-language statements under the same conditions as human contestants. This marks a significant leap in AI reasoning beyond specialized models. The result, verified by former IMO medalists, signals rapid progress toward human-level AI reasoning, with profound implications for scientific discovery, AI safety, and the future of human-AI interaction.

OpenAI has achieved a groundbreaking milestone: gold medal-level performance on the 2025 International Mathematical Olympiad (IMO) using a general-purpose large language model (LLM). The achievement is significant because the IMO is widely regarded as the most challenging and prestigious math competition in the world, and surpassing human experts in this contest has long been seen as a key indicator of progress toward artificial general intelligence (AGI). Unlike Google DeepMind’s previous near-gold performance, which relied on specialized systems, AlphaProof and AlphaGeometry, trained on synthetic data and formalized problem translations, OpenAI’s model solved the problems directly from the official natural-language statements, under the same time constraints as human contestants and without any external tools.

This accomplishment marks a major leap in AI capabilities, moving from narrow, task-specific superhuman performance toward more general reasoning. The model’s success was achieved through novel reinforcement learning techniques that address the traditional challenge of hard-to-verify tasks, allowing the AI to generate intricate, watertight mathematical proofs at a human expert level. The model’s distinct, concise style of reasoning and communication, quite different from typical chatbot output, reflects its efficiency and experimental nature. Researchers emphasize that this is not GPT-5, which is expected to be released later, but an experimental research model showcasing the next frontier in AI reasoning and problem-solving.

The progress in AI reasoning is also characterized by an increasing “reasoning time horizon,” meaning the length and complexity of tasks AI can handle is doubling approximately every seven months. This exponential growth has taken AI from solving simple arithmetic problems in seconds to tackling hour-long, complex Olympiad problems. The model’s ability to think deeply and efficiently, combined with advances in test-time compute and reinforcement learning, suggests that AI is rapidly approaching and even surpassing human-level performance in complex intellectual tasks, with profound implications for scientific discovery and other fields.
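The doubling claim above is simple exponential growth and can be made concrete with a small sketch. This is purely illustrative: the starting horizon of one minute and the seven-month doubling period are assumptions chosen to match the narrative, not figures from OpenAI.

```python
import math

def task_horizon_minutes(months_elapsed: float,
                         start_minutes: float = 1.0,
                         doubling_months: float = 7.0) -> float:
    """Project the task-length horizon under a fixed doubling period.

    horizon = start * 2 ** (months_elapsed / doubling_months)
    """
    return start_minutes * 2 ** (months_elapsed / doubling_months)

# After one doubling period (7 months), a 1-minute horizon becomes 2 minutes.
print(task_horizon_minutes(7.0))  # → 2.0

# How many months until hour-long (60-minute) tasks, under these assumptions?
months_to_hour = 7.0 * math.log2(60.0)  # roughly 41 months
```

Under these (hypothetical) parameters, going from minute-scale arithmetic to hour-long Olympiad problems takes only a few years, which is the point the exponential framing is meant to convey.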

OpenAI’s approach contrasts with previous methods that required manually translating problems into formal languages, pointing to a more natural and scalable way for AI to understand and solve complex problems. The model’s performance was independently verified by former IMO medalists, who confirmed a gold medal-level score of 35 out of 42 points, well ahead of the 28 points DeepMind’s specialized system had earned. This breakthrough not only demonstrates AI’s growing mastery of mathematics but also signals a fundamental shift in AI’s role in research and problem-solving, potentially accelerating scientific progress and transforming a range of industries.

Finally, the video discusses broader implications, including AI safety concerns about models’ potential to “cheat” or find shortcuts in tasks, and the methods researchers are developing to detect and prevent such behavior. It also touches on cultural effects, such as how the model’s terse, efficient communication style might influence human language and interaction patterns over time. Overall, this milestone represents a pivotal moment in AI development, indicating that general-purpose AI systems are rapidly advancing toward, and beyond, human expert-level reasoning, with transformative consequences expected in the near future.