Google Takes the Gold, OpenAI Under Fire

Google DeepMind’s Gemini Deep Think and an experimental OpenAI reasoning model both achieved gold medal-level performance at the 2025 International Mathematical Olympiad, demonstrating advanced mathematical reasoning carried out directly in natural language. While Google’s entry was officially coordinated with the IMO, OpenAI faced controversy over the timing of its announcement and relied on independent grading, highlighting the two companies’ differing approaches amid a competitive AI landscape advancing toward artificial general intelligence.

Google DeepMind’s Gemini Deep Think and OpenAI’s experimental reasoning model both achieved gold medal-level performance at the 2025 International Mathematical Olympiad (IMO), each scoring 35 out of 42 points by solving five of the six problems. The sixth problem remains a significant challenge: only five human competitors achieved a perfect score of 42. The result marks a major milestone for AI in mathematical reasoning, with both models demonstrating advanced problem solving directly in natural language, unlike previous AI efforts that required problems to be translated into formal mathematical languages first.

Controversy arose around the timing of OpenAI’s announcement, with rumors suggesting the company revealed its result before the official IMO closing ceremony, potentially overshadowing the human competitors. OpenAI’s Noam Brown clarified that the announcement came after the ceremony, in keeping with a request from an IMO organizer to let the students have their moment. Unlike Google DeepMind, which coordinated officially with the IMO, OpenAI relied on former IMO medalists to independently grade its model’s solutions, highlighting the different ways the two companies engaged with the competition.

Google’s Gemini Deep Think is a fine-tuned version of Gemini, enhanced with novel reinforcement learning techniques aimed at multi-step reasoning, problem solving, and theorem proving. The model employs parallel thinking, exploring multiple solution paths simultaneously before settling on a final answer. Google plans to roll the model out to trusted testers and subscribers of its Google AI Ultra plan, signaling broader access to these capabilities. The exact computational cost and token usage remain undisclosed, but the models are noted to “think for a long time,” indicating a significant resource investment.
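
Google has not published how Deep Think’s parallel thinking is implemented, but the core idea of sampling several reasoning paths and keeping the best-scoring one can be sketched in a few lines. Everything below (generate_solution_path, verify, the thread pool) is a hypothetical stand-in, not Google’s actual pipeline:

```python
from concurrent.futures import ThreadPoolExecutor
import random

def generate_solution_path(problem: str, seed: int) -> str:
    """Hypothetical stand-in: sample one chain of reasoning for the problem."""
    return f"candidate solution #{seed} for: {problem}"

def verify(path: str) -> float:
    """Hypothetical stand-in: score a candidate (e.g. a learned verifier)."""
    return random.random()

def parallel_think(problem: str, n_paths: int = 8) -> str:
    # Explore several solution paths concurrently, then keep the best-scoring one.
    with ThreadPoolExecutor(max_workers=n_paths) as pool:
        paths = list(pool.map(lambda s: generate_solution_path(problem, s),
                              range(n_paths)))
    return max(paths, key=verify)

if __name__ == "__main__":
    print(parallel_think("IMO 2025, Problem 1"))
```

A production system would presumably replace the toy verifier with a learned reward model or a formal proof checker, and might combine partial paths rather than simply picking a single winner.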

Experts emphasize that the true breakthrough lies not just in the models themselves but in the reinforcement learning systems and training pipelines within these AI labs. These systems act like “gyms” or “universities” for large language models, enabling continuous improvement through self-play, self-verification, and curriculum generation without relying heavily on human-curated data. This approach echoes the AlphaZero lesson, where AI learns autonomously through synthetic data and iterative training, representing a major leap toward artificial general intelligence (AGI).
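
To make the “gym” metaphor concrete, here is a hedged sketch of a self-play loop with self-verification and curriculum generation: the system proposes synthetic problems, attempts them, keeps only attempts a verifier accepts, and raises difficulty as the solve rate improves. Every function name is a hypothetical placeholder; no lab has published its actual training pipeline:

```python
def propose_problem(difficulty: float) -> str:
    """Curriculum generation: emit a synthetic problem at a target difficulty."""
    return f"synthetic problem (difficulty {difficulty:.2f})"

def attempt(problem: str) -> str:
    """Stand-in for the model producing a candidate solution."""
    return f"attempted solution to '{problem}'"

def verified(problem: str, solution: str) -> bool:
    """Self-verification stand-in (a real system might use a formal checker)."""
    return hash(solution) % 2 == 0

def train_step(examples: list) -> None:
    """Stand-in for a gradient update on verified (problem, solution) pairs."""
    print(f"training on {len(examples)} verified examples")

difficulty = 0.1
for epoch in range(3):
    batch = [propose_problem(difficulty) for _ in range(8)]
    solved = [(p, s) for p in batch if verified(p, s := attempt(p))]
    print(f"epoch {epoch}: {len(solved)}/{len(batch)} attempts verified")
    train_step(solved)
    # Raise the difficulty in proportion to the current solve rate.
    difficulty += 0.1 * (len(solved) / len(batch))
```

The key property this loop illustrates is that the training data is generated and filtered by the system itself, echoing the AlphaZero lesson described above rather than depending on human-curated examples.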

The broader AI community views these achievements as impressive yet increasingly expected milestones, with predictions of AI winning IMO gold having become commonplace. While the models demonstrate remarkable mathematical reasoning, discussion continues about the implications for AI development, transparency, and the evolving nature of intelligence. The debate over OpenAI’s announcement timing and the differences in how the two models communicate their solutions add nuance to the narrative, underscoring how dynamic and competitive AI research has become.