DeepSeek R1 just got a HUGE Update! (o3 Level Model)

The video highlights a major update to the open-source DeepSeek R1 model, now referred to as R1 V2, which significantly improves its reasoning, inference, and coding capabilities through refined training techniques, bringing its performance closer to top-tier models like OpenAI's o3 and Gemini 2.5 Pro. Despite no architectural changes, the model demonstrates notable benchmark gains and enhanced intelligence, positioning it as a competitive open-source alternative in a rapidly advancing AI landscape.

The video discusses the recent significant update to DeepSeek R1, an open-source AI model. The update was initially released with little accompanying information but has since been revealed to be a substantial upgrade. The new version, referred to as R1 V2, features improved reasoning, inference, and coding capabilities, bringing its performance closer to leading models like OpenAI's o3 and Gemini 2.5 Pro. Although DeepSeek describes it as a minor upgrade, it delivers notable benchmark improvements across a range of tasks, demonstrating the model's enhanced intelligence and efficiency.

The presenter highlights that DeepSeek R1's scores have risen sharply across multiple benchmarks, including AIME 2024, AIME 2025, GPQA Diamond, and others. Compared against models like o3 and Gemini 2.5 Pro, the updated R1 now approaches or surpasses them in certain areas, especially coding tasks. Artificial Analysis also ranks DeepSeek as the second-best AI lab globally, underscoring its rapid progress and the shrinking gap between open-source and closed-source models.

A key aspect of the update is that there was no change in the model’s architecture; instead, the improvements stemmed from refined reinforcement learning techniques and post-training optimizations. The model, with 671 billion parameters, now demonstrates enhanced reasoning and coding skills, matching top-tier models like Gemini 2.5 Pro in several benchmarks. The increased token usage during inference indicates that the model is capable of longer, more complex thought processes, contributing to its improved performance.
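For readers who want to see the longer reasoning traces and higher token usage for themselves, here is a minimal sketch of querying the updated R1 through DeepSeek's OpenAI-compatible API and checking how many completion tokens a request consumes. The model name `deepseek-reasoner`, the base URL, and the `reasoning_content` field reflect DeepSeek's public API documentation as I understand it and may change; treat this as an illustration, not an official integration guide.

```python
# Minimal sketch: query DeepSeek R1 via its OpenAI-compatible API and
# inspect the reasoning trace plus token usage.
# Assumes the OpenAI Python SDK is installed and DEEPSEEK_API_KEY is set;
# model name, base URL, and reasoning_content follow DeepSeek's docs (assumption).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # the R1 reasoning model
    messages=[{"role": "user", "content": "How many prime numbers are there below 100?"}],
)

message = response.choices[0].message
print("Reasoning trace:\n", message.reasoning_content)   # chain-of-thought text
print("Final answer:\n", message.content)
print("Completion tokens used:", response.usage.completion_tokens)
```

Comparing `completion_tokens` between the old and new checkpoints on the same prompt is a quick, informal way to observe the longer thinking the presenter describes.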

The presenter shares personal testing experiences, noting that DeepSeek R1 performs well on coding challenges, such as generating code for complex tasks like Rubik’s Cube simulations. However, some tests reveal limitations, such as incomplete or flawed outputs, especially when compared to Gemini 2.5 Pro, which consistently produces accurate results. The video also discusses the model’s inference speed, token usage, and context window size, illustrating that while DeepSeek R1 has made significant strides, it still lags behind some proprietary models in certain areas.
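The inference-speed and token-usage observations in the video are easy to reproduce informally. The sketch below streams a completion for a Rubik's-Cube-style coding prompt and reports a rough chunks-per-second figure; it reuses the `client` object from the previous snippet, and counting streamed deltas is only an approximation of true token throughput.

```python
# Rough throughput check: stream a completion and estimate delivery rate.
# Reuses `client` from the previous sketch; streamed deltas are text fragments,
# not exact tokens, so this is an approximation (assumption).
import time

start = time.time()
chunk_count = 0
stream = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Write a Python function that scrambles a Rubik's Cube state."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # some final chunks carry no choices
    delta = chunk.choices[0].delta
    # Reasoning text and the final answer arrive in separate delta fields.
    if getattr(delta, "reasoning_content", None) or delta.content:
        chunk_count += 1

elapsed = time.time() - start
print(f"Received {chunk_count} streamed chunks in {elapsed:.1f}s "
      f"(~{chunk_count / elapsed:.1f} chunks/sec)")
```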

In conclusion, the update marks a notable advancement for DeepSeek R1, positioning it as a competitive open-source alternative to leading closed models. The improvements are primarily due to better training and optimization techniques rather than architectural changes. While the presenter expresses some disappointment with the model’s performance on specific tests, they acknowledge the rapid progress and the ongoing trend of open-source models closing the gap with their proprietary counterparts. The video ends with a reflection on the evolving AI landscape and the importance of continued development in open-source AI technology.