DeepSeek R1 Cloned for $30?! PhD Student's STUNNING Discovery

In a recent video, UC Berkeley PhD student Jiayi Pan successfully reproduced the "aha moment" phenomenon from the DeepSeek R1 model for just $30, demonstrating how reinforcement learning can enhance reasoning abilities in AI. By applying this concept to the Countdown game, he showed that larger models significantly outperform smaller ones in developing sophisticated problem-solving skills, highlighting the potential for efficient AI solutions through open-source collaboration.

In the video, UC Berkeley PhD student Jiayi Pan made a remarkable discovery by successfully reproducing the "aha moment" phenomenon observed in the DeepSeek R1 model for just $30. The "aha moment" refers to a stage in training where the model learns to allocate more time to problem-solving by re-evaluating its initial approaches. This behavior showcases the model's developing reasoning abilities and highlights the potential of reinforcement learning to yield sophisticated outcomes. The video explains how this process works, emphasizing in particular the importance of a well-defined reward function for tasks with definitive answers, such as math and logic.
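For tasks with definitive answers, the reward can be as simple as an exact-match check on the model's final answer. The sketch below is purely illustrative (the `Answer:` output format is an assumption, not the project's actual prompt template):

```python
import re

def exact_match_reward(model_output: str, ground_truth: str) -> float:
    """Binary reward for a task with a definitive answer.

    Illustrative sketch only: assumes the model was prompted to end
    its response with a line of the form "Answer: <value>".
    """
    match = re.search(r"Answer:\s*(.+)", model_output)
    if not match:
        return 0.0  # no parseable answer -> no reward
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0
```

Because the reward is computed by a rule rather than a learned model, it cannot be gamed by superficially plausible text, which is what makes math and logic tasks a clean testbed for this kind of reinforcement learning.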

The DeepSeek R1 paper introduced the concept of the "aha moment," where the model begins to exhibit deep thinking capabilities through reinforcement learning. The student applied this concept to the Countdown game, in which a set of numbers must be combined with arithmetic operations to reach a target number, providing a clear reward signal for the model. By starting from a base language model and applying reinforcement learning, he demonstrated that the model could develop self-verification and search abilities autonomously, leading to the emergence of sophisticated reasoning skills.
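What makes Countdown a clean reward signal is that any proposed solution can be checked mechanically: the expression must use each given number exactly once and evaluate to the target. A minimal verifier along those lines might look like this (an illustrative sketch, not the project's actual reward code):

```python
import re

def countdown_reward(expression: str, numbers: list[int], target: int) -> float:
    """Return 1.0 if `expression` uses each given number exactly once
    and evaluates to `target`, else 0.0. Illustrative sketch only.
    """
    # Reject anything beyond digits, whitespace, and basic arithmetic.
    if not re.fullmatch(r"[\d\s+\-*/().]+", expression):
        return 0.0
    # The numbers appearing in the expression must be exactly the given ones.
    used = [int(n) for n in re.findall(r"\d+", expression)]
    if sorted(used) != sorted(numbers):
        return 0.0
    try:
        # Safe here: the character whitelist above excludes names and calls.
        value = eval(expression, {"__builtins__": {}}, {})
    except (SyntaxError, ZeroDivisionError):
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0
```

For example, `countdown_reward("(25 - 5) * 5", [25, 5, 5], 100)` returns 1.0, while an expression that drops one of the numbers or misses the target returns 0.0. Because correctness is binary and automatically checkable, the model gets an unambiguous learning signal without any human labeling.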

The video details the methodology: the model initially produced nonsensical outputs but gradually learned to revise its attempts and search for solutions on its own. The results showed that larger models (1.5 billion parameters and above) were significantly better at developing these reasoning capabilities than smaller ones. The findings suggest that the quality of the base model is crucial to the outcome, while the specific reinforcement learning algorithm used turned out to matter less than previously thought.

Additionally, the video discusses the implications of this research for the future of AI, speculating on the potential for small models to perform complex tasks through reinforcement learning and test-time training. The idea is that these models could be tailored to specific tasks, allowing for efficient problem-solving without extensive computational resources. The student’s work exemplifies the power of open-source collaboration, as the techniques and findings can be shared and built upon by others in the field.

In conclusion, the video emphasizes the significance of this discovery in the context of AI development, particularly in enhancing reasoning capabilities through reinforcement learning. The open-source nature of the project allows for further exploration and experimentation, potentially leading to more breakthroughs in AI. The student’s achievement not only demonstrates the feasibility of reproducing advanced AI behaviors at a low cost but also opens the door for future innovations in the field.