29.4% ARC-AGI-2 🤯 (TOP SCORE!) - Jeremy Berman

Jeremy Berman, leader of the ARC-AGI v2 leaderboard, discusses his novel approach, which pairs evolved natural language algorithm descriptions with powerful models like Grok 4 to solve the ARC challenge more effectively than traditional code-generation methods. He emphasizes the importance of reinforcement learning, compositional reasoning, and continual knowledge refinement for advancing AI towards general intelligence, while highlighting ongoing challenges and future research directions in replicating human-like cognition.

The video features an in-depth conversation with Jeremy Berman, a research scientist at Reflection AI and the current leader of the ARC-AGI v2 leaderboard. Jeremy discusses his innovative approach to solving the ARC challenge, an IQ-like test for machines involving input-output grid transformations. Unlike previous methods that generated explicit Python programs, Jeremy's latest solution evolves natural language descriptions of algorithms, which are more expressive and better suited to the compositional, iterative nature of ARC v2 tasks. He emphasizes that natural language allows for concise and flexible problem descriptions, although it requires an explicit verification step: unlike code, a natural language instruction cannot be executed directly, so a model must apply it to the training inputs and the results must be checked against the expected outputs.
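The evolve-and-verify loop described above can be sketched roughly as follows. This is a minimal illustration, not Jeremy's actual pipeline: every function name here is a hypothetical stand-in, and the two model calls (`propose_descriptions`, `apply_description`) are stubbed where the real system would query an LLM such as Grok 4.

```python
import random

def propose_descriptions(task, pool, n):
    """Ask the model for n natural language algorithm descriptions,
    seeded with the best descriptions found so far. Stubbed here."""
    return [f"candidate-rule-{random.randint(0, 99)}" for _ in range(n)]

def apply_description(description, grid):
    """The verification step: natural language can't run directly,
    so a model applies the description to a grid. Stubbed as identity."""
    return grid

def score(description, train_pairs):
    """Fraction of training pairs the description reproduces exactly."""
    hits = sum(apply_description(description, x) == y for x, y in train_pairs)
    return hits / len(train_pairs)

def evolve(task, generations=3, breadth=8, keep=2):
    """Evolutionary search over descriptions: propose broadly,
    verify against training pairs, keep the best, and repeat."""
    pool = []
    for _ in range(generations):
        candidates = pool + propose_descriptions(task, pool, breadth)
        ranked = sorted(candidates, key=lambda d: score(d, task["train"]),
                        reverse=True)
        pool = ranked[:keep]
        if score(pool[0], task["train"]) == 1.0:  # solves all training pairs
            break
    return pool[0]
```

The key contrast with code-evolution approaches is that the fitness signal comes from a model interpreting the description rather than from an interpreter running a program.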

Jeremy explains the evolution of his approach from ARC v1 to ARC v2, highlighting the importance of balancing breadth and depth in the search for solutions. While ARC v1 required deeper iterative refinement due to the lack of internal “thinking” capabilities in earlier models like Sonnet 3.5, ARC v2 benefits from more powerful models such as Grok 4 that inherently perform domain-specific reasoning and internal revision. This shift allows for broader exploration of solution spaces with less reliance on explicit iterative loops. Jeremy also notes that different models have domain-specific strengths, with some better at code generation and others excelling in reasoning about grid transformations.
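The breadth-versus-depth trade-off Jeremy describes can be made concrete as a budget-allocation choice: a fixed number of model calls split between independent attempts (breadth) and refinement steps per attempt (depth). The sketch below is an assumption-laden simplification, not his implementation; `solve_once`, `refine`, and `verify` are hypothetical callables standing in for model calls and training-pair checks.

```python
def search(solve_once, refine, verify, budget, depth):
    """Split a fixed model-call budget into budget // depth independent
    attempts, each refined up to depth - 1 times. depth=1 is pure breadth,
    which suits models (e.g. Grok 4) that already revise internally;
    larger depth mimics the ARC v1 iterative-refinement regime."""
    attempts = budget // depth
    best, best_score = None, -1.0
    for _ in range(attempts):
        candidate = solve_once()
        for _ in range(depth - 1):
            if verify(candidate) == 1.0:   # already correct, stop refining
                break
            candidate = refine(candidate)  # one revision step
        s = verify(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best
```

Under this framing, moving from Sonnet 3.5 to Grok 4 amounts to shrinking `depth` and spending the reclaimed budget on `attempts`.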

The discussion delves into the fundamental challenges and future directions of AI reasoning and general intelligence. Jeremy and the host debate whether current neural network architectures can truly emulate the symbolic and compositional reasoning capabilities of the human brain. They agree that while neural networks are powerful, issues like catastrophic forgetting during fine-tuning and the lack of compositionality remain significant hurdles. Jeremy advocates for reinforcement learning with verifiable feedback and composable model architectures that can adapt and retain knowledge without forgetting, envisioning a future where models can continually learn and refine skills dynamically.

A key philosophical point raised is the distinction between memorized knowledge and deductive reasoning. Jeremy views intelligence as the ability to build and efficiently navigate a “knowledge tree” of deductive relationships, rather than merely storing a web of facts. Reinforcement learning and reasoning help prune and structure this knowledge into coherent, causal hierarchies, enabling generalization and creativity. The conversation touches on the importance of creativity as a meta-skill that involves selecting and combining axioms to extend the knowledge tree, emphasizing that deep understanding is crucial for meaningful innovation.

Finally, Jeremy shares his excitement about ongoing research and the potential for future breakthroughs in AI reasoning. He highlights the need for new training environments that encourage models to invent and deduce novel concepts beyond their training data. The conversation concludes with reflections on the philosophical and neuroscientific aspects of intelligence and understanding, acknowledging the complexity of replicating human-like cognition but expressing optimism that advances in language models and reinforcement learning will pave the way toward artificial general intelligence. Jeremy also invites interested researchers to join Reflection AI in their mission to build open intelligence models.