François Chollet highlights that true general intelligence requires fluid adaptability and reasoning beyond memorized skills, which current large-scale models lack, as demonstrated by their poor performance on the ARC benchmark designed to test such abilities. He advocates for a new AI paradigm focused on test-time adaptation and hybrid reasoning that integrates perceptual intuition with symbolic program synthesis, aiming to develop AI systems capable of independent invention and scientific discovery.
In his talk, François Chollet emphasizes the critical role of compute cost reduction and data availability in the progress of AI, highlighting how the 2010s saw breakthroughs in deep learning due to abundant GPU compute and large datasets. However, he points out a fundamental misunderstanding in the AI community: the conflation of memorized, task-specific skills with true fluid general intelligence—the ability to adapt and solve novel problems on the fly. To address this, Chollet introduced the Abstraction and Reasoning Corpus (ARC) benchmark in 2019, designed to test fluid intelligence rather than static skills. Despite massive scaling of models like GPT-4.5, performance on ARC remained near zero, indicating that scaling alone does not lead to general intelligence.
Chollet explains that the AI field has recently shifted from a pre-training scaling paradigm to one focused on test-time adaptation, where models dynamically modify their behavior during inference to handle new tasks. This shift has driven significant progress on ARC, with reasoning models such as OpenAI's o3 reaching human-level performance on the benchmark. This new era of test-time adaptation involves techniques such as test-time training, program synthesis, and chain-of-thought reasoning, all aimed at enabling AI systems to learn and adapt in real time rather than relying solely on static pre-trained knowledge.
A key part of Chollet’s argument is a refined definition of intelligence as the efficiency with which past information is operationalized to handle novel future situations. He distinguishes between static skills—predefined, memorized behaviors—and fluid intelligence, which involves synthesizing new solutions on the fly. Chollet critiques traditional benchmarks and exams for measuring task-specific skills rather than true intelligence, advocating for benchmarks like ARC that emphasize novelty, abstraction, and reasoning. He also introduces the ARC-AGI-2 benchmark and the upcoming ARC-AGI-3, which progressively increase the complexity and interactivity of tasks to better measure AI’s fluid intelligence and adaptive capabilities.
Chollet further elaborates on the nature of intelligence as involving two complementary types of abstraction: Type 1, which deals with continuous, perceptual pattern recognition (well handled by current deep learning models like transformers), and Type 2, which involves discrete, symbolic reasoning and programmatic abstraction (less well captured by current models). He argues that true general intelligence requires integrating both forms, combining fast, approximate intuition with precise, discrete program search. This hybrid approach mirrors human cognition, where intuition narrows down options and explicit reasoning explores them in depth.
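The hybrid recipe can be sketched concretely: a cheap, approximate scorer stands in for Type 1 intuition and ranks the primitives of a tiny domain-specific language, then an exact Type 2 search composes and verifies programs against the task's demonstration pairs. The DSL, the scorer, and the task below are all illustrative inventions, not from the talk:

```python
# Toy hybrid of intuition-guided program synthesis: a heuristic ranks DSL
# primitives (Type 1), then exhaustive search over compositions verifies
# candidates exactly against the demo pairs (Type 2).
from itertools import product

# A tiny DSL of list -> list primitives.
PRIMITIVES = {
    "reverse":   lambda xs: xs[::-1],
    "sort":      lambda xs: sorted(xs),
    "double":    lambda xs: [2 * x for x in xs],
    "increment": lambda xs: [x + 1 for x in xs],
}

def intuition_score(name, demos):
    """Cheap proxy for learned intuition: count positions where one
    application of the primitive already matches the demo output."""
    fn = PRIMITIVES[name]
    return sum(sum(1 for a, b in zip(fn(x), y) if a == b) for x, y in demos)

def synthesize(demos, max_depth=2):
    """Exact search over primitive compositions, ordered by the heuristic."""
    ranked = sorted(PRIMITIVES, key=lambda n: -intuition_score(n, demos))
    for depth in range(1, max_depth + 1):
        for names in product(ranked, repeat=depth):
            def run(xs, names=names):
                for n in names:
                    xs = PRIMITIVES[n](xs)
                return xs
            if all(run(x) == y for x, y in demos):
                return names  # first program consistent with every demo
    return None

# Hidden rule: sort the list, then double every element.
demos = [([3, 1, 2], [2, 4, 6]), ([5, 4], [8, 10])]
print(synthesize(demos))  # → ('sort', 'double')
```

In this toy the heuristic merely reorders a brute-force search; the systems Chollet describes would replace it with a deep network whose learned priors prune the combinatorial program space enough to make the discrete search tractable.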
Finally, Chollet outlines the future direction of AI research, focusing on building systems that act as programmer-like meta-learners. These systems would synthesize new programs on the fly by recombining learned abstractions, guided by deep learning-based intuition to efficiently navigate the vast space of possible solutions. His new research lab, Ndea, is dedicated to developing such AI systems capable of independent invention and scientific discovery, aiming to accelerate human progress by creating AI that goes beyond automation to autonomous invention and knowledge expansion.