In the conversation, Richard Sutton argues that large language models (LLMs) lack true intelligence because they neither learn from experience nor operate within a goal-oriented framework like reinforcement learning (RL), which he considers essential for genuine understanding and continual learning. He emphasizes that future AI progress depends on building systems that learn through interaction with the world, guided by goals and rewards, rather than relying solely on pattern prediction from static data.
In this conversation, Richard Sutton, a founding father of reinforcement learning (RL) and Turing Award recipient, explores the fundamental differences between RL and large language models (LLMs). Sutton emphasizes that RL is about understanding and interacting with the world through experience, learning from actions and their consequences, whereas LLMs primarily mimic human language patterns without a true model of the world or goals of their own. He argues that LLMs predict what a person might say next but not what will actually happen next in the environment, which limits their capacity for genuine intelligence and continual learning.
Sutton critiques the notion that LLMs can serve as a good prior for experiential learning, pointing out that without a clear definition of “right” or “wrong” actions grounded in goals and rewards, LLMs lack a meaningful framework for learning from experience. He contrasts this with RL, where goals and rewards provide a basis for evaluating actions and improving behavior over time. The discussion also touches on the limitations of supervised learning and imitation, noting that natural animal learning is primarily trial-and-error and prediction-based rather than imitation-based, which challenges common assumptions about how humans and AI learn.
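The goal-and-reward framework Sutton describes can be made concrete with a minimal tabular Q-learning sketch: an agent evaluates actions by the rewards they lead to and improves its behavior through trial and error. The 5-state chain environment, the learning parameters, and all function names here are illustrative assumptions for exposition, not material from the conversation itself.

```python
import random

# Illustrative assumption: a tiny 5-state chain; reaching state 4 yields reward 1.
N_STATES = 5
ACTIONS = [-1, +1]              # move left or right along the chain
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Environment dynamics: move along the chain; reward only at the goal."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def train(episodes=500, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if random.random() < EPSILON:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward the reward plus the
            # discounted value of the best action available in the next state.
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q

if __name__ == "__main__":
    q = train()
    # After training, the greedy policy moves right (toward the goal) everywhere.
    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
    print(policy)
```

The reward signal is what gives "right" and "wrong" a meaning here: no labeled examples or imitation are involved, only actions, consequences, and a scalar goal, which is exactly the contrast Sutton draws with supervised pattern prediction.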
The conversation delves into the challenges of generalization and transfer learning in AI. Sutton highlights that current deep learning methods, including LLMs, often fail to generalize well across different tasks or states without human intervention to sculpt representations. He distinguishes between merely solving problems and true generalization, which involves applying learned knowledge flexibly to new, unseen situations. Sutton also reflects on historical AI milestones like AlphaGo and AlphaZero, framing them as scaling and refinement of existing RL techniques rather than entirely new breakthroughs, reinforcing his view that simple, general principles have consistently driven AI progress.
Looking forward, Sutton envisions a future in which AI systems learn continually from experience, much as animals and humans do, rather than relying on static training data. He discusses the importance of a goal-oriented framework and of the ability to build and update rich models of the world through ongoing interaction. The conversation also turns to the sociological and philosophical implications of AI development, including the transition from biological replicators to designed intelligences, the possibility of AI succession, and the ethical case for designing AI with robust, prosocial values to guide its behavior in an uncertain future.
Finally, Sutton offers a thoughtful perspective on the trajectory of AI research and the broader human endeavor to understand intelligence. He acknowledges the surprises and successes of recent AI developments but stresses the enduring importance of foundational principles like reinforcement learning. He encourages a balanced view of AI's future, recognizing both its transformative potential and the challenges of control, value alignment, and societal impact. The dialogue closes with a reflection on the continuity of human values and the ongoing effort to design a positive future amid rapid technological change.