Can humans make AI any better?

The video explains how early AI systems relied on human-crafted rules, and how the field has shifted toward data-driven, computation-heavy approaches, as seen in speech recognition's transition from Harpy to hidden Markov models and in the rise of large language models. It argues that while current AI still depends on human knowledge, the future lies in systems that learn autonomously from experience, like AlphaGo Zero, potentially enabling AI to surpass human-derived limitations.

In 1971, ARPA (now DARPA) launched an ambitious program to advance speech recognition, aiming for a system that could recognize 1,000 words with 90% accuracy within five years. This led to the creation of Harpy by Carnegie Mellon, which used a massive, hand-crafted knowledge graph to represent the phonetic structure of spoken English. Each node in Harpy’s graph represented a phoneme, and the system matched incoming audio to these nodes using frequency analysis and a search algorithm called beam search. The grammar and pronunciation rules were meticulously designed by language experts, allowing Harpy to achieve impressive accuracy for its time, but making it difficult to scale beyond its initial capabilities.
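The core idea of beam search is to explore a graph while keeping only the few highest-scoring partial hypotheses at each step. The sketch below illustrates this on a toy, hand-built phoneme graph in the spirit of Harpy; the graph, phonemes, and scoring function are all hypothetical stand-ins, not Harpy's actual data or acoustic model.

```python
# Toy beam search over a hand-built phoneme graph, Harpy-style.
# GRAPH encodes which phonemes may follow which, as a language
# expert might have specified; values here are illustrative.
GRAPH = {
    "START": ["HH", "K"],
    "HH": ["AH"],
    "K": ["AE"],
    "AH": ["L", "END"],
    "AE": ["T"],
    "L": ["OW"],
    "OW": ["END"],
    "T": ["END"],
}

def acoustic_score(phoneme, frame):
    """Stand-in for frequency analysis: how well does this audio
    frame match the phoneme? Here, 1.0 on exact match, else 0.1."""
    return 1.0 if phoneme == frame else 0.1

def beam_search(frames, beam_width=2):
    # Each hypothesis is (path, cumulative score).
    beams = [(["START"], 1.0)]
    for frame in frames:
        candidates = []
        for path, score in beams:
            for nxt in GRAPH.get(path[-1], []):
                if nxt == "END":
                    continue
                candidates.append((path + [nxt],
                                   score * acoustic_score(nxt, frame)))
        # Keep only the top-scoring hypotheses; this pruning is what
        # makes beam search cheaper than exhaustive search.
        beams = sorted(candidates, key=lambda c: c[1],
                       reverse=True)[:beam_width]
    return beams[0][0][1:] if beams else []

print(beam_search(["HH", "AH", "L", "OW"]))  # -> ['HH', 'AH', 'L', 'OW']
```

Note that pruning to a fixed beam width means the search can miss the globally best path, which is the price Harpy paid for tractability on 1970s hardware.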

As researchers sought to expand speech recognition to larger vocabularies, Harpy’s approach was replaced by hidden Markov models (HMMs) in the 1980s and 1990s. HMMs learned probabilities from data rather than relying on expert-crafted rules, enabling systems to scale to tens of thousands of words. This shift reflected a broader trend in AI: methods that leverage large-scale computation and data tend to outperform those that rely heavily on human knowledge and manual engineering. In 2019, computer scientist Richard Sutton articulated this trend in his influential essay “The Bitter Lesson,” arguing that general methods powered by computation ultimately win out over approaches that encode human expertise.
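The key difference from Harpy is that an HMM's numbers are estimated from training data rather than authored by experts. As a minimal sketch of the mechanics, the forward algorithm below scores an observation sequence under a two-state HMM; the states, observations, and all probabilities are made up for illustration (in a real recognizer they would be learned, e.g. via Baum-Welch).

```python
# Minimal forward algorithm for a two-state HMM. All parameters
# below are illustrative; in practice they are estimated from data,
# not hand-written by language experts.
states = ["S1", "S2"]
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {
    "S1": {"S1": 0.7, "S2": 0.3},
    "S2": {"S1": 0.4, "S2": 0.6},
}
emit_p = {
    "S1": {"a": 0.5, "b": 0.5},
    "S2": {"a": 0.1, "b": 0.9},
}

def forward(observations):
    """P(observations), summed over all hidden state paths."""
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states)
               * emit_p[s][obs]
            for s in states
        }
    return sum(alpha.values())

prob = forward(["a", "b", "a"])
```

Because the recursion only multiplies and sums probabilities, adding vocabulary means adding states and retraining on more data, rather than redesigning a knowledge graph by hand.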

However, Sutton later clarified that large language models (LLMs) like GPT-2 are not straightforward examples of the “bitter lesson.” While LLMs scale with computation, they are still fundamentally trained on human-generated text, meaning they inherit the limitations of human knowledge. Sutton suggested that true breakthroughs will come from AI systems that learn directly from experience, rather than imitating what humans already know. This distinction raises the question of whether current LLMs will eventually hit a performance ceiling and need to be replaced by more autonomous, experience-driven systems.

The video draws a parallel with DeepMind’s AlphaGo and AlphaGo Zero, which achieved superhuman performance in the game of Go. AlphaGo initially learned by imitating human expert moves (supervised learning), but reached new heights by playing games against itself and learning from the outcomes (reinforcement learning). AlphaGo Zero went further, learning entirely from self-play without any human data. This approach allowed the system to discover novel strategies beyond human knowledge, demonstrating the power of reinforcement learning and value estimation in creating agents that can truly innovate.
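The self-play loop can be sketched in miniature. The toy below is not AlphaGo Zero (which uses a deep network and Monte Carlo tree search); it is a hypothetical race game with a tabular value function and a simple Monte Carlo update, chosen to show the essential point: no human strategy is encoded, and the values emerge purely from games the agent plays against itself.

```python
import random

# Toy self-play RL: players alternately add 1 or 2 to a counter;
# whoever reaches TARGET exactly wins. Game and parameters are
# illustrative; no human strategy is hard-coded anywhere.
TARGET = 5
random.seed(0)

# V[s]: estimated probability that the player about to move at
# counter s eventually wins. Initialized to 0.5 (no knowledge).
V = {s: 0.5 for s in range(TARGET + 1)}

def legal_moves(s):
    return [m for m in (1, 2) if s + m <= TARGET]

def choose(s, epsilon=0.1):
    moves = legal_moves(s)
    if random.random() < epsilon:
        return random.choice(moves)  # explore
    # Greedy: move to the state that is worst for the opponent,
    # i.e. minimizes the opponent's win probability V[s + m].
    return min(moves, key=lambda m: V[s + m])

def play_episode(alpha=0.1):
    s, visited = 0, []
    while s < TARGET:
        visited.append(s)
        s += choose(s)
    # The player who moved last won. Walking back from the end,
    # alternate win/loss and nudge each visited state's value
    # toward the observed outcome (a Monte Carlo update).
    outcome = 0.0
    for state in reversed(visited):
        outcome = 1.0 - outcome
        V[state] += alpha * (outcome - V[state])

for _ in range(5000):
    play_episode()
```

After training, V should reflect the game's structure, e.g. that counter 2 is a losing position for the player to move while counters 3 and 4 are winning, even though nothing about "good moves" was ever written into the code; the same single policy plays both sides, which is the defining trait of self-play.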

Looking ahead, Sutton and AlphaGo lead researcher David Silver argue that we are on the cusp of an “Era of Experience,” where AI agents will learn from real-world interactions and optimize for outcomes beyond what is captured in human data. While reinforcement learning has shown promise in domains like games, math, and coding, its application to broader real-world problems remains a challenge. The video concludes by reflecting on the limitations of current LLMs and the potential for reinforcement learning or other approaches to unlock the next frontier in AI, emphasizing the importance of systems that can discover new knowledge rather than simply replicating what humans already know.