The RL Irony in LLMs (And Its Insane New Meta)

The video explains that while reinforcement learning (RL) helps large language models (LLMs) perform complex tasks, it is not the key to achieving artificial general intelligence (AGI) and is best used alongside supervised fine-tuning. It highlights the breakthrough of combining RL with LoRA (Low-Rank Adaptation) for efficient, personalized model adaptation, predicting this approach will drive the next wave of AI development.

The video explores the current role and limitations of reinforcement learning (RL) in advancing large language models (LLMs) toward artificial general intelligence (AGI). Referencing recent interviews with Andrej Karpathy and others, the creator notes that leading experts believe AGI is still a decade away and that RL is not the key to achieving it. RL has enabled LLMs to perform complex tasks like web browsing, coding, and business management, but its training signal is noisy: feedback arrives as a single reward at the end of an output sequence, and that one number must be spread across every token decision that produced it. Next-token prediction, by contrast, provides dense, step-by-step feedback at every position. This makes RL far less sample-efficient for general learning, though it excels at enabling exploration and outcome-based optimization.
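
To make that contrast concrete, here is a minimal PyTorch sketch of the two training signals (the toy shapes, random "model" logits, and coin-flip reward are illustrative assumptions, not from the video): next-token prediction receives a labeled target at every position, while a REINFORCE-style update receives one scalar reward for the entire sequence.

```python
import torch
import torch.nn.functional as F

# Toy setup: one sequence of T tokens over a vocabulary of V tokens.
# The "model" is just a random logits tensor; a real LLM would produce these.
T, V = 8, 50
logits = torch.randn(T, V, requires_grad=True)
targets = torch.randint(0, V, (T,))

# 1) Next-token prediction: dense feedback.
#    Every one of the T positions contributes its own labeled loss term.
sft_loss = F.cross_entropy(logits, targets)

# 2) RL (REINFORCE-style): sparse feedback.
#    Sample a whole sequence, then receive ONE scalar reward at the very end.
#    That single number is the only learning signal for all T token decisions.
dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample()
reward = 1.0 if torch.rand(()).item() > 0.5 else 0.0  # outcome-based reward
rl_loss = -reward * dist.log_prob(actions).sum()

print(f"SFT loss (T={T} labeled targets): {sft_loss.item():.3f}")
print(f"RL loss  (1 scalar reward):       {rl_loss.item():.3f}")
```

Note that when the end-of-sequence reward is zero, the whole trajectory contributes no gradient at all, which is one way to see why the RL signal is so much noisier than T dense loss terms.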

Despite RL’s strengths in teaching behavioral skills and enabling models to self-correct, it comes with trade-offs. RL tends to make LLMs less diverse and less generalizable, as it sharpens the model’s focus on specific tasks at the expense of broader capabilities. The video argues that while RL can unlock impressive performance on complex, long-horizon tasks, it is not a general solution for achieving AGI. Instead, supervised fine-tuning remains foundational, with RL serving as a specialized tool for certain applications. The speaker humorously suggests that relying solely on RL could lead to models that are essentially collections of “if statements” rather than truly intelligent systems.
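
That sharpening effect shows up even in a toy experiment (my illustration, not the video's): reward one answer out of five, run REINFORCE for a few hundred steps, and the policy's entropy collapses onto the rewarded answer.

```python
import torch

# Toy policy over 5 candidate answers; only answer 0 is ever rewarded.
logits = torch.zeros(5, requires_grad=True)
opt = torch.optim.SGD([logits], lr=0.5)

def entropy(l):
    p = torch.softmax(l, dim=0)
    return -(p * p.log()).sum().item()

print(f"entropy before RL: {entropy(logits):.3f}")   # ln(5) ~= 1.609
for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    a = dist.sample()
    reward = 1.0 if a.item() == 0 else 0.0   # outcome-based reward only
    loss = -reward * dist.log_prob(a)        # REINFORCE update
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"entropy after RL:  {entropy(logits):.3f}")   # collapses toward 0
print("probs:", torch.softmax(logits, dim=0).detach())
```

The distribution starts uniform (entropy ln 5 ≈ 1.61 nats) and ends nearly deterministic: exactly the loss of diversity the video describes, here in miniature.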

A significant portion of the video is dedicated to LoRA (Low-Rank Adaptation), a technique for efficiently fine-tuning LLMs by training a small set of added low-rank matrices while leaving the original weights frozen. Recent research, including the blog post "LoRA Without Regret," demonstrates that LoRA can match the performance of full fine-tuning in RL settings, provided certain conditions are met, such as applying LoRA to all model layers rather than only the attention projections, and using a higher learning rate than full fine-tuning would. LoRA's limited capacity is no handicap here: RL delivers a low-information training signal, on the order of a few bits per episode, so even a small adapter has ample room to absorb it. That makes LoRA well suited to post-training on specific tasks rather than to general pre-training.
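
The mechanism is compact enough to sketch directly. The layer below follows the standard LoRA formulation, W' = W + (α/r)·B·A, with B zero-initialized so training starts from the pretrained behavior; the rank and dimensions are illustrative, not taken from the video.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update:
    W' = W + (alpha / r) * B @ A, with B zero-initialized."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base.requires_grad_(False)  # pretrained weights stay frozen
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # r x d_in
        self.B = nn.Parameter(torch.zeros(d_out, r))        # d_out x r
        self.scale = alpha / r

    def forward(self, x):
        # x @ (B @ A).T == x @ A.T @ B.T, computed at low-rank cost
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Parameter budget: a 4096x4096 projection has ~16.8M weights; a rank-8
# adapter trains only 2 * 8 * 4096 = 65,536 of them (~0.4%).
layer = LoRALinear(nn.Linear(4096, 4096, bias=False), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable adapter parameters: {trainable:,}")   # 65,536
```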

The combination of RL and LoRA is presented as a breakthrough for both research and practical deployment. The pairing allows for rapid, cost-effective experimentation and enables scalable personalization of LLMs. Because LoRA adapters are small and modular, they can be swapped in and out at serving time, paving the way for highly personalized AI agents tailored to individual users or tasks. The video predicts that this approach will drive the next wave of AI development, with 2025 being the year of AI agents and 2026 the year of performant, personalized agents.
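
Here is a minimal sketch of what that swapping could look like (the user names and toy layer are hypothetical; the LoRALinear class from the previous sketch is redefined so this snippet runs standalone):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Same shape as the sketch above: frozen base + per-user low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 4):
        super().__init__()
        self.base = base.requires_grad_(False)       # shared, frozen backbone
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.zeros(r, d_in))  # swapped per user
        self.B = nn.Parameter(torch.zeros(d_out, r))

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

layer = LoRALinear(nn.Linear(64, 64, bias=False))

# Each "user" owns only the adapter tensors: 2 * 4 * 64 floats, a few KB.
# (The users and their weights here are hypothetical placeholders.)
adapters = {
    "alice": {"A": torch.randn(4, 64), "B": torch.randn(64, 4)},
    "bob":   {"A": torch.randn(4, 64), "B": torch.randn(64, 4)},
}

def activate(name: str):
    """Swap one user's adapter into the shared model; base weights untouched."""
    layer.A.data.copy_(adapters[name]["A"])
    layer.B.data.copy_(adapters[name]["B"])

x = torch.randn(1, 64)
activate("alice"); out_alice = layer(x)
activate("bob");   out_bob = layer(x)
print("same base, different behavior:", not torch.allclose(out_alice, out_bob))
```

Since each adapter is a few kilobytes rather than a full copy of the model's weights, one hosted base model can serve many personalized agents simply by copying adapter tensors in and out.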

In conclusion, while RL may not be the ultimate path to AGI, its practical benefits—especially when combined with LoRA—are already transforming the capabilities and accessibility of LLMs. The speaker encourages viewers to embrace these advances, even if they fall short of true general intelligence, and promotes their own educational resources for those interested in learning more about LLMs and LoRA. The video ends with acknowledgments to supporters and an invitation to explore further content on the topic.