Sample Efficiency Is the Next Step to AGI

The video argues that the next major breakthrough needed for artificial general intelligence (AGI) is improving sample efficiency—enabling AI systems to learn and generalize from far fewer examples, much like humans do. Rather than just scaling up models and data, the focus should shift to developing more efficient learning algorithms that can rapidly abstract and compress knowledge from limited information.

The video begins by addressing the so-called “scaling paradox” in artificial intelligence. Many experts have claimed that scaling up large language models (LLMs) with ever more data and compute is hitting diminishing returns, yet objective measurements show that AI capabilities continue to improve rapidly. The speaker points out that although traditional scaling laws predict smaller gains from each additional increment of data and compute, performance on real-world benchmarks and the practical utility of AI systems are accelerating, not slowing down. This contradiction suggests that the field is missing something fundamental about how progress is being made.
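
To make the diminishing-returns claim concrete, here is a minimal sketch of a Chinchilla-style power-law loss curve. The loss form is the standard parametric fit from the scaling-law literature, not anything taken from the video, and the constants and starting scale are illustrative placeholders rather than fitted values.

```python
# A minimal sketch of why naive scaling predicts diminishing returns.
# Loss form L(N, D) = E + A / N**alpha + B / D**beta (Chinchilla-style fit);
# the constants below are illustrative placeholders, not published values.

def scaling_loss(n_params, n_tokens,
                 E=1.7, A=400.0, B=410.0, alpha=0.34, beta=0.28):
    """Predicted pre-training loss for a model with n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Double parameters and data repeatedly and watch the improvement shrink.
n, d = 1e9, 2e10          # 1B params, 20B tokens (illustrative starting point)
prev = scaling_loss(n, d)
for step in range(1, 6):
    n, d = 2 * n, 2 * d
    cur = scaling_loss(n, d)
    print(f"2^{step}x scale: loss {cur:.4f}  (improvement {prev - cur:.4f})")
    prev = cur
```

Each doubling of parameters and data buys a smaller absolute loss reduction than the last, which is the pattern the "scaling is hitting a wall" argument rests on.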

The speaker explains that the confusion arises from conflating a single scaling vector, such as simply increasing model size or training data, with the entire frontier of AI capability. While vanilla pre-training may be yielding flattening returns, overall progress is driven by a combination of factors: architectural innovations (such as mixture-of-experts models), improved training recipes (such as reinforcement learning from human feedback and synthetic data), better use of compute at test time (such as chain-of-thought prompting), and more. The rapid saturation of benchmarks like ARC-AGI demonstrates that AI systems are becoming more capable at a pace that outstrips what would be expected from scaling alone.
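
As one illustration of the test-time-compute lever mentioned above, here is a rough sketch of self-consistency style sampling: draw several chain-of-thought completions and majority-vote the final answers. The `sample_completion` function is a hypothetical stand-in for a real model call, not an API from the video.

```python
# A rough sketch of spending extra compute at test time: sample several
# completions and majority-vote the final answers (self-consistency).
from collections import Counter
import random

def sample_completion(question: str) -> str:
    """Hypothetical stand-in for a stochastic LLM call that returns a
    final answer; here it is mostly right, occasionally wrong."""
    return random.choice(["42", "42", "41"])

def answer_with_test_time_compute(question: str, k: int = 16) -> str:
    """More samples -> more compute -> usually a more reliable answer."""
    answers = [sample_completion(question) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]

print(answer_with_test_time_compute("What is 6 * 7?"))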

A key insight is that the real gap between current AI and human intelligence is not just about scale, but about sample efficiency—the ability to generalize from far fewer examples. Human brains can learn complex skills from a handful of experiences, whereas current machine learning models require millions or billions of data points. The speaker argues that this is not simply due to evolutionary “baked-in” knowledge, since humans can master entirely novel domains like calculus or programming with relatively little data. Instead, it points to a qualitative difference in how brains and machines learn.
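
One way to make "sample efficiency" measurable is a learning curve: train the same model on increasing amounts of data and track held-out error. The toy task and model below are made up purely for illustration; a more sample-efficient learner is one that reaches low error much further to the left of such a curve.

```python
# A toy illustration of measuring sample efficiency with a learning curve.
import numpy as np

rng = np.random.default_rng(0)
true_w = rng.normal(size=20)                      # hidden "ground truth"

def make_data(n):
    X = rng.normal(size=(n, 20))
    y = X @ true_w + 0.1 * rng.normal(size=n)     # noisy linear targets
    return X, y

X_test, y_test = make_data(1000)

for n_train in [5, 10, 20, 40, 80, 160, 320]:
    X, y = make_data(n_train)
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
    test_mse = np.mean((X_test @ w_hat - y_test) ** 2)
    print(f"{n_train:4d} training examples -> test MSE {test_mse:.3f}")
```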

The video highlights that the bottleneck for AI is not the amount of available data, but the ability to compress and generalize from it efficiently. Intelligence, in this view, is closely linked to compression: to compress data well, a system must build an internal model of the processes that generated the data. Recent research, such as DeepSeek’s work on optical compression and more efficient architectures, shows promising steps toward improving sample efficiency. The speaker notes that constraints on compute, power, and data availability will force the field to prioritize more efficient learning algorithms.
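
The compression-as-modeling link can be sketched in a few lines: under a near-optimal code, a symbol with model probability p costs roughly -log2(p) bits, so a model that captures the generating process compresses the same data into fewer bits. The hand-written probability tables below are illustrative only, not anything from the cited research.

```python
# A small sketch of the compression-prediction link: better predictive
# models assign higher probability to the data and thus need fewer bits.
import math

data = list("abababababababab")            # produced by a simple alternating process

uniform_model = lambda prev, x: 1 / 3      # knows nothing: a, b, c equally likely
bigram_model  = lambda prev, x: 0.9 if (prev, x) in {("a", "b"), ("b", "a")} else 0.05

def compressed_bits(model, data):
    """Total code length in bits under an idealized -log2(p) code."""
    bits, prev = 0.0, "a"                  # fixed dummy context for the first symbol
    for x in data:
        bits += -math.log2(model(prev, x))
        prev = x
    return bits

print(f"uniform model : {compressed_bits(uniform_model, data):.1f} bits")
print(f"bigram model  : {compressed_bits(bigram_model, data):.1f} bits")
```

The model that has internalized the alternating structure compresses the sequence to a fraction of the bits, which is the sense in which compression and understanding of the data-generating process go together.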

In conclusion, the speaker argues that the next major paradigm in AI will be sample efficiency, not simply larger models. The focus should shift from adding parameters and data to improving abstraction, causal modeling, and learning efficiency. Achieving rapid generalization from limited data will be crucial for progress toward AGI. While sample efficiency alone may not be sufficient for AGI, it is clearly the field's next big frontier and will likely dominate research and development in the coming years.