In a recent talk, Yann LeCun expressed his waning interest in large language models (LLMs), advocating instead for a focus on understanding the physical world and developing advanced reasoning capabilities in AI systems. He presented joint embedding predictive architectures (JEPAs) as a more effective approach for building AI that can learn from and interact with its environment, and argued that achieving artificial general intelligence (AGI) will require a hybrid model combining reactive and reflective thinking.
In a recent discussion at Nvidia’s GTC 2025, Yann LeCun, a prominent figure in AI research, expressed his diminishing interest in large language models (LLMs). He observed that while LLMs currently dominate the AI landscape, they are now primarily being refined by industry teams chasing marginal improvements. The more intriguing challenges, he argued, lie in understanding the physical world, developing persistent memory, and building genuine reasoning and planning capabilities into AI systems. In his view, generating chains of tokens is too simplistic a basis for reasoning, and more sophisticated methods are needed for real advances in AI.
LeCun introduced the concept of “world models,” which are essential for machines to effectively interact with the physical environment. He argued that relying solely on text-based models is insufficient for achieving artificial general intelligence (AGI). He pointed out that current architectures, such as transformers, are not well-suited for reasoning about the physical world, as they primarily focus on token prediction rather than understanding complex, high-dimensional data. This limitation hinders the ability of AI systems to develop accurate mental models of their surroundings.
The discussion also touched on the shortcomings of existing video prediction models, which often fail to grasp the nuances of the physical world. LeCun noted that many attempts to train systems to predict video content at the pixel level have not yielded satisfactory results. Instead, he advocates for a shift towards joint embedding predictive architectures (JEPAs), which learn abstract representations of data rather than attempting to reconstruct every detail. This approach allows for more efficient training and better understanding of the physical world.
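The difference between pixel-level reconstruction and latent-space prediction can be illustrated with a toy sketch. This is not V-JEPA's actual implementation: the linear "encoders," the frozen target encoder, and the identity predictor below are all hypothetical stand-ins (a real JEPA uses deep networks, with the target encoder typically an exponential moving average of the context encoder).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "frames": flattened 8x8 patches (64-dim), much higher-dimensional
# than the 4-dim latent space in which the prediction error is measured.
D_INPUT, D_LATENT = 64, 4

# Hypothetical linear encoders and predictor (stand-ins for deep networks).
W_context = rng.normal(scale=0.1, size=(D_LATENT, D_INPUT))
W_target = W_context.copy()   # target encoder: frozen copy here, EMA in practice
W_pred = np.eye(D_LATENT)     # predictor operating purely in latent space

def jepa_loss(frame_t, frame_t1):
    """JEPA-style objective: predict the next frame's EMBEDDING, not its pixels."""
    z_context = W_context @ frame_t    # embed the current frame
    z_target = W_target @ frame_t1     # embed the next frame (no gradient in practice)
    z_predicted = W_pred @ z_context   # prediction happens in the abstract space
    return float(np.mean((z_predicted - z_target) ** 2))

def pixel_loss(frame_t, frame_t1):
    """Baseline generative objective: reconstruct every pixel of the next frame."""
    return float(np.mean((frame_t - frame_t1) ** 2))

frame_t = rng.normal(size=D_INPUT)
frame_t1 = frame_t + rng.normal(scale=0.01, size=D_INPUT)  # slightly changed next frame

print(jepa_loss(frame_t, frame_t1))   # error over 4 abstract dimensions
print(pixel_loss(frame_t, frame_t1))  # error over all 64 pixel values
```

The design point this sketch captures is that the JEPA objective never asks the model to account for every unpredictable pixel; it only has to get a small abstract representation right, which is what makes training more efficient.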
LeCun elaborated on these JEPAs, which aim to create AI systems that can learn concepts the way humans do, from fewer examples and without extensive fine-tuning. The first version of the architecture has shown promise in judging the feasibility of video scenarios based on learned representations. He believes this line of work can help bridge the gap between current AI capabilities and the more complex reasoning required for AGI, with the next version of JEPAs expected to extend these capabilities further.
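One way to read "predicting the feasibility of video scenarios" is as thresholding the latent prediction error: sequences that violate the dynamics the model has learned should surprise it. The sketch below is purely illustrative and hypothetical — it assumes a pretrained predictor that has recovered some fixed linear dynamics, which is not how the real system works, but it shows the scoring idea.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8  # dimension of the toy latent space

# Hypothetical "world dynamics" and a predictor assumed to have learned them.
A = rng.normal(scale=0.3, size=(D, D))
predictor = A.copy()

def surprise(z_t, z_t1):
    """Latent prediction error: how far the observed next embedding lands
    from where the learned predictor expected it."""
    return float(np.linalg.norm(predictor @ z_t - z_t1))

z = rng.normal(size=D)
z_plausible = A @ z                         # transition follows the learned dynamics
z_implausible = A @ z + rng.normal(size=D)  # transition violates them

print(surprise(z, z_plausible))    # zero: the scenario looks feasible
print(surprise(z, z_implausible))  # large: the scenario is flagged as surprising
```

A feasibility judgment then reduces to comparing this score against a calibrated threshold; an implausible event (an object teleporting between frames, say) produces an embedding the predictor could not have anticipated.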
Finally, LeCun discussed the importance of distinguishing between two modes of thinking: system one (reactive) and system two (reflective). He argued that current AI systems are primarily developing system one capabilities, while true AGI will require advancements in system two reasoning. He concluded that achieving AGI will likely necessitate a hybrid approach that combines various capabilities, moving beyond the limitations of LLMs and embracing more sophisticated architectures that can understand and interact with the world in a meaningful way.