The Man Replacing LLMs (And He Has $1B to Prove It)

Yann LeCun, a leading AI researcher, argues that large language models like ChatGPT lack true understanding and reasoning because they only predict text patterns without grasping real-world physics or causality. To overcome these limitations, he is developing JEPA, an AI architecture trained on raw sensory data to build internal world models for genuine understanding and planning, backed by $1 billion in funding.

Yann LeCun, the Turing Award-winning AI pioneer and former Meta chief AI scientist, recently declared large language models (LLMs) like ChatGPT a dead end. Despite years at a company that built some of the most advanced LLMs, he argues that these models fundamentally lack true understanding of reality. LLMs operate by predicting the next word or token based on patterns in text data, but they do not grasp the underlying physics or causal relationships behind the concepts they discuss. For example, ChatGPT can say that a glass will fall and break, but it doesn't understand gravity or momentum. It merely associates words that frequently appear together.
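The idea of predicting the next word purely from co-occurrence can be illustrated with a toy bigram counter. This is a deliberately minimal sketch, not how ChatGPT or any production LLM is built (those use neural networks over subword tokens); the function names and the three-sentence corpus are made up for illustration. It only shows that a system can complete "fall and ..." with "break" without any notion of gravity.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows another in the training text."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training text: the model only ever sees word sequences.
corpus = [
    "the glass will fall and break",
    "the glass will fall and shatter",
    "the glass will fall and break",
]
model = train_bigram(corpus)

print(predict_next(model, "and"))      # break
print(predict_next(model, "gravity"))  # None: never seen, nothing to predict
```

The prediction "break" comes entirely from frequency counts. Nothing in the model represents mass, height, or impact, which is the gap LeCun is pointing at.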

LeCun outlines five core problems with LLMs: they have no real-world understanding; they cannot genuinely reason and only simulate reasoning through pattern matching; they hallucinate information because they lack causal grounding; they fail at structured, multi-step planning; and they are hitting scaling limits where more data and compute yield diminishing returns. He believes that continuing to invest heavily in scaling LLMs is futile because these issues are structural, not just technical hurdles. Real intelligence requires models that understand the world itself, not just language describing it.

To address these limitations, LeCun is developing a new AI architecture called JEPA (Joint Embedding Predictive Architecture). Unlike LLMs, which predict the next token, JEPA predicts in an abstract representation space focused on underlying structures and dynamics. This approach mimics how babies learn by observing the physical world rather than reading about it. V-JEPA 2, a video-based version, was trained on over a million hours of raw internet videos showing real-world interactions, enabling it to build an internal model of physical cause and effect.
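What "predicting in representation space" means can be sketched in a few lines. The example below is an assumption-heavy toy, not the actual JEPA code: the encoders and predictor are plain random linear maps, the inputs are random vectors standing in for a visible and a hidden part of a video, and all names are hypothetical. The one point it is meant to show is where the loss is computed: between the predicted embedding and the target embedding, not between raw inputs.

```python
import random

def linear(weights, x):
    """Apply a weight matrix (a list of rows) to the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def squared_distance(a, b):
    """Sum of squared differences between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def jepa_loss(context, target, context_encoder, target_encoder, predictor):
    """Compare embeddings, never the raw inputs themselves."""
    s_context = linear(context_encoder, context)  # embed the visible part
    s_target = linear(target_encoder, target)     # embed the hidden part
    s_predicted = linear(predictor, s_context)    # guess the hidden embedding
    return squared_distance(s_predicted, s_target)

rng = random.Random(0)
input_dim, embed_dim = 8, 3  # assumed sizes, chosen only for the demo

context_encoder = random_matrix(embed_dim, input_dim, rng)
# Assumption: the target encoder starts as a copy of the context encoder.
target_encoder = [row[:] for row in context_encoder]
predictor = random_matrix(embed_dim, embed_dim, rng)

context = [rng.random() for _ in range(input_dim)]  # stand-in for visible patches
target = [rng.random() for _ in range(input_dim)]   # stand-in for masked patches

loss = jepa_loss(context, target, context_encoder, target_encoder, predictor)
print(f"embedding-space loss: {loss:.4f}")
```

Because the loss lives in a 3-dimensional embedding rather than the 8-dimensional input, the model is free to ignore input detail that does not help prediction. A real system would also train these weights and take steps to keep the embeddings from collapsing to a constant, which this sketch omits.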

V-JEPA 2 was then fine-tuned with just 62 hours of robot footage to learn how a robot arm moves, and it generalized to new environments and objects without extensive retraining. This contrasts sharply with traditional models that require thousands of hours of labeled, environment-specific data. LeCun's approach emphasizes learning from raw sensory experience and building a world model that can simulate and predict outcomes, enabling planning and goal-directed behavior beyond mere pattern completion.

LeCun's new AI system consists of six modules working together: a configurator setting goals, a perception module observing the environment, a world model predicting future states, a cost module evaluating options, an actor executing actions, and short-term memory maintaining context. This system operates directly on representations of the physical world, such as position and movement, rather than on language, marking a fundamental shift from LLMs. With $1 billion in funding, LeCun aims to prove that true machine intelligence requires moving beyond language models to architectures grounded in real-world understanding and reasoning.
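How the six modules might fit together can be sketched as a small control loop. Everything here is a hypothetical simplification: the "world" is a single number (a position on a line), the world model is exact arithmetic rather than a learned network, and the function names simply mirror the module names above. It is meant only to show the division of labor, with the actor choosing actions by asking the world model what would happen and the cost module how good that would be.

```python
from collections import deque

def configurator():
    """Set the goal for the episode (hypothetical: reach position 5)."""
    return {"goal": 5}

def perceive(raw_observation):
    """Turn a raw observation into a state representation."""
    return {"position": raw_observation}

def world_model(state, action):
    """Predict the next state if `action` were taken."""
    return {"position": state["position"] + action}

def cost(state, goal):
    """Lower is better: distance between the state and the goal."""
    return abs(goal - state["position"])

def actor(state, goal, actions=(-1, 0, 1)):
    """Pick the action whose predicted outcome has the lowest cost."""
    return min(actions, key=lambda a: cost(world_model(state, a), goal))

def run_episode(start, max_steps=20):
    goal = configurator()["goal"]
    memory = deque(maxlen=3)  # short-term memory: the last three states
    position = start
    for _ in range(max_steps):
        state = perceive(position)
        memory.append(state)
        if cost(state, goal) == 0:
            break  # goal reached
        position += actor(state, goal)  # the environment applies the action
    return position, list(memory)

final_position, recent = run_episode(start=0)
print(final_position, [s["position"] for s in recent])  # 5 [3, 4, 5]
```

No language appears anywhere in the loop. The system plans over positions and predicted positions, which is the contrast with token prediction that the description above draws.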