The video highlights Apple’s recent research challenging the notion that large language models possess genuine reasoning abilities, showing they mainly rely on pattern recognition and fail on complex puzzles. Experts emphasize that current AI models are limited in reasoning, and future progress will require new architectures that incorporate understanding, memory, and other human-like traits.
The video discusses a recent groundbreaking research paper released by Apple’s machine learning team, titled “The Illusion of Thinking,” which challenges the hype surrounding large language models (LLMs) and their reasoning capabilities. Apple’s paper suggests that these advanced AI models, often portrayed as capable of human-like reasoning, primarily rely on pattern matching rather than genuine logical thinking. This revelation comes just days before Apple’s developer conference, where many expected the company to showcase impressive AI features, but instead, they issued a stark reality check about the current limitations of AI reasoning.
Apple’s experiments involved testing popular reasoning models on puzzle games, specifically variations of the Tower of Hanoi, to evaluate their problem-solving abilities across different levels of complexity. The findings revealed that these models perform well on simple and medium-difficulty puzzles but fail completely on highly complex ones, with accuracy dropping to zero. Interestingly, the models initially put effort into reasoning but then give up as problems become harder, indicating they lack true logical reasoning. Apple’s experiments also showed that even when models are given the explicit algorithm for solving a puzzle, they still fail on complex instances, suggesting their reasoning is superficial and heavily dependent on pattern recognition.
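To make that setup concrete, here is a minimal sketch of how a complexity-scaled Tower of Hanoi evaluation could work. It is an illustration under stated assumptions, not code from Apple’s paper: a recursive solver produces the ground-truth move sequence for n discs, and the `ask_model` function is a hypothetical placeholder for whatever LLM API is being tested.

```python
def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Ground-truth optimal solution: list of (disc, from_peg, to_peg) moves for n discs."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, src, dst, aux)   # shift the n-1 smaller discs out of the way
            + [(n, src, dst)]                   # move the largest disc to its target
            + hanoi_moves(n - 1, aux, src, dst))  # stack the smaller discs back on top


def ask_model(prompt):
    """Hypothetical stand-in for an LLM call; swap in a real API client here."""
    raise NotImplementedError


def evaluate(max_discs=10):
    """Score a model on instances of increasing complexity (optimal length is 2**n - 1 moves)."""
    for n in range(1, max_discs + 1):
        truth = hanoi_moves(n)
        expected = "\n".join(f"move disc {d} from {a} to {b}" for d, a, b in truth)
        prompt = (f"Solve Tower of Hanoi with {n} discs on pegs A, B, C. "
                  f"List every move as 'move disc X from Y to Z', one per line.")
        try:
            answer = ask_model(prompt)
        except NotImplementedError:
            answer = ""  # no model wired up in this sketch
        # Exact-match scoring is deliberately crude; a real evaluation would parse and replay the moves.
        print(f"n={n:2d}  optimal moves={2**n - 1:5d}  correct={answer.strip() == expected}")


if __name__ == "__main__":
    evaluate(max_discs=5)
```

The relevant property is visible in the solver itself: the rule for producing the next move is trivial, but the number of required moves doubles with every added disc, so the harder instances demand that a model stay correct over hundreds or thousands of consecutive steps.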
The research sparked intense debate within the AI community. Some critics argue that the models fail not because they lack reasoning, but because they hit their token output limits, making it impossible for them to complete complex solutions. Others, including AI skeptics like Gary Marcus, interpret the findings as evidence that current models are fundamentally limited and that the AI hype around reasoning and AGI (Artificial General Intelligence) is premature. Marcus highlights that Apple’s experiments echo longstanding criticisms about neural networks’ inability to handle logical reasoning outside of their training data, emphasizing that true intelligence should combine human creativity with machine precision, not mimic human flaws.
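A rough back-of-the-envelope calculation shows why the token-limit objection has some force. The figures below are assumptions chosen for illustration, not numbers from the paper: if writing out one move costs on the order of ten tokens and the model’s output is capped at a fixed budget, full Tower of Hanoi solutions stop fitting somewhere around a dozen discs, regardless of whether the model “understands” the puzzle.

```python
# Back-of-the-envelope check on the token-limit objection.
# Both figures below are assumptions for illustration, not values from Apple's paper.
TOKENS_PER_MOVE = 10      # rough cost of writing out one move
OUTPUT_BUDGET = 64_000    # a typical fixed output cap; real limits vary by model

for n in range(8, 17):
    moves = 2**n - 1                      # optimal Tower of Hanoi solution length
    tokens = moves * TOKENS_PER_MOVE
    status = "exceeds" if tokens > OUTPUT_BUDGET else "fits in"
    print(f"{n:2d} discs: {moves:6d} moves ~ {tokens:7d} tokens ({status} the assumed budget)")
```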
Further analysis by experts and discussion on social media suggest that the way the models were tested, with puzzle complexity measured by the length of the required solution, may itself be flawed. Critics argue that some puzzles, like Tower of Hanoi, are algorithmically easy despite requiring very long solutions, while others, like river-crossing problems, are genuinely difficult despite having much shorter solutions. This mismatch between solution length and actual difficulty could explain why models struggle on some tasks and not others. Overall, the consensus is that the observed reasoning failures say less about the models’ inability to think and more about the limitations of the evaluation methods and the models’ reliance on memorization rather than logical deduction.
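The critics’ contrast can be made concrete: the Tower of Hanoi solution is long but falls out of a three-line recursion (as in the earlier sketch), while the classic missionaries-and-cannibals river crossing needs only 11 crossings yet requires a genuine search over valid states. The breadth-first solver below is an illustrative example of that shorter-but-harder kind of puzzle, not code from the paper or the video.

```python
from collections import deque

def river_crossing():
    """BFS for the classic missionaries-and-cannibals puzzle.
    State = (missionaries on left bank, cannibals on left bank, boat on left?)."""
    start, goal = (3, 3, 1), (0, 0, 0)
    loads = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]   # who rides the boat each crossing

    def safe(m, c):
        # Neither bank may have its missionaries outnumbered by cannibals.
        return (0 <= m <= 3 and 0 <= c <= 3
                and (m == 0 or m >= c)
                and (3 - m == 0 or 3 - m >= 3 - c))

    queue, seen = deque([(start, [])]), {start}
    while queue:
        (m, c, b), path = queue.popleft()
        if (m, c, b) == goal:
            return path
        for dm, dc in loads:
            # Boat on the left moves people away from the left bank; otherwise they return.
            nm, nc = (m - dm, c - dc) if b else (m + dm, c + dc)
            state = (nm, nc, 1 - b)
            if safe(nm, nc) and state not in seen:
                seen.add(state)
                queue.append((state, path + [(dm, dc)]))

print(len(river_crossing()), "crossings in the shortest solution")  # prints 11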
The video concludes by reflecting on the broader implications of Apple’s research and the ongoing debate about AI’s future. Apple’s skepticism and rigorous testing serve as a counterbalance to industry hype, emphasizing the need for a realistic understanding of AI’s capabilities. Experts like Yann LeCun advocate for developing new AI architectures that incorporate physical understanding, persistent memory, reasoning, and even emotions, traits that current models lack. The overall message is that AI is still in its infancy regarding genuine reasoning, and future breakthroughs will likely require moving beyond pure scaling of existing models toward hybrid approaches that better emulate human-like intelligence.