Grok 4.20 is still deeply flawed

artesia · 19 February 2026 13:37

The video reviews Grok 4.20, noting its faster performance and innovative use of four specialized AI agents, but highlights ongoing flaws such as epistemic biases and a tendency toward safe, US-centric answers. Despite these issues, the speaker acknowledges significant improvements in AI capabilities and encourages continued feedback to further enhance model accuracy and usefulness.

artesia · 19 February 2026 14:40

The video discusses the release of Grok 4.20, highlighting that while it represents a significant improvement over previous versions, it still suffers from notable flaws. The speaker explains their method for stress-testing new AI models by presenting them with complex, real-world problems, such as chronic health issues and post-labor economics. They note that Grok 4.20 is impressively fast and utilizes four different agents with distinct personalities, which collaborate to generate answers. This parallel processing approach, inspired by established computer science principles, allows for specialization and division of labor among the agents, leading to more nuanced responses.
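The division of labor described above can be pictured with a minimal sketch. It is not Grok's actual architecture, which the video does not detail: the role names, `run_agent`, `ask_in_parallel`, and `combine` are all hypothetical, and the "agents" are stub functions rather than model calls. It only illustrates the general pattern of fanning one question out to several specialists in parallel and merging their replies.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical specialist roles; the video does not name the four agents.
ROLES = ["researcher", "critic", "planner", "summarizer"]


def run_agent(role: str, question: str) -> str:
    """Stand-in for a model call: each agent answers from its own role."""
    return f"[{role}] view on: {question}"


def ask_in_parallel(question: str, roles=ROLES) -> dict[str, str]:
    """Fan the question out to every role at once and collect the replies."""
    with ThreadPoolExecutor(max_workers=len(roles)) as pool:
        futures = {role: pool.submit(run_agent, role, question) for role in roles}
        return {role: fut.result() for role, fut in futures.items()}


def combine(replies: dict[str, str]) -> str:
    """Naive merge step: join the per-role replies in a fixed (sorted) order."""
    return "\n".join(replies[role] for role in sorted(replies))


if __name__ == "__main__":
    print(combine(ask_in_parallel("How should we think about post-labor economics?")))
```

In a real system the merge step is where the blind spots of each specialist would need to be reconciled; here it simply concatenates the replies.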

The speaker draws parallels between AI agent specialization and human workplace roles, emphasizing that assigning specific tasks to individual agents improves performance but also introduces blind spots. They mention that previous versions, like Grok Heavy, used more agents but were costlier to run, whereas Grok 4.20 offers a distilled, more efficient version with four agents. The speaker also describes their personal workflow of using multiple AI models—Grok, Gemini, Claude, and ChatGPT—in parallel to leverage their different strengths and perspectives, which helps in forming more comprehensive answers to complex questions.

Despite these advancements, the speaker points out persistent epistemic biases in Grok, particularly what they call “Elon epistemics,” referring to the model’s tendency to reflect certain ideological stances. They give examples of how Grok and other AIs can be US-centric, cherry-pick evidence, and sometimes default to safe, middle-of-the-road answers rather than engaging deeply with nuanced or controversial topics. The speaker also notes that some models, like Claude, tend to hedge excessively or avoid taking positions on risky issues, while others, like Gemini, are more willing to engage in speculative reasoning.

The video also touches on the rapid pace of AI development globally, mentioning that Chinese companies are integrating advanced agentic models directly into browsers, suggesting that the West needs to keep up. The speaker observes that while current AI subscriptions may seem expensive, the capabilities they offer will likely become standard and more accessible within a year. They also highlight the progress in open-source AI, predicting that high-performing models will soon be available for local use.

Finally, the speaker shares a personal anecdote about improvements in AI epistemics, particularly in the context of gut health. They recount how, a year ago, AI models struggled to recognize concepts like dysbiosis, but now all major models correctly identify and address such issues with minimal input. This improvement demonstrates that while AI models still have flaws and biases, they are becoming more accurate and useful over time. The speaker concludes by encouraging ongoing feedback and engagement with AI developers to continue driving these improvements.