Grok 4 - 10 New Things to Know

artesia · 10 July 2025 14:11

The video reviews Grok 4, xAI’s latest AI language model, highlighting its strong benchmark performance, its collaborative “Heavy” version, and its vast knowledge base, while cautioning against overhyped claims and high subscription costs. It also covers upcoming AI developments, Grok 4’s limitations, and its potential to democratize advanced AI capabilities despite ongoing problems such as hallucinations and high operating costs.

artesia · 10 July 2025 14:31

The video gives an overview of Grok 4, the latest AI language model from xAI, covering both its capabilities and the hype surrounding it. Grok 4 has posted strong results on various benchmarks, outperforming leading models from OpenAI, Google, and Anthropic in areas such as high school math competitions, science Q&A, and coding tests. Elon Musk has even claimed that Grok 4 is smarter than almost all graduate students across disciplines, though the presenter cautions that this claim applies mainly to academic questions and should be read in context, since real-world expertise involves more than answering benchmark questions.

Despite Grok 4’s strong benchmark results, the video warns viewers to interpret them with caution. The comparisons often exclude certain competing models, and Grok 4’s performance on visual tasks appears weaker than its text-based abilities. The presenter also introduces Grok 4 Heavy, a version that uses multiple AI agents working in parallel to solve problems collaboratively, a method similar to the previously released SmartGPT. While Grok 4 Heavy shows improved performance, especially on complex tasks, its subscription price of $300 per month raises questions about its value compared to more affordable alternatives like Gemini Pro.
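To make the "multiple agents working in parallel" idea concrete, here is a minimal sketch in Python. It is not xAI's implementation: the summary does not say how Grok 4 Heavy combines its agents' answers, so the majority vote used here is an assumption, and the three agent functions are hypothetical stand-ins for real model calls.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# An "agent" here is any function that maps a question to an answer string.
Agent = Callable[[str], str]


def solve_in_parallel(question: str, agents: Sequence[Agent]) -> str:
    """Run every agent on the same question concurrently and return the most common answer."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map preserves the order of `agents` in the returned answers.
        answers = list(pool.map(lambda agent: agent(question), agents))
    # Assumption: answers are combined by simple majority vote.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-ins for model calls; a real agent would query an LLM.
def careful_agent(question: str) -> str:
    return "4"


def second_agent(question: str) -> str:
    return "4"


def sloppy_agent(question: str) -> str:
    return "5"


if __name__ == "__main__":
    agents = [careful_agent, sloppy_agent, second_agent]
    print(solve_in_parallel("What is 2 + 2?", agents))
```

In this toy run two of the three agents agree, so the single wrong answer is outvoted. That is the basic reason such a setup can beat a single agent on hard problems, at the price of running several model calls per question.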

The video touches on Grok 4’s social and spatial reasoning skills, noting that while it does well on many logic puzzles and social intelligence tests, it can still struggle with spatial reasoning and sometimes takes a long time to respond. Grok 4’s knowledge base is also vast, which lets it perform well on knowledge-intensive benchmarks like “Humanity’s Last Exam,” but this means its success depends heavily on the training data it has been exposed to. The presenter emphasizes that Grok 4 is not a new AI paradigm and remains prone to the hallucinations and errors typical of language models.

Looking ahead, the video discusses upcoming developments, including the imminent release of Grok 5 and competing models like Gemini 3 and GPT-5. Musk’s comments on AI safety reflect a cautious optimism, acknowledging potential risks but expressing a desire to witness the technology’s evolution firsthand. The video also highlights some quirky behaviors in Grok 4, such as politically charged responses influenced by system prompt changes, and notes the enormous operational costs xAI faces, including plans for a massive GPU-powered data center.

In conclusion, the video underscores Grok 4’s potential to democratize access to advanced AI capabilities, making complex scientific and coding tasks more accessible to individuals. While it may not yet be able to produce groundbreaking scientific discoveries on its own, its ability to help users apply existing knowledge is significant. The presenter encourages viewers to remain critical of hype and pricing while recognizing the rapid progress xAI has made. The video ends on a hopeful note about the future impact of Grok 4 and its successors, inviting ongoing discussion and exploration of this evolving technology.