Elon's Grok-3 AI Is Now The Best In The World? First Look At Grok-3

artesia · 19 February 2025 18:02

In the latest episode of “Internet Drama for Nerds,” Elon Musk critiques Sam Altman and introduces Grok-3, a new AI model that X.AI claims is the best in the world, citing a high Elo score and its own internal benchmarks. Despite the impressive numbers, questions remain about the validity of the comparisons with OpenAI’s models, particularly given the lack of independent evaluations, and the subscription price has gone up.

artesia · 19 February 2025 18:23

In the latest episode of “Internet Drama for Nerds,” Elon Musk takes another jab at Sam Altman, the CEO of OpenAI, over his attempts to convert OpenAI into a for-profit entity. Musk’s strategy involves driving up OpenAI’s valuation, which complicates Altman’s plans and has prompted criticism from industry veterans who suggest that Altman should focus on building a superior product instead. The discussion hints at insecurity on Altman’s part, with some commentators expressing sympathy for him. Meanwhile, Musk has announced Grok-3, a new AI model from his lab, X.AI, which the company claims is the best AI model currently available.

Grok-3 was introduced during a live stream where Musk and the X.AI team showcased its capabilities. The model, codenamed “Chocolate,” has been tested on a platform called Chatbot Arena, where users compare responses from different AI models side by side. Grok-3 achieved an Elo score of 1400, placing it at the top across various categories. However, the other benchmarks used for comparison have raised questions about their validity, as they are based primarily on internal assessments rather than independent evaluations.
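For a sense of what an Elo score means in a head-to-head setting, here is a minimal sketch of the classic Elo formulas in Python. It is an illustration only: the 1300-rated opponent, the K-factor of 32, and the function names are assumptions made up for the example, and Chatbot Arena's actual rating methodology may differ from this textbook version.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A's answer is preferred over model B's (classic Elo)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both ratings after one comparison.

    score_a is 1.0 if A was preferred, 0.5 for a tie, 0.0 if B was preferred.
    k (assumed here to be 32) controls how far a single vote moves the ratings.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Hypothetical matchup: a 1400-rated model against a 1300-rated one.
    print(f"{expected_score(1400, 1300):.2f}")  # prints 0.64
    # If the lower-rated model wins anyway, it gains what the favourite loses.
    print(update(1400, 1300, 0.0))
```

Under these assumptions, a 100-point gap means the higher-rated model is preferred in roughly 64% of comparisons, so a lead at the top of the board is a statistical edge rather than a guarantee on any single prompt.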

The Grok-3 series includes two main versions: the full Grok-3 and a smaller variant called Grok-3 Mini. The latter is said to perform at a level comparable to DeepSeek V3. For reasoning, the series offers a Grok-3 reasoning model in beta and a mini reasoning model, with the mini version surprisingly outperforming the beta in certain benchmarks. This discrepancy has led to speculation that the reinforcement learning applied to the larger model is not yet fully optimized.

Despite Grok-3’s impressive performance, comparisons with OpenAI’s models, particularly o3, have been contentious. Altman has indicated that OpenAI may not release the full o3 model on its own due to operational costs, which complicates direct comparisons. While Grok-3 has shown strong results in various benchmarks, the lack of third-party evaluations means its true capabilities remain uncertain. Pricing is also a point of interest: the subscription that includes Grok-3 recently increased from $20 to $40 per month, with additional features being rolled out.

Grok-3 was developed rapidly on a massive training cluster of 100,000 GPUs, a sign of how competitive the AI field has become. X.AI’s ability to stand up such a large cluster in a short time contrasts with the struggles faced by other companies. The use of Tesla Megapacks to manage electricity demand during training is a further example of an innovative solution to an operational challenge. As Grok-3 establishes itself as a leading model, the AI field is poised for intense competition, with further advancements and releases expected from both X.AI and OpenAI.