Grok 4 is good ... but can it code?

merefield · 11 July 2025 13:34

Grok 4 is a powerful reasoning AI with strong planning and creative abilities but is hindered by slow speed, high costs, and poor integration, making it impractical for daily coding tasks compared to faster, more efficient models like Claude and Gemini 2.5 Pro. While it shows promise for occasional use in web queries and code planning, it currently falls short as a reliable coding assistant.

merefield · 11 July 2025 13:54

The video provides an in-depth evaluation of Grok 4, focusing on its coding capabilities. The presenter approached the Grok 4 presentation with cautious optimism, hoping it could challenge Claude, the current leading AI model in coding assistance. While Grok 4 offers a 256k context window and is priced competitively with Claude, it is primarily a reasoning model and is notably slow and expensive when used for coding tasks. This sluggishness and high cost make it impractical for daily coding use despite its intelligence and reasoning strengths.

The presenter ran head-to-head tests between Grok 4 and Claude (Opus) in coding tools such as Open Code. Claude consistently outperformed Grok 4 in speed and usability, quickly producing functional and stylish code. Grok 4, by contrast, took an excessive amount of time (sometimes over 45 minutes) and often failed to complete tasks efficiently. Although it could generate code and even fix minor bugs, it tended to reformat entire files unnecessarily and ran up high costs for relatively small changes, highlighting inefficiencies in its current implementation.

Further testing revealed that Grok 4 struggled to integrate with popular coding tools such as GitHub Copilot, which failed to run with Grok 4 due to filtering issues. The model’s slow processing and frequent errors led to multiple failed attempts during the evaluations, resulting in significant cost overruns. Despite these drawbacks, Grok 4 showed promise in planning and reasoning tasks, offering useful suggestions and a helpful diff view for code changes, which the presenter appreciated. However, these strengths did not make up for its poor performance as a daily coding assistant.

The presenter also explored Grok 4’s creative capabilities, for example by having it generate the concept for a text-based memory cultivation adventure game. This demonstrated Grok 4’s ability to think creatively and to run web searches to refine its ideas, although it took considerable time to reach these results. Even so, overall adoption of Grok 4 on platforms like OpenRouter remains low, possibly because its slow speed and high cost make it less appealing than faster, more efficient alternatives like Claude and Gemini 2.5 Pro.

In conclusion, while Grok 4 is a powerful reasoning model with potential in planning and creative tasks, it currently falls short as a practical coding assistant because of its slow speed, high cost, and integration problems. The presenter remains hopeful about future iterations, particularly a hybrid or faster coding model from xAI that could serve as a daily driver. For now, Grok 4 is best suited to occasional web-based queries and planning rather than intensive coding workflows, and Claude and Gemini 2.5 Pro remain the preferred tools for AI-assisted coding.