Google Gemini AI New Limits Based on Compute - AI is Expensive

artesia · 30 May 2026 17:00

The video explains Google’s shift to compute-based usage limits for Gemini, which are meant to reflect the varying computational costs of different AI models more accurately. It highlights how hard it is to measure AI usage with metrics such as tokens, money, or power consumption. It also discusses the complexities and customer frustrations around AI billing models, emphasizing the need for clear, predictable pricing given how quickly AI technology evolves and how resource-intensive it is.

artesia · 30 May 2026 17:20

The video discusses Google’s recent shift in how it measures usage limits for its Gemini AI, moving from a fixed number of requests to compute-based usage limits. This change reflects a broader industry trend, similar to GitHub Copilot’s move to token-based AI credits. The speaker highlights the complexity of measuring AI usage, noting that units like tokens or requests don’t fully capture resource consumption because AI models vary greatly in size and computational demand. For example, running a large 120-billion-parameter model consumes far more compute than a smaller 350-million-parameter model, even when both process the same number of tokens.
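To put a rough number on the gap between the two model sizes mentioned above, here is a minimal sketch. It assumes the common rule of thumb that a dense transformer needs about 2 FLOPs per parameter per token for a forward pass. That rule is an approximation I am bringing in, not something stated in the video, and `forward_flops` is a made-up helper name.

```python
def forward_flops(params: float, tokens: int) -> float:
    """Approximate forward-pass FLOPs for a dense transformer.

    Assumption: roughly 2 FLOPs per parameter per token. This ignores
    attention over long contexts, caching, and hardware efficiency.
    """
    return 2.0 * params * tokens


tokens = 1000  # same token count for both models

large = forward_flops(120e9, tokens)  # 120-billion-parameter model
small = forward_flops(350e6, tokens)  # 350-million-parameter model
ratio = large / small

print(f"large model: {large:.2e} FLOPs")
print(f"small model: {small:.2e} FLOPs")
print(f"ratio: about {ratio:.0f}x")
```

Under this approximation the same 1000 tokens cost a few hundred times more compute on the larger model, which is why a token count alone says little about the actual load.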

The speaker critiques common metrics used to quantify AI costs, such as money spent or power consumption. He points out that money is an unreliable metric because local models can run on personal hardware, avoiding cloud costs. Power consumption is also problematic because hardware components differ widely in energy efficiency, as with SSDs compared to platter drives. Additionally, newer generations of GPUs are more power-efficient, so the same wattage does not correspond to the same amount of AI work. These nuances complicate how companies should bill for AI services and manage resource allocation.

Google’s new compute-based limits for Gemini aim to better reflect the actual computational resources used, considering factors like prompt complexity, use of advanced features (image/video generation, deep thinking models), and chat length. Paid plans offer higher usage limits, but the exact limits remain vague, causing uncertainty for users and businesses. The speaker warns that frequent changes to billing rules and limits can frustrate customers and complicate IT management, especially for organizations integrating AI into workflows with many users who might quickly exhaust their quotas.
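As a purely illustrative sketch of what a compute-based limit could look like, the snippet below charges each request against a budget using the factors listed above (feature used, prompt size, chat length). Google has not published its formula as far as this summary goes, so every weight, unit, and name here (`FEATURE_COST`, `Quota`, `charge`) is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cost multipliers, invented for illustration only.
FEATURE_COST = {
    "text": 1,
    "deep_thinking": 5,
    "image": 10,
    "video": 50,
}


@dataclass
class Quota:
    budget: float  # remaining compute units (an arbitrary made-up unit)

    def charge(self, feature: str, prompt_tokens: int, history_tokens: int) -> bool:
        """Deduct the request's cost and return True, or return False if over budget."""
        # Longer chats cost more because the history is processed along with the prompt.
        cost = FEATURE_COST[feature] * (prompt_tokens + history_tokens) / 1000
        if cost > self.budget:
            return False
        self.budget -= cost
        return True


quota = Quota(budget=100.0)
text_ok = quota.charge("text", prompt_tokens=500, history_tokens=1500)
video_ok = quota.charge("video", prompt_tokens=500, history_tokens=1500)
print(text_ok, video_ok, quota.budget)
```

With these made-up weights, one video request in a long chat costs as much as fifty text requests, which illustrates why a team sharing a quota can run out faster than a per-request count would suggest.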

The video also touches on the challenges of customer billing and managing expectations. The speaker shares an anecdote about client access licenses (CALs) in Microsoft server environments to illustrate how customers often misunderstand technical constraints and resist paying for necessary resources. He emphasizes the importance of clear, predictable pricing models to avoid conflicts and ensure customer satisfaction. This analogy underscores the difficulties AI providers face in designing fair and comprehensible billing systems amid rapidly evolving technology and usage patterns.

In conclusion, the speaker expresses skepticism about the current state of AI commercialization, describing it as chaotic and immature. He encourages viewers to think critically about how AI usage should be measured and billed, acknowledging that tokens, money, and power consumption each have limitations. The video ends with an invitation for viewers to share their experiences and opinions on AI resource management, highlighting the ongoing struggle to balance technological innovation with practical business considerations.