The creator exploited a loophole in GitHub Copilot’s old message-based billing system by running complex, token-heavy tasks that cost Microsoft thousands of dollars while using only a fraction of the message quota, highlighting the flaws in the pricing model. They explain that Microsoft is transitioning to a more sustainable token-based billing system to prevent abuse and ensure fair pricing amid the broader AI compute resource challenges.
The video begins with the creator reflecting on the evolution of GitHub Copilot from a simple autocomplete tool into a sophisticated AI coding assistant that competes with platforms like Claude Code and Codex. Copilot recently changed its pricing model from fixed message limits to a rate-limited system based on token usage, sparking user outrage and speculation that Microsoft could no longer subsidize the service. The creator has a history of spending large amounts of Microsoft Azure credits testing and benchmarking AI inference speeds. They decided to exploit the loophole in Copilot’s old message-based pricing to demonstrate how broken and easily abused the system was, costing Microsoft thousands of dollars while using only a fraction of the allotted messages.
The video then explains the four main billing models for AI inference: subscriptions with rate limits (used by Claude Code and Codex), subscriptions with message limits (used previously by T3 Chat and Copilot), subscriptions with spend limits (more transparent dollar-based usage), and dedicated compute (renting GPUs directly). The creator highlights the problems with message-based billing, where the cost per message can vary wildly depending on the complexity and token usage of each request. This variability makes it impossible to price fairly, as some users can exploit the system by sending very expensive messages, potentially bankrupting smaller services like T3 Chat, which had to adjust its billing to avoid losses.
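The cost variability behind message-based billing can be made concrete with a minimal sketch. The per-token prices and token counts below are hypothetical placeholders chosen for illustration; they are not rates from the video or from any provider.

```python
from dataclasses import dataclass

# Hypothetical per-million-token prices in USD (assumptions, not real rates).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


@dataclass
class Message:
    label: str
    input_tokens: int
    output_tokens: int


def inference_cost(msg: Message) -> float:
    """Provider-side cost of one message under the assumed token prices."""
    return (
        msg.input_tokens * INPUT_PRICE_PER_MTOK
        + msg.output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


# Two illustrative messages; each consumes exactly one unit of a message quota.
messages = [
    Message("quick question", input_tokens=500, output_tokens=300),
    Message("long agentic task", input_tokens=2_000_000, output_tokens=1_000_000),
]

for m in messages:
    print(f"{m.label}: 1 message, ${inference_cost(m):.4f}")
```

Under these assumed numbers, both requests count as one message, yet the second costs several thousand times more to serve. That gap is why a flat per-message price cannot be set fairly.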
To illustrate the exploit, the creator shares their experience running extremely long and complex cryptography puzzles on Copilot, which caused the AI to generate millions of tokens over many hours for a single message. By automating multiple sessions and carefully crafting prompts, they were able to drive up the inference cost dramatically, reaching thousands of dollars in usage while only consuming a small percentage of their message quota. This demonstrated how the old Copilot billing model was fundamentally flawed, as it did not account for the token-based cost of inference, allowing users to get far more compute than they paid for.
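A rough sketch of that mismatch follows, comparing the share of a message quota consumed with the inference cost incurred. The quota size, blended token price, and per-message token counts are all assumptions for illustration, not figures reported in the video.

```python
# All numbers here are illustrative assumptions, not figures from the video.
MONTHLY_MESSAGE_QUOTA = 1_500   # hypothetical messages included in the plan
BLENDED_PRICE_PER_MTOK = 10.00  # hypothetical blended USD price per million tokens


def usage_summary(tokens_per_message: list[int]) -> dict[str, float]:
    """Compare quota consumption with the provider's inference cost."""
    messages_used = len(tokens_per_message)
    total_tokens = sum(tokens_per_message)
    return {
        "quota_used_fraction": messages_used / MONTHLY_MESSAGE_QUOTA,
        "inference_cost_usd": total_tokens * BLENDED_PRICE_PER_MTOK / 1_000_000,
    }


# 30 long-running messages, each assumed to generate 10 million tokens.
summary = usage_summary([10_000_000] * 30)
print(f"Quota used: {summary['quota_used_fraction']:.1%}")
print(f"Inference cost: ${summary['inference_cost_usd']:,.2f}")
```

With these assumed inputs, a small share of the quota corresponds to thousands of dollars of inference, which is the shape of the imbalance the creator describes. A token-based system closes the gap by charging on `total_tokens` rather than on `messages_used`.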
The creator emphasizes that this situation is not a “rug pull” or a malicious scheme by Microsoft but rather a case of the company being slow to update its billing system to match the realities of agentic AI workflows, where models perform multiple steps and generate large token outputs per message. Microsoft’s delayed response led to a loophole that savvy users could exploit, but the company is now moving to a more sustainable token-based credit system to prevent abuse and ensure fair pricing. The video also touches on the broader compute crisis affecting AI providers, with limited GPU resources forcing companies to carefully manage usage and pricing.
In conclusion, the creator urges viewers to understand the economics behind AI billing and not to expect unlimited usage for low fixed prices. They acknowledge the frustration users feel about losing the ability to exploit the old system but stress that these changes are necessary to keep AI services viable. The video ends with an invitation for viewers to share their thoughts on the new billing models and a promise to report how much money was ultimately spent exploiting the old Copilot plan. The overall message is a call for greater awareness of what AI services cost to run and for a more realistic approach to subscription pricing in the age of agentic AI.