Extreme Token Use of Agentic AI - Computerphile

The video explains that agentic AI coding tools incur high costs due to the massive token usage required to process large contexts and repeated interactions, as each new token generation involves reprocessing extensive prior input. It highlights the inefficiencies in current token-based pricing models and suggests that more efficient AI usage, smarter caching, and alternative approaches like vector databases are needed to make agentic AI practical and cost-effective.

The video explores the concept of tokens in AI language models and why their usage, especially in agentic AI coding tools, is so costly. Tokens are essentially words or pieces of words, including spaces and punctuation, that AI models process. Different tokenizers split text differently based on frequency and language, with model vocabularies typically containing around 100,000 distinct tokens. Each token is mapped to a numerical embedding that the AI uses to understand and generate language. While tokenization itself is straightforward, the real complexity and cost arise from how models process these tokens, especially in an autoregressive manner where each new token prediction requires considering the entire preceding context.
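The text-to-tokens-to-embeddings pipeline described above can be sketched with a toy example. This is only an illustration: the six-entry vocabulary, the greedy longest-match rule, and the placeholder embedding values are all assumptions made here. Real tokenizers learn their vocabularies from data and usually use a different splitting algorithm, and real embeddings are learned vectors with hundreds or thousands of dimensions.

```python
# Hypothetical toy vocabulary mapping text pieces to token IDs.
# Real vocabularies hold on the order of 100,000 entries.
VOCAB = {"<unk>": 0, "token": 1, "ization": 2, " is": 3, " cheap": 4, ".": 5}

# Placeholder embedding table: one small vector per token ID.
# The values are meaningless; real embeddings are learned during training.
EMBEDDINGS = [[float(i), float(i) * 0.5] for i in range(len(VOCAB))]


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match split: at each position take the longest known piece."""
    ids = []
    pos = 0
    max_len = max(len(piece) for piece in vocab)
    while pos < len(text):
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in vocab:
                ids.append(vocab[piece])
                pos += length
                break
        else:
            # No vocabulary entry matches here: emit the unknown token.
            ids.append(vocab["<unk>"])
            pos += 1
    return ids


def embed(ids, table=EMBEDDINGS):
    """Look up one embedding vector per token ID."""
    return [table[i] for i in ids]


ids = tokenize("tokenization is cheap.")
vectors = embed(ids)
print(ids)           # [1, 2, 3, 4, 5]
print(len(vectors))  # 5
```

Note that the leading space belongs to the token (" is", " cheap"), which mirrors the point that spaces and punctuation are part of what gets tokenized.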

A key inefficiency in large language models is that every new token depends on the entire input context plus all previously generated tokens, so a naive implementation reprocesses all of it at every step. This means that as conversations or tasks grow longer, the computational cost grows much faster than the length of the text itself. Techniques like KV caching help by storing intermediate computations to avoid redundant processing, but the context still grows with each interaction, and each new token must still attend to all of it. This inefficiency becomes particularly pronounced in coding agents, which often handle large files and multiple tool calls, leading to massive token counts and thus high costs.
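To make the difference concrete, here is a minimal counting sketch. It does not run a model; it only counts how many token positions are pushed through the network while generating a reply, under the simplifying assumption that a cached step processes just the newest token. The function name and the example sizes are illustrative choices, not figures from the video.

```python
def positions_processed(prompt_len, new_tokens, kv_cache):
    """Count token positions run through the model to generate new_tokens."""
    total = 0
    for step in range(new_tokens):
        context_len = prompt_len + step  # prompt plus tokens generated so far
        if kv_cache and step > 0:
            # Earlier keys/values are reused; only the newest token is processed.
            total += 1
        else:
            # Full pass over the whole context (always needed for the first step).
            total += context_len
    return total


naive = positions_processed(prompt_len=1000, new_tokens=100, kv_cache=False)
cached = positions_processed(prompt_len=1000, new_tokens=100, kv_cache=True)
print(naive)   # 104950
print(cached)  # 1099
```

Even in the cached case, each new token still attends over every cached position, so the per-token work and the memory held by the cache both keep growing with the context.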

The video illustrates this with an example of a coding agent tasked with fixing a bug. The agent starts with a system prompt and user query, then thinks through the problem, makes tool calls to read files, processes large chunks of code, and iterates multiple times before producing a fix. Each step adds thousands of tokens to the context, quickly escalating the total token count to tens of thousands for a relatively simple task. This scaling effect explains why coding agents are far more expensive to run than simple chatbot interactions, as they require processing large inputs repeatedly and generating extensive outputs.
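The accumulation in that example can be sketched as a small simulation. The step sizes below are invented for illustration, and the model assumes the simplest case: every request resends the full context as input, with no caching discount. The function name and tuple layout are hypothetical.

```python
def simulate_agent(initial_context, steps):
    """Track context growth and billed tokens across agent requests.

    steps is a list of (output_tokens, tool_result_tokens) pairs, one per request.
    Returns (final_context, total_input_tokens, total_output_tokens).
    """
    context = initial_context
    total_input = 0
    total_output = 0
    for output_tokens, tool_result_tokens in steps:
        total_input += context    # the whole context is sent as input
        total_output += output_tokens
        # The model's output and any tool result are appended for the next request.
        context += output_tokens + tool_result_tokens
    return context, total_input, total_output


# Assumed sizes: 2,000-token system prompt plus a 100-token user query.
steps = [
    (300, 4000),  # think, then read a file (4,000 tokens come back)
    (200, 6000),  # read a second, larger file
    (500, 0),     # write the fix
]
final_context, total_input, total_output = simulate_agent(2100, steps)
print(final_context, total_input, total_output)  # 13100 21100 1000
```

With these assumed numbers, three requests produce only 1,000 output tokens but 21,100 input tokens, because the two file reads are sent again with every later request.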

The presenter shares a personal example using GitHub Copilot to create a Starfield screensaver, noting that even a few prompts and file reads resulted in millions of input tokens and tens of thousands of output tokens. This highlights how token usage can balloon in real-world agentic AI applications. The video also critiques current pricing models that charge per token, arguing that this incentivizes inefficient usage and can lead to unsustainable costs. The presenter suggests that while agentic AI has potential, its high token consumption makes it impractical for many companies unless there is a clear and immediate return on investment.
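As a rough illustration of how per-token billing turns such counts into money, the sketch below prices one session. Both the prices and the token counts are placeholders chosen here: the prices are not taken from the video or from any vendor, and the counts only match the order of magnitude the presenter describes.

```python
# Hypothetical prices in USD per million tokens (assumed, not from any vendor).
PRICE_PER_MILLION_INPUT = 3.00
PRICE_PER_MILLION_OUTPUT = 15.00


def session_cost(input_tokens, output_tokens,
                 price_in=PRICE_PER_MILLION_INPUT,
                 price_out=PRICE_PER_MILLION_OUTPUT):
    """Return the per-token bill in USD for one session."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Placeholder counts: millions of input tokens, tens of thousands of output tokens.
cost = session_cost(input_tokens=2_000_000, output_tokens=30_000)
print(f"${cost:.2f}")  # $6.45
```

Under these assumptions the input side accounts for 6.00 USD of the 6.45 USD total, even though output tokens are priced five times higher, which is why repeated context resending dominates the bill.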

In conclusion, the video emphasizes the need for more efficient AI usage and better cost models. It suggests that smaller, more focused queries and code completions are currently more practical than extensive agentic AI workflows. The future may involve embedding large documents into vector databases to reduce token processing or developing smarter caching and system designs. Ultimately, the challenge is balancing the power of agentic AI with its computational and financial costs to make it viable for widespread use.
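The retrieval idea behind the vector database suggestion can be sketched in a few lines: split a large document into chunks, embed each chunk, and send only the chunks most similar to the query instead of the whole file. The version below is a toy under stated assumptions. It uses word counts as a stand-in for a learned embedding model and a plain list instead of an actual vector database, and the chunk texts and function names are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks of a larger code file.
CHUNKS = [
    "def draw_stars(screen): render each star as a white pixel",
    "def update_stars(stars, speed): move stars toward the viewer",
    "def parse_args(): read command line options for the screensaver",
]

relevant = top_k_chunks("why do stars move at the wrong speed", CHUNKS, k=1)
print(relevant[0])
```

Only the selected chunk would then be placed in the model's context, so the input token count depends on `k` and the chunk size rather than on the size of the whole document.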