[Video Response] What Cloudflare's code mode misses about MCP and tool calling

The video responds to Cloudflare’s “code mode” approach, which advocates converting tools into TypeScript APIs for LLMs to call via generated code, improving efficiency over traditional conversational tool calls. However, it highlights that this method may struggle with real-world, nondeterministic tool outputs requiring dynamic, intermediate decision-making, suggesting a speculative decoding approach to balance efficiency and flexibility.

The video is a response to Theo's (t3.gg) video titled "MCP is the wrong abstraction" and discusses a Cloudflare article called "Code Mode: The Better Way to Use MCP." The core idea presented is that instead of having large language models (LLMs) call tools directly through conversational tool calls, it is more effective to convert these tools into TypeScript APIs. The LLM then writes code to call these APIs, leveraging the vast amount of real-world TypeScript code present in its training data. This approach, called "code mode," reportedly allows agents to handle more tools and more complex tasks more efficiently.

Cloudflare’s article argues that traditional tool calling is inefficient because each tool call’s output must be fed back into the LLM’s neural network before making the next call, wasting tokens, time, and energy. In contrast, when the LLM writes code to call multiple tools as APIs, it can execute the entire sequence and only return the final result, which is more efficient. The video agrees that framing tool calls as APIs is a better approach, especially since many providers already internally use APIs for their tools.
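The contrast can be sketched in a few lines of TypeScript. The tool names and shapes below are illustrative assumptions, not Cloudflare's actual bindings: two mock tools stand in for MCP servers, and a single generated script chains them so that only the final result would re-enter the model's context window.

```typescript
// Hypothetical tool APIs -- names and shapes are illustrative,
// not Cloudflare's actual Code Mode bindings.
interface Location { city: string; lat: number; lon: number }

// Mock implementations standing in for real MCP tool servers.
async function geocode(query: string): Promise<Location> {
  return { city: query, lat: 37.77, lon: -122.42 };
}

async function getForecast(lat: number, lon: number): Promise<string> {
  return `Sunny at ${lat},${lon}`;
}

// "Code mode": the LLM emits one script that chains the calls itself.
// The intermediate Location stays inside the sandbox; only the final
// string needs to be fed back through the model.
async function planTrip(city: string): Promise<string> {
  const loc = await geocode(city);      // intermediate result, never tokenized
  return getForecast(loc.lat, loc.lon); // only this returns to the LLM
}
```

In traditional tool calling, the `Location` object would be serialized into the conversation and re-read by the model before it could issue the `getForecast` call; here it never leaves the execution environment.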

However, the video raises a critical concern about the assumption that all tool calls can be deterministically composed into a single block of code. In real-world scenarios, tool outputs are often messy and nondeterministic. For example, a location service might return different types of data or even ask clarifying questions, making it difficult to pre-plan a fixed sequence of API calls. Humans typically adjust their plans based on intermediate outputs, and this dynamic decision-making is hard to replicate if the entire sequence is executed without intermediate checks.
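The clarifying-question example can be made concrete. In this hedged sketch (the tool and its result shape are hypothetical), a lookup tool nondeterministically returns either coordinates or a question, so any pre-planned script has to branch on the intermediate output rather than assume a fixed shape:

```typescript
// Hypothetical: a location tool whose output shape is nondeterministic --
// it may return coordinates or a clarifying question.
type LookupResult =
  | { kind: "coords"; lat: number; lon: number }
  | { kind: "clarify"; question: string };

async function lookup(query: string): Promise<LookupResult> {
  // Ambiguous queries trigger a clarifying question instead of data.
  if (query === "Springfield") {
    return { kind: "clarify", question: "Which Springfield did you mean?" };
  }
  return { kind: "coords", lat: 51.5, lon: -0.12 };
}

// A pre-planned script must inspect the intermediate output; if it
// blindly assumed coordinates, the composed sequence would fail.
async function resolve(query: string): Promise<string> {
  const res = await lookup(query);
  if (res.kind === "clarify") {
    // Control has to return to the model (or the user) here -- this is
    // the dynamic decision point that a fixed call sequence cannot
    // pre-compose away.
    return `NEEDS_INPUT: ${res.question}`;
  }
  return `${res.lat},${res.lon}`;
}
```

The discriminated union forces the branch at compile time, but the branch taken is only known at run time, which is exactly why intermediate checks matter.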

The video suggests that while code mode is promising, it may not fully address the complexities of real-world tool interactions where intermediate outputs influence subsequent actions. To mitigate this, the speaker proposes a speculative decoding approach where the LLM executes multiple tool calls ahead of time, records all intermediate outputs, and then reviews them to identify any suspicious or incorrect results. This method could combine the efficiency of code mode with the flexibility of dynamic decision-making, potentially speeding up the process while maintaining accuracy.
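One way the speculative approach could look in code (an assumed design sketch, not an implementation from the video): run the whole chain optimistically, record every intermediate output in a trace, then have a review pass, standing in for the LLM's check, accept or reject the trace as a whole.

```typescript
// Speculative execution sketch: run all tool calls ahead of time,
// record intermediate outputs, then review the trace afterwards.
// The validator is a stand-in for the LLM's review pass.
type Step = { tool: string; output: unknown };

async function runSpeculatively(
  steps: Array<{ tool: string; call: () => Promise<unknown> }>,
  looksSuspicious: (s: Step) => boolean
): Promise<{ trace: Step[]; accepted: boolean }> {
  const trace: Step[] = [];
  for (const s of steps) {
    // Execute eagerly and keep the intermediate output for later review.
    trace.push({ tool: s.tool, output: await s.call() });
  }
  // Review pass: accept the whole chain only if no step looks off;
  // otherwise the caller falls back to slower step-by-step tool calling.
  const accepted = !trace.some(looksSuspicious);
  return { trace, accepted };
}
```

The trade-off mirrors speculative decoding in inference: the fast path wins when the chain is well-behaved, and a rejected trace costs only a fallback to the conventional one-call-at-a-time loop.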

In conclusion, the video encourages viewers to read the Cloudflare article and watch Theo's video for more insights. It emphasizes that MCP (Model Context Protocol) is not a magical solution but rather a standardized way to expose APIs. While code mode offers significant benefits, it is important to recognize its limitations in handling complex, nondeterministic tasks that require intermediate reasoning and adjustments. The video ends by thanking viewers and inviting further exploration of the topic.