Cut your LLM token bill in half with these 2 simple tricks

The video presents two practical methods for reducing token usage and costs when working with large language models: adopting a terse, “caveman” style of communication that strips unnecessary language, and using the RTK (Rust Token Killer) tool to compress verbose command-line output into concise summaries. Together, these techniques cut token consumption by up to 40% in a real-world coding task, helping developers save money without sacrificing code quality or productivity.

The video explores two effective techniques for reducing token usage and costs when working with large language models (LLMs) through coding agents such as Claude Code or GitHub Copilot. The first technique is to instruct the agent to communicate in a terse, “caveman” style: minimal, direct language with no unnecessary explanations or filler. This noticeably cuts the number of tokens consumed; in a simple FizzBuzz coding example, token usage dropped from 7.6K to 5.8K tokens. The presenter shares that this method has been part of his own workflow for over a year, emphasizing that while it may sound humorous, it is a practical way to save tokens without sacrificing code quality.
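The video does not quote the exact instruction, but the idea can be captured in a short system prompt or agent rules file (for example, a CLAUDE.md); the wording below is hypothetical, not the presenter's:

```
Respond in caveman style: terse, direct, minimal words.
No preamble, no summaries, no apologies, no restating the task.
Code and short answers only. Explain only when asked.
```

Because such instructions shape every response in the session, even a few lines like these compound into real savings over a long coding conversation.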

The second technique is a tool called RTK (Rust Token Killer), developed by a colleague of the presenter, which intercepts verbose command-line output before it is sent to the LLM. RTK hooks into the coding agent's lifecycle events, such as running tests or git commands, and rewrites those commands so they produce far more concise output, drastically reducing the token count of these operations. The presenter demonstrates RTK compressing a verbose git log into a succinct summary, and since the same approach applies to many CLI tools, it is a versatile solution for token efficiency.
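RTK itself is a Rust tool whose internals the video does not detail, but the core idea, rewriting a verbose CLI output into a compact summary before it reaches the model, can be sketched in a few lines of Python. The function name and the sample log below are illustrative, not RTK's actual API:

```python
import re


def compress_git_log(raw: str) -> str:
    """Collapse verbose `git log` output to one line per commit:
    abbreviated hash plus the commit subject line."""
    summaries = []
    commit_hash = None
    for line in raw.splitlines():
        m = re.match(r"commit ([0-9a-f]{7,40})", line)
        if m:
            commit_hash = m.group(1)[:7]
        elif commit_hash and line.startswith("    "):
            # The first indented line after the headers is the subject.
            summaries.append(f"{commit_hash} {line.strip()}")
            commit_hash = None
    return "\n".join(summaries)


# Hypothetical verbose log, as a tool like git would emit it.
verbose = """\
commit 4f2a9c1d8e7b6a5f4e3d2c1b0a9f8e7d6c5b4a3f
Author: Jane Doe <jane@example.com>
Date:   Mon Mar 4 10:15:00 2024 +0100

    Add checkout endpoint

commit 1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c
Author: Jane Doe <jane@example.com>
Date:   Sun Mar 3 18:02:00 2024 +0100

    Fix cart total rounding
"""

print(compress_git_log(verbose))
# Prints two short lines instead of the full multi-line log.
```

The same pattern, run the command, strip what the model does not need, forward only the summary, generalizes to test runners and other chatty CLI tools.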

To illustrate RTK’s impact in a realistic scenario, the presenter runs a larger coding task: implementing a backend feature for an e-shop website. Without RTK, the context window consumed about 19,100 tokens; with RTK enabled, this dropped to 12,000 tokens, a reduction of roughly 37% that translates into meaningful cost savings. This example shows how RTK can be integrated into everyday development workflows to manage token budgets, especially on complex projects that require extensive interaction with the LLM.

The video concludes with benchmark results comparing the two methods: caveman-style communication yielded a 24% reduction in output tokens on the FizzBuzz example, while RTK achieved nearly a 40% reduction in the more realistic backend task. These findings underscore the value of combining simple prompt engineering with tooling enhancements to optimize token consumption. The presenter encourages viewers to adopt these strategies and to share their own token-saving tips in the comments.

Overall, the video provides practical, actionable advice for developers looking to cut their LLM token bills without compromising productivity. By adopting a succinct communication style and leveraging tools like RTK to streamline verbose outputs, users can significantly reduce token usage and associated costs. The presenter also offers ongoing support through a newsletter that covers AI developments, helping viewers stay informed in the rapidly evolving field of AI-assisted coding.