Benchmarking semantic code retrieval on Claude Code — Kuba Rogut, Turbopuffer

Kuba Rogut of Turbopuffer benchmarks semantic code retrieval in Claude Code, showing how semantic search over vector embeddings can improve retrieval precision and efficiency compared with grep-based search. He demonstrates this with Turbopuffer’s Turbo Grep tool and Cursor’s Context Bench, and concludes that combining semantic and grep-style search gives the best results for navigating large codebases and improving developer productivity.

Turbopuffer is a serverless full-text and vector search database built on object storage that serves major AI companies. Kuba explains that Claude Code does not use semantic code search by default; it relies on agentic search, which greps through the file system. Some customers, such as Cursor, instead index their codebases into Turbopuffer for semantic code search, which Kuba says brings significant improvements in retrieval accuracy and user satisfaction.

Kuba highlights the benefits of semantic search through vector embeddings, contrasting it with traditional grep-based search. While grep repeatedly scans files during each session, semantic search involves an upfront cost of chunking, embedding, and indexing the codebase, creating a cached semantic representation. This cache allows faster and more efficient retrieval of relevant code chunks across multiple sessions, leading to long-term token savings and improved performance, especially when running multiple agents simultaneously.
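The chunk, embed, and index steps described above can be sketched in a few lines of Python. This is a minimal illustration, not Turbopuffer's or Turbo Grep's implementation: the function names, the fixed-size line chunking, and the in-memory list used as the index are all assumptions made for the example. The `embed` function is a toy token-count vector standing in for a real embedding model, so it only matches shared words and does not capture semantic similarity.

```python
import math
import re
from collections import Counter

def chunk(text, max_lines=20):
    """Split a file into fixed-size line windows (stand-in for syntax-aware chunking)."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]

def embed(text):
    """Toy embedding: lowercase token counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(files, max_lines=20):
    """Upfront cost: chunk and embed every file once. The result is reusable across sessions."""
    index = []
    for path, text in files.items():
        for i, piece in enumerate(chunk(text, max_lines)):
            index.append((path, i, embed(piece)))
    return index

def search(index, query, top_k=3):
    """Per-query cost: embed the query and rank the cached chunk vectors."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(path, i) for path, i, _ in ranked[:top_k]]

# Hypothetical two-file codebase used only for illustration.
files = {
    "auth.py": "def login(user, password):\n    return check password for user",
    "billing.py": "def charge(card, amount):\n    return submit payment amount to card",
}
index = build_index(files)          # done once
top = search(index, "payment amount", top_k=1)  # repeated cheaply per query
```

The point of the split is that `build_index` runs once while `search` runs per query against the cached vectors, whereas a grep-style approach rescans the files in every session.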

To explore these benefits, Turbopuffer built a CLI tool called Turbo Grep, which parses, chunks, embeds, and uploads code to Turbopuffer for semantic search. Kuba demonstrates how the tool integrates with Claude Code and benchmarks it with Cursor’s Context Bench, a human-labeled dataset that evaluates whether agents find the correct files, lines, and symbols during code retrieval tasks. The results show that semantic search significantly improves precision by reducing irrelevant file reads, while recall improvements are mixed and depend on the task type.
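To make the precision and recall distinction concrete, here is a small sketch of file-level scoring against a human-labeled set. It uses the standard definitions of the two metrics and is not Context Bench's actual scoring code; the file names and the two runs are hypothetical and are not results from the talk.

```python
def precision_recall(retrieved, relevant):
    """File-level precision and recall for one retrieval task."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical labels and agent runs, for illustration only.
labeled = {"src/auth.py", "src/session.py"}
grep_run = ["src/auth.py", "src/utils.py", "src/config.py", "tests/test_auth.py"]
semantic_run = ["src/auth.py", "src/utils.py"]

grep_scores = precision_recall(grep_run, labeled)
semantic_scores = precision_recall(semantic_run, labeled)
```

In this made-up case both runs find one of the two labeled files, so recall is the same, but the second run reads fewer irrelevant files and so scores higher on precision. That is the shape of result the talk describes: fewer wasted reads without a guaranteed recall gain.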

The analysis reveals that semantic search excels at finding behaviorally related files that lack explicit keyword matches, while traditional grep performs better when keyword-based tracing is sufficient. This suggests that different tasks benefit from different search strategies, and combining both approaches can yield better overall results. Kuba notes that Claude Code’s architecture is primarily designed for grep-style search, limiting its ability to fully leverage semantic search, unlike Cursor’s Composer model, which integrates semantic search more deeply and achieves higher performance gains.
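The talk does not say how the two strategies should be combined. One common way to merge ranked results from separate retrievers is reciprocal rank fusion, sketched below as an assumption rather than as the method used by Claude Code, Cursor, or Turbopuffer. The file names are hypothetical, and `k=60` is a conventional default for this technique, not a value from the talk.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists; items ranked highly by more lists score higher."""
    scores = {}
    for ranking in rankings:
        for rank, path in enumerate(ranking, start=1):
            scores[path] = scores.get(path, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by path for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))

# Hypothetical ranked results from each search style.
grep_hits = ["a.py", "b.py", "c.py"]
semantic_hits = ["d.py", "b.py", "a.py"]

fused = reciprocal_rank_fusion([grep_hits, semantic_hits])
```

Files that appear in both lists rise to the top, while files found by only one retriever are kept further down the list. This matches the observation that each strategy finds files the other misses.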

In conclusion, Kuba emphasizes the importance of providing lightweight, versatile tools that help developers efficiently locate relevant context within large codebases. He argues that while grep is simple and zero-cost, vector databases like Turbopuffer are essential for handling complex, multimodal, or large-scale data where semantic understanding is crucial. The talk underscores the evolving role of semantic search in code retrieval and the need for hybrid approaches to optimize developer productivity and code comprehension.