Kuba Rogut from Turbo Puffer argues that retrieval-augmented generation (RAG) is not dead but is evolving into agentic search, which combines several retrieval techniques with iterative reasoning to improve accuracy and efficiency. Citing Cursor, a Turbo Puffer customer, as a successful example, he emphasizes the shift from simple one-shot vector retrieval to dynamic, context-aware retrieval processes that better meet the needs of complex AI applications.
In this talk, Kuba Rogut, a deployed engineer at Turbo Puffer, addresses the evolving landscape of retrieval-augmented generation (RAG) and agentic search, challenging the notion that “RAG is dead.” He begins by clarifying common misconceptions about RAG, explaining that retrieval is not limited to vector search but also covers methods such as full-text search and regex filtering. Agentic search, often equated with file system grepping as seen in tools like Claude Code, has agents reason iteratively to find and process relevant context, which makes it more dynamic than traditional single-pass RAG.
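To make the distinction between retrieval methods concrete, here is a minimal sketch of regex filtering, full-text search, and vector search over the same three documents. Everything in it is an assumption for illustration: the `DOCS` corpus, the function names, and especially `toy_embed`, a bag-of-words stand-in for a real embedding model (a real model would capture meaning, not just shared words).

```python
import math
import re
from collections import Counter

# Hypothetical corpus, invented for this example.
DOCS = {
    "auth.py": "def login(user): return check_token(user)",
    "billing.py": "def charge(card): return gateway.charge(card)",
    "README.md": "login flow and token checks are described here",
}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def regex_search(pattern):
    """Exact pattern filtering, the kind of lookup grep performs."""
    return [name for name, text in DOCS.items() if re.search(pattern, text)]

def full_text_search(query):
    """Rank documents by how many of their tokens are query terms."""
    terms = set(tokenize(query))
    scores = {
        name: sum(1 for tok in tokenize(text) if tok in terms)
        for name, text in DOCS.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [name for name, score in ranked if score > 0]

def toy_embed(text):
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(tokenize(text))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def vector_search(query, k=1):
    """Return the k documents whose vectors are closest to the query vector."""
    q = toy_embed(query)
    ranked = sorted(DOCS, key=lambda n: cosine(q, toy_embed(DOCS[n])), reverse=True)
    return ranked[:k]
```

All three functions answer "which documents are relevant?", which is the sense in which each of them counts as retrieval.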
Kuba highlights Cursor, one of Turbo Puffer’s early customers, as a prime example of effective agentic search. Cursor indexes codebases by chunking and embedding them for semantic search, and uses Merkle trees to avoid redundantly re-embedding the same code across teams. This approach significantly improves answer accuracy, by up to 24% for some models, and improves user retention and satisfaction, showing that even seemingly modest percentage gains from semantic search translate into practical benefits in real-world applications.
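The talk summary does not describe how Cursor's Merkle trees are built, so the following is only a sketch of the general idea under stated assumptions: every file gets a content hash, every directory gets a hash of its children's hashes, and a comparison of two trees skips any subtree whose hash is unchanged. The function names and the path layout are hypothetical.

```python
import hashlib

def digest(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()

def build_tree(files):
    """Return (hashes, children) for a {path: content} mapping.

    hashes maps every file and directory path to a hash; '' is the root.
    children maps each directory path to the paths directly inside it.
    """
    hashes = {path: digest(text) for path, text in files.items()}
    children = {}
    for path in files:
        parts = path.split("/")
        for i in range(len(parts)):
            parent = "/".join(parts[:i])
            child = "/".join(parts[: i + 1])
            children.setdefault(parent, set()).add(child)
    # Hash the deepest directories first so child hashes exist before parents.
    by_depth = sorted(
        children, key=lambda p: len(p.split("/")) if p else 0, reverse=True
    )
    for d in by_depth:
        hashes[d] = digest("".join(f"{c}:{hashes[c]};" for c in sorted(children[d])))
    return hashes, children

def changed_files(old, new, path=""):
    """List files in `new` that are added or modified relative to `old`.

    Subtrees with matching hashes are skipped without being visited.
    Deleted files are not reported in this sketch.
    """
    old_hashes, _ = old
    new_hashes, new_children = new
    if old_hashes.get(path) == new_hashes[path]:
        return []
    if path not in new_children:  # a leaf, i.e. a file
        return [path]
    out = []
    for child in sorted(new_children[path]):
        out += changed_files(old, new, child)
    return out
```

Under these assumptions, only the files returned by `changed_files` would need to be re-chunked and re-embedded, and two teammates with identical checkouts would compute the same root hash and could share one index.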
The talk contrasts Claude Code’s approach, which relies on grepping and does not use vector search, with Cursor’s method, which pays an upfront embedding cost in exchange for faster, cheaper queries at runtime. Claude Code’s per-session discovery means the same token cost is paid again whenever different users, or the same user at a later time, ask similar questions, whereas Cursor’s index lets agents retrieve relevant information quickly, saving tokens and time. This efficiency has led some Turbo Puffer team members to prefer Cursor for its speed and semantic understanding.
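The trade-off between an upfront indexing cost and repeated per-session discovery can be framed as a break-even calculation. The sketch below is illustrative only: the function name, its parameters, and the numbers in the usage note are assumptions, not figures from the talk.

```python
import math

def break_even_queries(index_cost, per_query_indexed, per_query_discovery):
    """Number of queries after which a one-time index has paid for itself.

    All arguments are in the same unit (for example tokens). Returns None
    when indexed queries are not cheaper than per-session discovery, since
    the upfront cost is then never recovered.
    """
    saving = per_query_discovery - per_query_indexed
    if saving <= 0:
        return None
    return math.ceil(index_cost / saving)
```

With made-up numbers, `break_even_queries(1_000_000, 2_000, 12_000)` says the index pays for itself after 100 queries; shared across a team, that threshold is reached sooner than it would be for a single user.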
Kuba emphasizes that the simple RAG model, in which a query is embedded once and the retrieved results are passed to an LLM in a single step, is becoming outdated. Modern, sophisticated use cases call for agentic search, where agents perform multiple reasoning steps and selectively retrieve information through various search methods. This iterative retrieval process is better suited to complex AI applications, enabling better performance and unlocking new product possibilities. The shift reflects a broader trend towards more nuanced and context-aware search mechanisms.
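The difference from single-pass RAG is the loop: after each retrieval the agent decides what to look for next. A minimal sketch follows, assuming a hypothetical `choose_step` callback that stands in for the LLM's reasoning; here it is replaced by a hard-coded script so the example runs without a model, and the `FILES` corpus and tool names are invented.

```python
import re

# Hypothetical codebase, invented for this example.
FILES = {
    "auth.py": "def login(user): return check_token(user)",
    "tokens.py": "def check_token(user): return user in SESSIONS",
    "README.md": "How authentication works",
}

def grep(pattern):
    return [name for name, text in FILES.items() if re.search(pattern, text)]

def agentic_search(question, tools, choose_step, max_steps=5):
    """Retrieve iteratively: pick a tool and query, look at results, repeat."""
    context = []
    for _ in range(max_steps):
        step = choose_step(question, context)
        if step is None:  # the agent decides it has enough context
            break
        tool_name, query = step
        for result in tools[tool_name](query):
            if result not in context:
                context.append(result)
    return context

def scripted_policy(question, context):
    """Stand-in for an LLM: follow login() to the function it calls."""
    if not context:
        return ("grep", r"def login")
    if "tokens.py" not in context:
        return ("grep", r"def check_token")
    return None

found = agentic_search(
    "How does login validate users?", {"grep": grep}, scripted_policy
)
```

The second query only makes sense after the first result has been seen, which a single embed-and-retrieve pass cannot express. The `tools` dictionary could just as well hold full-text or vector search functions alongside `grep`.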
Finally, Kuba references a quote from Google’s Jeff Dean, underscoring the importance of staged retrieval over simply expanding context window sizes. Instead of needing a trillion tokens at once, the key is to efficiently narrow down to the “right million” tokens relevant to the query. Turbo Puffer’s technology supports this by managing vast amounts of embedded data and enabling precise, scalable retrieval. The talk concludes with an invitation for questions, highlighting Turbo Puffer’s commitment to advancing retrieval technologies in AI.
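Staged retrieval can be pictured as a funnel: a cheap scorer trims a large corpus to a shortlist, and a costlier scorer then runs only on what survives. The sketch below is a generic illustration with invented documents and scoring functions; it does not represent Turbo Puffer's API or implementation.

```python
def staged_retrieve(items, stages):
    """Apply (score_fn, keep) stages in order, narrowing the candidates each time."""
    candidates = list(items)
    for score, keep in stages:
        candidates = sorted(candidates, key=score, reverse=True)[:keep]
    return candidates

# Hypothetical corpus and query, invented for this example.
docs = [
    "merkle tree index",
    "vector index for code search",
    "cooking pasta",
    "code search with grep",
    "garden tips",
]
query_terms = {"code", "search"}

def cheap_overlap(doc):
    """Stage 1: count of query terms present; fast enough to run on everything."""
    return len(set(doc.split()) & query_terms)

def term_density(doc):
    """Stage 2: stand-in for a costlier reranker; share of words that match."""
    words = doc.split()
    return sum(w in query_terms for w in words) / len(words)

best = staged_retrieve(docs, [(cheap_overlap, 3), (term_density, 1)])
```

Each stage sees fewer candidates than the one before it, so the expensive step never touches most of the corpus; this is the same shape as narrowing a very large store down to the "right million" tokens.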