In the video, Dave Ebbelaar demonstrates building an agentic Retrieval-Augmented Generation (RAG) system from scratch in pure Python, using iterative search and read tools inside a feedback loop to refine retrieval from local markdown files. He also covers integrating these tools with the Pydantic AI framework, improving transparency with streaming logs, structuring outputs with citations, and production considerations such as efficient file searching and deployment strategies.
In this video, Dave Ebbelaar, an experienced AI engineer, demonstrates how to build an agentic Retrieval-Augmented Generation (RAG) system from scratch using pure Python. Unlike traditional semantic RAG systems that make a single call to a large language model (LLM) after retrieving all relevant information, agentic RAG employs a feedback loop in which the agent iteratively uses search and read tools to refine its results. This approach allows the system to self-correct and improve the accuracy of the information it provides, making it particularly useful for integrating private or company-specific data with AI automation.
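The contrast between a single-shot pipeline and the feedback loop described above can be sketched abstractly. This is an illustrative skeleton, not the video's code: the "LLM" is a scripted stub, and all names (`agentic_loop`, `scripted_decide`) are assumptions.

```python
# Minimal sketch of an agentic feedback loop. A stubbed decision function
# stands in for the LLM; each tool result is fed back as an observation.

def agentic_loop(question: str, tools: dict, decide, max_steps: int = 5) -> str:
    """Repeatedly let the model pick a tool, run it, and feed the result back."""
    observations = []
    for _ in range(max_steps):
        action = decide(question, observations)  # model chooses the next step
        if action["tool"] == "final_answer":
            return action["text"]
        result = tools[action["tool"]](*action.get("args", ()))
        observations.append((action["tool"], result))  # feedback for next turn
    return "gave up after max_steps"

# Scripted stand-in for the LLM: search first, then read, then answer.
def scripted_decide(question, observations):
    if not observations:
        return {"tool": "search", "args": (question,)}
    if len(observations) == 1:
        return {"tool": "read", "args": ("notes.md",)}
    return {"tool": "final_answer", "text": f"Found in {observations[0][1]}"}

tools = {
    "search": lambda q: ["notes.md"],           # pretend grep hit
    "read": lambda name: f"contents of {name}", # pretend file read
}
```

The key difference from single-shot RAG is that `observations` grows each turn, so the model can notice a bad search result and try again before answering.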
The tutorial begins by defining three fundamental tools: listing files, searching files with regular expressions, and reading files. Dave uses simple markdown files stored locally as the knowledge base and implements the tools with Python's pathlib and re modules. The listing tool uses glob patterns to find markdown files, the search tool (grep) scans files line by line for matching patterns, and the read tool reads file contents while verifying the file lies within the designated directory. These tools form the core primitives that let the agent navigate the knowledge base and extract relevant information.
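A standard-library sketch of the three tools might look like the following. The function names, the `KB_DIR` layout, and the return shapes are assumptions for illustration, not the video's exact code.

```python
# Sketch of the three core primitives: list, search (grep-style), read.
import re
from pathlib import Path

KB_DIR = Path("knowledge_base")  # hypothetical root for the markdown files

def list_files(pattern: str = "**/*.md") -> list[str]:
    """Glob for markdown files under the knowledge base."""
    return [str(p.relative_to(KB_DIR)) for p in KB_DIR.glob(pattern)]

def search_files(pattern: str) -> list[tuple[str, int, str]]:
    """Scan each file line by line, returning (file, line number, line) hits."""
    regex = re.compile(pattern)
    hits = []
    for path in KB_DIR.glob("**/*.md"):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path.relative_to(KB_DIR)), lineno, line.strip()))
    return hits

def read_file(name: str) -> str:
    """Read a file, refusing paths that escape the knowledge-base directory."""
    target = (KB_DIR / name).resolve()
    if not target.is_relative_to(KB_DIR.resolve()):
        raise ValueError(f"Path escapes knowledge base: {name}")
    return target.read_text()
```

The `resolve()` plus `is_relative_to()` check in `read_file` is what keeps the agent from being tricked into reading files outside the knowledge base via `../` paths.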
Next, Dave integrates these tools into an agentic loop using the Pydantic AI framework, which simplifies the interaction between the LLM and the tools. He demonstrates how the agent can answer questions by iteratively calling the tools multiple times, refining its search and reading results. To improve transparency and debugging, he introduces a streaming steps approach that logs each tool call and its parameters, allowing developers to see exactly what the agent is doing behind the scenes. This insight is crucial for optimizing the system and ensuring it retrieves accurate and relevant information.
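Pydantic AI surfaces these events through its own streaming APIs; framework aside, the logging idea itself — record every tool call and its parameters as it happens — can be sketched with a plain decorator. Everything here (the `STEP_LOG` list, the stubbed `search` tool) is an illustrative assumption.

```python
# Illustrative sketch: wrap each tool so every call and its arguments are
# appended to a step log, mirroring the "streaming steps" transparency idea.
import functools

STEP_LOG: list[str] = []

def logged_tool(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        STEP_LOG.append(f"{fn.__name__}({args}, {kwargs})")
        result = fn(*args, **kwargs)
        STEP_LOG.append(f"{fn.__name__} -> {str(result)[:80]}")  # truncate long results
        return result
    return wrapper

@logged_tool
def search(pattern: str) -> list[str]:
    return ["notes.md"]  # stubbed tool result

search("agentic")
```

In a real agent, each appended entry would be streamed to the console or a log sink as it occurs, so you can watch the loop's decisions live instead of inspecting them after the fact.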
The video also covers enhancing the agent’s output by structuring it with citations, which include file names, line numbers, and quoted text. This structured output format is useful for downstream applications, such as user interfaces that display answers alongside their sources. Dave emphasizes the importance of understanding the agentic loop’s inner workings and how the LLM uses tool documentation and parameters to guide its search and retrieval process. He also shares real-world experience, noting that these techniques are inspired by best practices from popular coding agents and AI development projects.
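A citation schema along the lines described — file name, line number, quoted text — might be shaped as follows. The video uses Pydantic models with Pydantic AI's structured output; stdlib dataclasses are used here so the sketch stands alone, and the field names are assumptions.

```python
# Sketch of a structured answer carrying its supporting citations.
from dataclasses import dataclass, field

@dataclass
class Citation:
    file: str   # e.g. "notes.md"
    line: int   # 1-based line number of the quoted text
    quote: str  # verbatim excerpt backing the claim

@dataclass
class AgentAnswer:
    answer: str
    citations: list[Citation] = field(default_factory=list)

result = AgentAnswer(
    answer="Agentic RAG refines retrieval in a loop.",
    citations=[Citation(file="notes.md", line=3, quote="iterative search")],
)
```

A downstream UI can then render `result.answer` with each citation linked back to the exact file and line, rather than parsing sources out of free text.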
Finally, Dave discusses production considerations for deploying an agentic RAG system in real environments such as a VPS, container apps, or serverless functions. He introduces the Rust-based ripgrep tool for faster, more efficient file searching, integrated via Python subprocesses. The production-ready code includes safety checks, error handling, logging, and limits to prevent runaway processes or excessive resource use. Dave encourages viewers to adapt the system to their own data sources and deployment environments, highlighting the flexibility of the approach. He concludes by inviting viewers interested in AI freelancing to explore his program for guidance on starting and succeeding in the field.
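A hedged sketch of the ripgrep integration with the safety measures mentioned — a timeout, a result cap, and graceful handling when the binary is missing — could look like this. The function name and fallback behavior are assumptions; `--line-number` and `--no-heading` are standard rg flags.

```python
# Sketch: call ripgrep (rg) via subprocess with a timeout and result cap.
import shutil
import subprocess

def rg_search(pattern: str, directory: str, max_results: int = 50,
              timeout: float = 5.0) -> list[str]:
    """Run ripgrep and return matching lines, or [] on any failure."""
    if shutil.which("rg") is None:
        return []  # rg not installed; a real system might fall back to pure Python
    try:
        proc = subprocess.run(
            ["rg", "--line-number", "--no-heading", pattern, directory],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return []  # kill runaway searches instead of hanging the agent
    if proc.returncode not in (0, 1):  # rg exits 1 for "no matches", which is fine
        return []
    return proc.stdout.splitlines()[:max_results]  # cap output size
```

Returning an empty list on every failure mode keeps the agent loop alive: a failed search just becomes an observation the model can react to, rather than an unhandled exception.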