The video introduces Docling, an open-source framework that streamlines the conversion of unstructured data (like PDFs, images, and spreadsheets) into structured formats for use in retrieval-augmented generation (RAG) and AI agent workflows. Docling’s key features include broad file support, advanced information extraction, and seamless integration with popular AI tools, making it easier for organizations to prepare and utilize their data effectively in AI applications.
The video opens with a major challenge in retrieval-augmented generation (RAG) pipelines and AI agents: data preparation. A model can only give accurate, useful responses if the data it receives is in a form it can interpret, yet that data arrives in many formats, including PDFs, tables, images, and audio. Most organizations hold a wide range of unstructured data that must be converted into structured formats such as Markdown, plain text, or JSON before it can be used in RAG or agentic workflows. Traditional methods such as hand-written scripts and OCR are tedious and limited, and that is the gap Docling, an open-source framework, is meant to fill.
Docling processes many kinds of files and converts them into clean, structured text that large language models can use. Supported inputs include PDFs, Word documents, PowerPoint presentations, scanned images, and spreadsheets. The video argues that the hard part of building an effective AI agent is not only the agent itself but also curating and structuring the knowledge and context beneath it, and that is the problem Docling addresses directly.
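As a rough illustration of the conversion step described above, the sketch below wraps Docling's Python converter in two small helpers. The calls `DocumentConverter().convert(...)` and `document.export_to_markdown()` follow the usage pattern in Docling's documentation as best understood here; the helper names `convert_to_markdown` and `save_markdown` and the file name `report.pdf` are assumptions made for this example. The converter is passed in as an optional argument so the helpers can be tried with a stand-in object before Docling is installed.

```python
from pathlib import Path


def convert_to_markdown(source, converter=None):
    """Convert one document (local path or URL) to a Markdown string."""
    if converter is None:
        # Imported lazily so the helper can be exercised without Docling installed.
        from docling.document_converter import DocumentConverter
        converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()


def save_markdown(source, out_dir, converter=None):
    """Write the converted Markdown into out_dir and return the output path."""
    # Hypothetical naming rule: reuse the source file's stem with a .md suffix.
    out_path = Path(out_dir) / (Path(str(source)).stem + ".md")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(convert_to_markdown(source, converter), encoding="utf-8")
    return out_path
```

With Docling installed, `save_markdown("report.pdf", "out")` would be expected to write `out/report.md`; the same pattern applies to the other supported input types.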
A key feature of Docling is its Model Context Protocol (MCP) server. MCP is an open standard that lets AI applications integrate with external tools and data sources. The Docling MCP server can connect to desktop clients such as Claude Desktop, LM Studio, or Cursor, so users can transform documents into structured formats on their local machines. Because the interface is standardized, any large language model or agent that supports tool calling can use Docling to convert documents in response to natural-language requests, which makes the process more accessible and efficient.
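To make the tool-calling idea above more concrete, the sketch below builds the kind of message an MCP client sends when a model decides to call a tool. MCP messages are JSON-RPC 2.0, and tool invocation uses the `tools/call` method with a tool `name` and an `arguments` object. The tool name `convert_document` and the `source` argument are hypothetical placeholders, not the names the Docling MCP server actually exposes; a real client would discover those with a `tools/list` request first.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per connection


def build_tool_call(tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument, used only for illustration.
request = build_tool_call("convert_document", {"source": "/home/me/report.pdf"})
print(json.dumps(request, indent=2))
```

In practice the desktop client builds and sends this message itself; the user only types a request such as "convert this PDF to Markdown".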
Docling’s output is particularly valuable for RAG workflows because it produces rich, hierarchical documents with typed elements, headings, and metadata. That structure allows chunking along sections, tables, and captions, with parent context such as titles and headers carried into each chunk automatically. Retrieval is therefore more cohesive and accurate than with fixed-size splits. Docling also supports multimodal RAG: it preserves images and tables and can enrich figures with text descriptions to improve retrieval. Each element also records its provenance, including page number and bounding box, so retrieved content can be traced back to its source and visualized.
Beyond conversion, Docling offers information extraction. Users define a template or schema naming the fields they want, such as invoice numbers or costs, and Docling returns clean, validated, structured data that matches the predefined model, ready for use in applications or APIs. Docling integrates with major RAG frameworks, including LangChain, LlamaIndex, Haystack, and Langflow, so it fits into data pipelines for automation and real-time processing. It is open-source software under the MIT license and is hosted by the Linux Foundation’s LF AI & Data Foundation. The video presents it as well suited to secure, regulated environments such as healthcare and finance, and as a robust option for organizations that want to unlock the potential of their enterprise data with AI.
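The schema-driven extraction mentioned above comes down to checking loosely typed extracted fields against a predefined model. The sketch below shows that validation step with the Python standard library only. It does not use Docling's extraction API; the `Invoice` model, its field names, and the validation rules are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class Invoice:
    invoice_number: str
    total: Decimal
    currency: str


def validate_invoice(raw):
    """Turn loosely typed extracted fields into an Invoice or raise ValueError."""
    number = str(raw.get("invoice_number", "")).strip()
    if not number:
        raise ValueError("invoice_number is missing")
    try:
        # Drop thousands separators before parsing the amount.
        total = Decimal(str(raw.get("total", "")).replace(",", ""))
    except InvalidOperation:
        raise ValueError(f"total is not a number: {raw.get('total')!r}")
    if not total.is_finite() or total < 0:
        raise ValueError("total must be a finite, non-negative amount")
    currency = str(raw.get("currency", "")).strip().upper()
    if len(currency) != 3:
        raise ValueError("currency must be a three-letter code")
    return Invoice(number, total, currency)
```

Calling `validate_invoice({"invoice_number": " INV-001 ", "total": "1,250.50", "currency": "usd"})` yields a normalized `Invoice`, while a record with a missing number or an unparseable total is rejected before it reaches a downstream application or API.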