Description
Traditional PDF extraction tools often struggle with complex layouts, tables, and images. Docling, an open-source Python library developed at IBM, excels at extracting structured information from these elements, enabling the creation of richer, more accurate vector databases. This hands-on tutorial will guide participants through building a Retrieval Augmented Generation (RAG) system with Docling.
Participants will learn how to harness Docling's advanced capabilities to build RAG systems that can understand and retrieve information from complex document elements that traditional tools might miss. Along the way, they will handle complex documents, extract structured information, and create an efficient vector database for semantic search. The session will cover best practices for document parsing, chunking strategies, and integration with popular LLM frameworks.