Opinionated agentic RAG powered by LanceDB, Pydantic AI, and Docling
Hybrid RAG system combining vector search, knowledge graph (LightRAG), and cross-encoder reranking — with Docling document parsing, visual intelligence (image/table captioning), agentic streaming chat, and inline citations. Powered by Gemini or local Ollama models.
A Python package that converts PDFs to Markdown, extracts images and tables, and generates text descriptions for the extracted tables and images using several LLM clients, among other features. Markdrop is available on PyPI.
A collection of PDF parsing libraries, such as the AI-based Docling, Claude, OpenAI, Gemini, Meta's Llama Vision, and unstructured-io, as well as pdfminer, PyMuPDF, and pdfplumber, for efficient snapshot, text, table, and metadata extraction.
Transform unstructured documents into validated, rich and queryable knowledge graphs.
Python, LlamaIndex, LangChain, Docker Compose: 15 Property Graph, 4 RDF, 10 Vector, OpenSearch, Elasticsearch, Alfresco DBs. 13 data sources (9 auto-sync), KG auto-building, Ontologies, LLMs, Docling or LlamaParse doc processing, GraphRAG, RAG only, Hybrid Search, AI Chat. TypeScript React, Vue, Angular frontends, FastAPI REST backend, MCP Server.
PDFStract: an extraction, chunking, and embedding layer for your RAG pipeline. Available as a CLI, web UI, and API.
Open-source toolkit for reliable RAG pipelines: convert PDFs to Markdown, clean documents, inspect chunks, compare chunking strategies, and enrich metadata for LLM applications.
Docling with Ollama - RAG on Local Files with Local Models
PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.
A Python library and CLI tool to convert PDF files to CSV files.
DocChat is an AI-powered Multi-Agent RAG system using Docling for structured document parsing and BM25 + vector search retrievers to retrieve fact-checked answers from PDFs, DOCX, and text files, preventing hallucinations. 🚀
Autonomous agent networks for automating tasks that require multi-step reasoning
Privacy-first document intelligence engine — parse PDFs, DOCX, PPTX, XLSX & CSV into AI-ready chunks for RAG pipelines. Includes HITL review, 3-layer memory chat, and a production FastAPI server.
Enterprise-grade document parsing service with asynchronous queue processing based on MinerU, Celery and Docker.
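A Claude Code / Codex CLI Skill that condenses PDF/Word/TXT/Markdown textbooks into interactive HTML review documents. Automatically detects humanities or STEM mode, runs a 5-pass deep extraction, and natively supports OCR for scanned PDFs. v1.2 uses the pdfium backend to fix the docling std::bad_alloc crash, so large PDFs extract reliably.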
OnnxTR OCR plugin for Docling
Advanced PDF/Document Translator with interactive comparison. Built on IBM Docling.