Easy-to-use and powerful LLM and SLM library with awesome model zoo.
Updated Dec 5, 2025 - Python
ContextGem: Effortless LLM extraction from documents
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
BoundaryNet - A Semi-Automatic Layout Annotation Tool
AI-powered document intelligence platform for automated analysis, processing, and insights extraction from various document formats.
An explainable AI system that combines Graph Intelligence, Vector Search, and Retrieval-Augmented Generation (RAG) to deliver grounded answers and transparent reasoning paths. Includes a FastAPI backend, Streamlit UI, FAISS vector index, and an in-memory knowledge graph for hybrid retrieval and recommendations.
An experiment to provide the template-training capabilities of Azure AI Document Intelligence Studio for a feedback loop
A next-gen AI document extraction system capable of parsing text, tables, and layouts from native PDFs, scanned images, and various document formats with high precision. Built with Docling & Streamlit.
A collection of solutions that leverage Azure AI services.
Enterprise-grade RAG system featuring dual online/offline operation, multi-modal document processing, and advanced AI capabilities including knowledge graph construction and hybrid search for intelligent document analysis.
Advanced multimodal RAG system for querying PDF documents with text, images, and tables using vector embeddings, semantic chunking, and LLMs via Groq API
App for extracting structured data from document photos or PDFs via custom templating and commercial AI services (GPT and Azure Document Intelligence). Developed as a Computer Science thesis at the University of Bologna
Extract and summarise data from PDFs and images using OCR + LLMs. Built with Python, OpenCV, HuggingFace, and Flask.
Multimodal RAG with Adobe PDF Extract, CLIP embeddings & MMR diversity. Interactive dashboard with evaluation metrics for document intelligence.
Enterprise AI assistant for intelligent document Q&A via Slack - Advanced RAG system with multi-language support.
Comprehensive learning hub for Azure AI services - 130+ labs and tutorials covering AI-102 certification
A comprehensive, production-ready Python pipeline for converting various document formats into clean, validated, and optimally chunked Markdown files ready for Large Language Model (LLM) consumption and NotebookLM notebooks.
Automated Document Processing and Markdown Generation System
Persona-Driven Document Intelligence – Offline system that extracts, ranks, and refines the most relevant PDF sections based on a persona and their job-to-be-done. Built for Adobe Hackathon 2025 (Round 1B) with heading-aware parsing, semantic embeddings, and batch inference.
**Product Vision:** To build India's first "Sovereign-by-Design" Document Intelligence Platform that democratizes AI for every government department, ensuring data never leaves the premises.