Easy-to-use and powerful LLM and SLM library with awesome model zoo.
A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 76+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, and TypeScript (Node/Bun/Wasm/Deno), or use it via CLI, REST API, or MCP server.
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
ContextGem: Effortless LLM extraction from documents
A curated list of resources on Document Understanding (DU)
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
AI-in-a-Box leverages the expertise of Microsoft across the globe to develop and provide AI and ML solutions to the technical community. Our intent is to present a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction.
Local-first AI-powered document intelligence platform for investigative journalism
A collection of samples demonstrating techniques for processing documents with Azure AI including AI Foundry, OpenAI, Document Intelligence, etc.
ReadingBank: A Benchmark Dataset for Reading Order Detection
The Doc Intelligence in-a-Box project leverages Azure AI Document Intelligence to extract data from PDF forms and store the data in Azure Cosmos DB. This solution, part of the AI-in-a-Box framework by Microsoft Customer Engineers and Architects, ensures quality, efficiency, and rapid deployment of AI and ML solutions across various industries.
A curated list of resources on Table Structure Recognition
Course Website
This sample demonstrates how to use Document Intelligence's Layout model to convert PDF documents, such as invoices, into Markdown, then use GPT-3.5 Turbo to extract structured JSON data using the Azure OpenAI Service.
Knwler is a lightweight, single-file Python tool that extracts structured knowledge graphs from documents using AI. Feed it a PDF or text file and receive a richly connected network of entities, relationships, and topics — complete with an interactive HTML report and exports ready for your favorite graph analytics platform.
An explainable AI system that combines Graph Intelligence, Vector Search, and Retrieval-Augmented Generation (RAG) to deliver grounded answers and transparent reasoning paths. Includes a FastAPI backend, Streamlit UI, FAISS vector index, and an in-memory knowledge graph for hybrid retrieval and recommendations.
BoundaryNet - A Semi-Automatic Layout Annotation Tool
AI-powered document intelligence platform for automated analysis, processing, and insights extraction from various document formats.
A curated list of resources on Document Layout Analysis
An experiment to provide the capabilities of Azure AI Document Intelligence Studio template training for a feedback loop