🖼️📄E2E Multi-modal Document Preprocessing with Azure Document Intelligence
Updated Oct 22, 2025 - Python
Workshop for Azure OpenAI Service
A collection of document parsers and hands-on exercises for building structured data for your RAG applications.
An application that automatically parses bank statements to visualize current income and spending compared to budgeting and savings targets
AI-Powered Web Application for Talent Search and CV Management
Azure Document Intelligence Result Processor: A toolset for annotating PDFs based on Azure Document Intelligence analysis results, featuring a React web application and a standalone Python script for processing and visualizing extracted data with confidence indicators.
OCR-enabled PDF text extraction in Python with pypdf and Azure Document Intelligence.
Frontend and Backend Web App for Receipt Splitting with Friends
🚀 Intelligent document extraction system powered by Azure AI & Gemini 2.5. Transform any form into structured JSON with real-time editing and enterprise-grade validation.
PDF extraction samples comparing Azure Document Intelligence (layout model) 🏢 vs Markitdown ✍️ vs Apache Tika
Scribbly - Convert your boring notes into interactive flashcards using Azure Text Analytics, Azure Document Intelligence and Gemini AI
An Enterprise RAG pipeline using Azure AI Document Intelligence, Translator, and OpenAI GPT-4o to query complex, multi-lingual PDFs with strict source citations and conversational memory.
A dual-agent, feedback-driven document extraction system using GPT-5 and Azure Document Intelligence that automatically improves its own prompts, reducing manual rule updates and adapting to evolving business requirements.
Serverless invoice extraction API using Azure Document Intelligence and Azure Functions. Upload a PDF invoice and receive normalized JSON output including line items, totals, dates, and vendor details.
Enterprise AI system to classify, split, and auto-route PDFs using Azure Document Intelligence and SharePoint.
A Streamlit-based app with a FastAPI backend for extracting structured data (text, images, tables) from websites and PDFs. Processed data is stored in AWS S3 and rendered in a standardized Markdown format. The APIs are deployed on Google Cloud Run.
Self-RAG sample
Image-to-text extraction, passed through AI, to Anki: automation for creating Anki cards from an image.
Uses OCR and PII detection models to mask PII in .tiff files. Configurable to use Azure or AWS OCR and PII detection models.