Intelligent document processing. Extract structured data like JSON, Markdown and HTML from documents using AI.

docuglean-ai/docuglean-ocr

Intelligent document processing using state-of-the-art AI models.

If you find Docuglean helpful, please ⭐ this repository to show your support!

What is Docuglean?

Docuglean is a unified SDK for intelligent document processing using state-of-the-art AI models. It provides multilingual and multimodal capabilities through plug-and-play APIs for document OCR, structured data extraction, annotation, classification, summarization, and translation. It also ships with built-in tools and supports several document types out of the box.

Features

  • 🚀 Easy to Use: Simple, intuitive API with detailed documentation. Pass in a file and get Markdown back.
  • 🔍 OCR Capabilities: Extract text from images and scanned documents
  • 📊 Structured Data Extraction: Use Zod schemas for type-safe data extraction
  • 📄 Multimodal Support: Process PDFs and images with ease
  • 🤖 Multiple AI Providers: Support for OpenAI, Mistral, and Google Gemini, with more coming soon
  • 🔒 Type Safety: Full TypeScript support with comprehensive types

Available SDKs

📦 Node.js/TypeScript SDK

Package: docuglean-ocr

npm install docuglean-ocr

Repository: node-ocr/

Quick Start:

import { ocr, extract } from 'docuglean-ocr';

const result = await ocr({
  filePath: './document.pdf',
  provider: 'mistral',
  model: 'mistral-ocr-latest',
  apiKey: 'your-api-key'
});

🐍 Python SDK

Package: docuglean-ocr

pip install docuglean-ocr

Repository: python-ocr/

Quick Start:

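import asyncio

from docuglean import ocr, extract

# Top-level await only works in a notebook or async REPL,
# so a script wraps the call in a coroutine and runs it on an event loop.
async def main():
    result = await ocr(
        file_path="./document.pdf",
        provider="mistral",
        model="mistral-ocr-latest",
        api_key="your-api-key",
    )
    print(result)

asyncio.run(main())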
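
The quick start passes the API key as a string literal. A minimal sketch of assembling the same arguments with the key read from the environment is shown below. The helper `build_ocr_kwargs`, the environment variable names, and the provider keys other than `mistral` are assumptions made for illustration; they are not part of the Docuglean SDK.

```python
import os

# Hypothetical mapping from provider name to environment variable.
# Only the "mistral" provider string appears in the quick start; the
# other keys and all variable names are assumptions for this sketch.
API_KEY_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def build_ocr_kwargs(file_path: str, provider: str, model: str) -> dict:
    """Return keyword arguments shaped like the quick-start ocr() call."""
    env_var = API_KEY_ENV_VARS.get(provider)
    if env_var is None:
        raise ValueError(f"No API key variable configured for provider {provider!r}")
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early instead of sending a request with an empty key.
        raise RuntimeError(f"Set {env_var} before calling ocr()")
    return {
        "file_path": file_path,
        "provider": provider,
        "model": model,
        "api_key": api_key,
    }
```

Inside a coroutine, the result could then be unpacked into the call from the quick start, for example `await ocr(**build_ocr_kwargs("./document.pdf", "mistral", "mistral-ocr-latest"))`.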

Coming Soon

  • 📝 summarize(): TLDRs of long documents
  • 🌐 translate(): Support for multilingual documents
  • 🏷️ classify(): Document type classifier (receipt, ID, invoice, etc.)
  • 🔍 search(query): LLM-powered search across documents
  • 🤖 More Models, More Providers: Integration with Meta's Llama, Together AI, OpenRouter, and more.
  • 🌍 Multilingual: Support for multiple languages
  • 🎯 Smart Classification: Automatic document type detection

Provider Options

Currently supported providers and models:

  • OpenAI: gpt-4o-mini, gpt-4o, gpt-4-turbo, gpt-3.5-turbo, o1-mini, o1-preview
  • Mistral: mistral-ocr-latest, mistral-small-latest, ministral-8b-latest
  • Google Gemini: gemini-2.5-flash, gemini-2.5-pro, gemini-1.5-flash, gemini-1.5-pro
  • Hugging Face: Qwen/Qwen2.5-VL-3B-Instruct and other vision-language models (Python only)
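
The list above can be captured as a lookup table to catch a mistyped provider or model name before any request is sent. The sketch below is illustrative only: `SUPPORTED_MODELS`, `check_model`, and the provider keys other than `mistral` are hypothetical names, not part of the SDK. Hugging Face is mapped to `None` because the list allows other vision-language models for that provider.

```python
from typing import Optional

# Models copied from the provider list in this README.
# A value of None means the provider accepts models beyond those named here.
SUPPORTED_MODELS: dict[str, Optional[set[str]]] = {
    "openai": {
        "gpt-4o-mini", "gpt-4o", "gpt-4-turbo",
        "gpt-3.5-turbo", "o1-mini", "o1-preview",
    },
    "mistral": {
        "mistral-ocr-latest", "mistral-small-latest", "ministral-8b-latest",
    },
    "gemini": {
        "gemini-2.5-flash", "gemini-2.5-pro",
        "gemini-1.5-flash", "gemini-1.5-pro",
    },
    "huggingface": None,  # e.g. Qwen/Qwen2.5-VL-3B-Instruct (Python only)
}


def check_model(provider: str, model: str) -> None:
    """Raise ValueError if the provider or model is not in the table."""
    if provider not in SUPPORTED_MODELS:
        raise ValueError(f"Unknown provider {provider!r}")
    models = SUPPORTED_MODELS[provider]
    if models is not None and model not in models:
        raise ValueError(f"Model {model!r} is not listed for provider {provider!r}")
```

Calling `check_model("mistral", "mistral-ocr-latest")` returns without error, while a misspelled model name raises `ValueError` before any network call is made.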

Development

Node.js SDK

cd node-ocr
npm install
npm run build
npm test

Python SDK

cd python-ocr
uv sync
uv run pytest

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

Apache 2.0 - see the LICENSE file for details.

Stay Up to Date

⭐ Star this repo, and watch it to get notified about new releases and updates!
