Production-ready API service for document layout analysis, OCR, and semantic chunking.
Convert PDFs, PPTs, Word docs & images into RAG/LLM-ready chunks.
Layout Analysis | OCR + Bounding Boxes | Structured HTML and Markdown | VLM Processing Controls
- Table of Contents
- (Super) Quick Start
- Documentation
- Open Source vs Commercial API vs Enterprise
- Quick Start with Docker Compose
- LLM Configuration
- Licensing
- Connect With Us
- Go to chunkr.ai
- Make an account and copy your API key
- Install our Python SDK:

```bash
pip install chunkr-ai
```
- Use the SDK to process your documents:

```python
from chunkr_ai import Chunkr

# Initialize with your API key from chunkr.ai
chunkr = Chunkr(api_key="your_api_key")

# Upload a document (URL or local file path)
url = "https://chunkr-web.s3.us-east-1.amazonaws.com/landing_page/input/science.pdf"
task = chunkr.upload(url)

# Export results in various formats
html = task.html(output_file="output.html")
markdown = task.markdown(output_file="output.md")
content = task.content(output_file="output.txt")
task.json(output_file="output.json")

# Clean up
chunkr.close()
```
Visit our docs for more information and examples.
Feature | Open Source | Commercial API | Enterprise |
---|---|---|---|
Perfect for | Development & testing | Production applications | Large-scale / high-security deployments |
Layout Analysis | Basic models | Advanced models | Advanced + custom-tuned |
OCR Accuracy | Standard models | Premium models | Premium + domain-tuned |
VLM Processing | Basic vision models | Enhanced VLM models | Enhanced + custom fine-tunes |
Excel Support | ❌ | ✅ Native parser | ✅ Native parser |
Document Types | PDF, PPT, Word, Images | PDF, PPT, Word, Images, Excel | PDF, PPT, Word, Images, Excel |
Infrastructure | Self-hosted | Fully managed | Fully managed (On-prem or Chunkr-hosted) |
Support | Discord community | Priority email + community | 24/7 dedicated founding team support |
Migration Support | Community resources | Documentation + email | Dedicated migration team |
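One practical difference in the table above is input-format coverage: Excel is only handled by the Commercial API and Enterprise tiers. A tiny sketch of that split — the concrete extension lists here are illustrative assumptions, not an official mapping from the Chunkr docs:

```python
from pathlib import Path

# Supported input formats per tier, per the comparison table above.
# The exact extension lists are assumptions for illustration only.
SUPPORTED = {
    "open_source": {".pdf", ".ppt", ".pptx", ".doc", ".docx",
                    ".png", ".jpg", ".jpeg"},
    "commercial": {".pdf", ".ppt", ".pptx", ".doc", ".docx",
                   ".png", ".jpg", ".jpeg", ".xls", ".xlsx"},
}

def is_supported(filename: str, tier: str = "open_source") -> bool:
    """True if the file's extension is accepted by the given tier."""
    return Path(filename).suffix.lower() in SUPPORTED[tier]
```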
- Prerequisites:
  - Docker and Docker Compose
  - NVIDIA Container Toolkit (for GPU support, optional)
- Clone the repo:

```bash
git clone https://github.com/lumina-ai-inc/chunkr
cd chunkr
```
- Set up environment variables:

```bash
# Copy the example environment file
cp .env.example .env

# Configure your LLM models
cp models.example.yaml models.yaml
```
For more information on how to set up LLMs, see the LLM Configuration section below.
- Start the services:

```bash
# For GPU deployment:
docker compose up -d

# For CPU-only deployment:
docker compose -f compose.yaml -f compose.cpu.yaml up -d

# For Mac ARM architecture (M1, M2, M3, etc.):
docker compose -f compose.yaml -f compose.cpu.yaml -f compose.mac.yaml up -d
```
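The three invocations above differ only in which compose overlay files they stack (the Mac ARM command also includes the CPU-only overlay). A small sketch of that selection logic, using exactly the file names from this section — note that passing `-f compose.yaml` explicitly is equivalent to the bare GPU command, since `compose.yaml` is Docker Compose's default file:

```python
# Sketch: build the docker compose command for each deployment target,
# mirroring the three commands shown above.
def compose_args(cpu_only: bool = False, mac_arm: bool = False) -> list:
    files = ["compose.yaml"]
    if cpu_only or mac_arm:  # Mac ARM also runs the CPU-only stack
        files.append("compose.cpu.yaml")
    if mac_arm:
        files.append("compose.mac.yaml")
    args = ["docker", "compose"]
    for f in files:
        args += ["-f", f]
    return args + ["up", "-d"]
```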
- Access the services:
  - Web UI: http://localhost:5173
  - API: http://localhost:8000
- Stop the services when done:

```bash
# For GPU deployment:
docker compose down

# For CPU-only deployment:
docker compose -f compose.yaml -f compose.cpu.yaml down

# For Mac ARM architecture (M1, M2, M3, etc.):
docker compose -f compose.yaml -f compose.cpu.yaml -f compose.mac.yaml down
```
Chunkr supports two ways to configure LLMs:
- models.yaml file: Advanced configuration for multiple LLMs with additional options
- Environment variables: Simple configuration for a single LLM
For more flexible configuration with multiple models, default/fallback options, and rate limits:
- Copy the example file to create your configuration:

```bash
cp models.example.yaml models.yaml
```

- Edit the models.yaml file with your configuration. Example:

```yaml
models:
  - id: gpt-4o
    model: gpt-4o
    provider_url: https://api.openai.com/v1/chat/completions
    api_key: "your_openai_api_key_here"
    default: true
    rate-limit: 200 # requests per minute - optional
```
Benefits of using models.yaml:
- Configure multiple LLM providers simultaneously
- Set default and fallback models
- Add distributed rate limits per model
- Reference models by ID in API requests (see docs for more info)
Read the models.example.yaml file for more information on the available options.
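Once parsed (e.g. with `yaml.safe_load(...)["models"]`), a models.yaml file is a list of model entries like the example above. A minimal validation sketch — the field names come from that example, but the specific rules (unique ids, exactly one default, positive rate limit) are assumptions for illustration:

```python
# Sketch: sanity-check a parsed models.yaml configuration.
# Field names mirror the example above; the rules are assumptions.
def validate_models(models: list) -> None:
    ids = [m["id"] for m in models]
    if len(ids) != len(set(ids)):
        raise ValueError("model ids must be unique")
    defaults = [m for m in models if m.get("default")]
    if len(defaults) != 1:
        raise ValueError("exactly one model should set default: true")
    for m in models:
        rl = m.get("rate-limit")
        if rl is not None and rl <= 0:
            raise ValueError(f"{m['id']}: rate-limit must be positive")
```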
You can use any OpenAI API compatible endpoint by setting the following variables in your .env file:
```bash
LLM__KEY:
LLM__MODEL:
LLM__URL:
```
Below is a table of common LLM providers and their configuration details to get you started:
Provider | API URL | Documentation |
---|---|---|
OpenAI | https://api.openai.com/v1/chat/completions | OpenAI Docs |
Google AI Studio | https://generativelanguage.googleapis.com/v1beta/openai/chat/completions | Google AI Docs |
OpenRouter | https://openrouter.ai/api/v1/chat/completions | OpenRouter Models |
Self-Hosted | http://localhost:8000/v1 | vLLM or Ollama |
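With the three LLM__* variables set, a request to any of the endpoints above has the same OpenAI-compatible shape. A sketch of how those variables map onto a chat-completions request (built but not sent; the payload shape is the standard OpenAI one, not a documented Chunkr internal):

```python
import os

# Sketch: assemble an OpenAI-compatible chat-completions request from
# the LLM__* environment variables described above. Nothing is sent.
def build_chat_request(prompt: str):
    url = os.environ["LLM__URL"]
    headers = {
        "Authorization": f"Bearer {os.environ['LLM__KEY']}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": os.environ["LLM__MODEL"],
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, payload
```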
The core of this project is dual-licensed:
- GNU Affero General Public License v3.0 (AGPL-3.0)
- Commercial License
To use Chunkr without complying with the AGPL-3.0 license terms, contact us or visit our website.
- 📧 Email: mehul@chunkr.ai
- 📅 Schedule a call: Book a 30-minute meeting
- 🌐 Visit our website: chunkr.ai