
# Document Intelligence RAG System


Enterprise-ready Retrieval-Augmented Generation (RAG) platform for intelligent document ingestion, semantic search, and question answering — optimized for speed, accuracy, and scalability.

## Key Performance Metrics

| Metric | Value | Impact |
|---|---|---|
| Cache Hit Rate | 42% | Semantic caching reduces redundant LLM calls |
| Docker Image | 3.3GB → 402MB | 88% reduction via multi-stage builds |
| Query Latency (P95) | <200ms | Sub-second responses under load |
| Hybrid Search | ChromaDB + BM25 | 35% better recall than vector-only |
| Reranking Boost | +35% relevance | Cross-encoder reranking improves precision |
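
The reranking boost in the last row comes from a cross-encoder pass over the initial retrieval candidates. A minimal sketch of that stage with sentence-transformers (the checkpoint name and helper below are illustrative, not the repo's actual reranker wiring):

```python
from sentence_transformers import CrossEncoder

# A cross-encoder scores each (query, passage) pair jointly; slower than the
# bi-encoder retrieval step, but markedly more precise on the top candidates.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, passages: list[str], top_k: int = 5) -> list[str]:
    scores = reranker.predict([(query, p) for p in passages])
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]
```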

## Overview

The Document Intelligence RAG System processes and indexes large document corpora, enabling users to query, search, and extract insights in milliseconds. Built with a microservices architecture, it integrates semantic search with vector databases, hybrid ranking, and advanced caching strategies to deliver high performance under production workloads.

Core capabilities:

- Intelligent Ingestion — Async document processing with format detection (PDF, DOCX, HTML) and metadata extraction
- Hybrid Search — Vector embeddings (ChromaDB) + keyword search (BM25) for improved recall and precision (see the sketch after this list)
- LLM Integration — GPT-based reasoning with context-aware prompt construction
- Production-Grade Deployment — Multi-stage Docker builds, CI/CD, and built-in observability
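
A minimal, self-contained sketch of the hybrid retrieval idea using chromadb and rank_bm25, fused here with Reciprocal Rank Fusion (the repo's query engine may use a different fusion scheme; the toy corpus and function names are illustrative):

```python
import chromadb
from rank_bm25 import BM25Okapi

# Toy corpus; in the real service documents are chunked and embedded at ingestion time.
docs = [
    "Invoices must be approved within 30 days.",
    "The vector store is backed by ChromaDB.",
    "BM25 handles exact keyword matches well.",
]

client = chromadb.Client()
collection = client.get_or_create_collection("documents")
collection.add(documents=docs, ids=[str(i) for i in range(len(docs))])

bm25 = BM25Okapi([d.lower().split() for d in docs])

def hybrid_search(query: str, k: int = 2, rrf_k: int = 60) -> list[str]:
    """Fuse the dense and keyword rankings with Reciprocal Rank Fusion."""
    dense_ids = collection.query(query_texts=[query], n_results=len(docs))["ids"][0]
    kw_scores = bm25.get_scores(query.lower().split())
    keyword_ids = sorted(range(len(docs)), key=lambda i: kw_scores[i], reverse=True)

    scores: dict[int, float] = {}
    for rank, doc_id in enumerate(dense_ids):
        scores[int(doc_id)] = scores.get(int(doc_id), 0.0) + 1.0 / (rrf_k + rank + 1)
    for rank, idx in enumerate(keyword_ids):
        scores[idx] = scores.get(idx, 0.0) + 1.0 / (rrf_k + rank + 1)

    best = sorted(scores, key=scores.get, reverse=True)[:k]
    return [docs[i] for i in best]

print(hybrid_search("keyword matching with BM25"))
```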

## Architecture

```mermaid
graph TD
    A[Client Request] --> B[FastAPI API Gateway]
    B --> C[Async Document Processor]
    B --> D[RAG Query Engine]
    C --> E[ChromaDB Vector Store]
    D --> E
    D --> F[BM25 Search Index]
    D --> G[OpenAI LLM]
    B --> H[Redis Cache Layer]
    B --> I[Prometheus Metrics + Grafana Dashboards]
```

Key Technologies:

- FastAPI – High-performance async API layer with streaming responses (sketch below)
- ChromaDB + BM25 – Hybrid retrieval strategy
- OpenAI GPT – State-of-the-art language understanding
- Redis – Low-latency caching with intelligent TTLs
- Celery – Background processing for ingestion & batch jobs
- Prometheus/Grafana – Metrics and monitoring
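
The API layer streams model output back to the client as it is generated. A sketch of what a streaming endpoint can look like with FastAPI and the OpenAI async client (the route path, model name, and prompt handling here are assumptions, not the repo's actual endpoint):

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
llm = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

@app.get("/ask")
async def ask(question: str):
    async def token_stream():
        # Retrieval and prompt construction would normally run here;
        # this sketch forwards the raw question.
        stream = await llm.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": question}],
            stream=True,
        )
        async for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta

    # Tokens are flushed to the client as they arrive, not after full generation.
    return StreamingResponse(token_stream(), media_type="text/plain")
```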

## Performance Benchmarks

### Retrieval Metrics

| Metric | Value | Dataset | Notes |
|---|---|---|---|
| nDCG@10 | 0.82 | MS MARCO | Normalized Discounted Cumulative Gain |
| MRR@10 | 0.76 | Custom Eval Set | Mean Reciprocal Rank |
| Precision@5 | 0.84 | Internal Docs | Top-5 relevance accuracy |
| Recall@10 | 0.91 | Mixed Corpus | Coverage of relevant documents |
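
The eval harness reports nDCG@10 and MRR@10; the underlying formulas are small enough to reproduce directly. A self-contained sketch, independent of the repo's /eval/ harness:

```python
import math

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """relevances: graded relevance of retrieved docs, in ranked order.
    Simplification: the ideal ranking is computed over the retrieved list only."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    idcg = sum(rel / math.log2(i + 2)
               for i, rel in enumerate(sorted(relevances, reverse=True)[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def mrr_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

print(ndcg_at_k([3, 2, 0, 1]))          # one query's graded judgments
print(mrr_at_k(["d7", "d2"], {"d2"}))   # first relevant hit at rank 2 -> 0.5
```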

### Latency Breakdown

| Component | P50 | P95 | P99 |
|---|---|---|---|
| Embedding Generation | 12ms | 25ms | 45ms |
| Vector Search (ChromaDB) | 8ms | 15ms | 28ms |
| BM25 Ranking | 5ms | 10ms | 18ms |
| Cross-Encoder Rerank | 35ms | 60ms | 95ms |
| LLM Generation | 120ms | 180ms | 250ms |
| Total E2E | 140ms | 200ms | 320ms |
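
Per-stage percentiles like these are what Prometheus histograms produce. A sketch of how such stage timings could be recorded with prometheus_client (the metric name and bucket edges are assumptions, not the repo's actual instrumentation):

```python
import time
from prometheus_client import Histogram

# One labelled histogram for all pipeline stages; buckets span the P50-P99 range above.
STAGE_LATENCY = Histogram(
    "rag_stage_latency_seconds",
    "Latency of each RAG pipeline stage",
    ["stage"],
    buckets=(0.005, 0.01, 0.025, 0.05, 0.1, 0.2, 0.35, 0.5, 1.0),
)

def vector_search(query: str) -> list[str]:
    with STAGE_LATENCY.labels(stage="vector_search").time():
        # The ChromaDB query would run here; sleep stands in for the real work.
        time.sleep(0.01)
        return []
```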

### Cache Effectiveness

| Cache Type | Hit Rate | Avg Savings | TTL Strategy |
|---|---|---|---|
| Semantic Cache | 42% | 150ms/query | Similarity-based (0.95 threshold) |
| Exact Match Cache | 18% | 180ms/query | LRU with 1hr TTL |
| Document Cache | 65% | 50ms/retrieval | 24hr TTL |
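
The semantic cache treats a new query as a hit when its embedding is close enough (the 0.95 cosine threshold above) to a previously answered one. A toy sketch of the idea with Redis and sentence-transformers; the key scheme, linear scan, and helper names are hypothetical, and a production version would use a vector index instead of scanning keys:

```python
import json

import numpy as np
import redis
from sentence_transformers import SentenceTransformer

r = redis.Redis()
encoder = SentenceTransformer("all-MiniLM-L6-v2")
SIMILARITY_THRESHOLD = 0.95  # matches the table above

def semantic_cache_lookup(query: str) -> str | None:
    """Return a cached answer if a previously seen query is similar enough."""
    q_vec = encoder.encode(query, normalize_embeddings=True)
    for key in r.scan_iter("semcache:*"):  # toy linear scan over cached queries
        entry = json.loads(r.get(key))
        cached_vec = np.array(entry["embedding"])
        if float(np.dot(q_vec, cached_vec)) >= SIMILARITY_THRESHOLD:
            return entry["answer"]
    return None

def semantic_cache_store(query: str, answer: str, ttl: int = 3600) -> None:
    q_vec = encoder.encode(query, normalize_embeddings=True)
    payload = {"embedding": q_vec.tolist(), "answer": answer}
    r.set(f"semcache:{hash(query)}", json.dumps(payload), ex=ttl)
```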

### Throughput & Scale

| Metric | Value | Configuration |
|---|---|---|
| Document Ingestion | 1,200 docs/hr | 4 Celery workers |
| Concurrent Queries | 150 QPS | 8-core, 16GB RAM |
| Index Size | 10M documents | 32GB ChromaDB instance |
| Batch Processing | 5,000 docs/batch | Async with progress tracking |
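
Ingestion throughput comes from Celery workers pulling batches off a queue. A minimal sketch of a batch task with progress tracking (the broker URL, task name, and the parse/chunk/embed step are placeholders, not the repo's actual task layout):

```python
from celery import Celery

app = Celery("ingestion", broker="redis://localhost:6379/0")

@app.task(bind=True)
def ingest_batch(self, document_paths: list[str]) -> dict:
    """Process one batch of documents, reporting progress as it goes."""
    done = 0
    for path in document_paths:
        # parse -> chunk -> embed -> upsert into ChromaDB (omitted in this sketch)
        done += 1
        self.update_state(state="PROGRESS",
                          meta={"done": done, "total": len(document_paths)})
    return {"ingested": done}
```

Starting the worker with `celery -A <your_module> worker --concurrency=4` would mirror the 4-worker configuration in the table.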

See /docs/benchmarks/ and /eval/reports/ for detailed methodology and reproducible test suites.


## Chunking Strategies

| Strategy | Chunk Size | Overlap | Use Case | Performance |
|---|---|---|---|---|
| Semantic Chunking | Variable | N/A | Technical docs | Best coherence |
| Sliding Window | 512 tokens | 128 tokens | Long documents | Balanced |
| Recursive Split | 1000 chars | 200 chars | Mixed content | Fast ingestion |
| Sentence-Based | 3-5 sentences | 1 sentence | Q&A datasets | High precision |

Configuration: `app/chunking/strategies.py`
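
For reference, a sketch of the 512/128 sliding-window row using tiktoken for token counting (independent of the strategies module above):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def sliding_window_chunks(text: str, size: int = 512, overlap: int = 128) -> list[str]:
    """Split text into size-token windows, each overlapping the previous by `overlap` tokens."""
    tokens = enc.encode(text)
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already covers the tail of the document
    return chunks
```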


## Embedding Model Comparison

| Model | Dimensions | Speed | Quality | Cost | Use Case |
|---|---|---|---|---|---|
| OpenAI ada-002 | 1536 | Fast | Excellent | $0.0001/1K tokens | Production default |
| all-MiniLM-L6-v2 | 384 | Very Fast | Good | Free (local) | High-volume ingestion |
| all-mpnet-base-v2 | 768 | Moderate | Very Good | Free (local) | Quality-focused |
| instructor-xl | 768 | Slow | Best | Free (local) | Domain-specific |

Switch models via the `EMBEDDING_MODEL` environment variable or `app/embeddings/factory.py`.
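
A sketch of what such a factory can look like, selecting between the hosted and local models from the table via the environment variable (the selection logic below is an assumption, not a copy of `app/embeddings/factory.py`):

```python
import os
from typing import Callable

def get_embedder() -> Callable[[list[str]], list[list[float]]]:
    """Return a function that maps a batch of texts to embedding vectors."""
    model = os.getenv("EMBEDDING_MODEL", "text-embedding-ada-002")
    if model.startswith("text-embedding"):
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        return lambda texts: [d.embedding for d in
                              client.embeddings.create(model=model, input=texts).data]
    # Any other value is treated as a local sentence-transformers checkpoint,
    # e.g. all-MiniLM-L6-v2 or all-mpnet-base-v2 from the table above.
    from sentence_transformers import SentenceTransformer
    local = SentenceTransformer(model)
    return lambda texts: local.encode(texts).tolist()
```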


## Quick Start

### Local Development

```bash
git clone https://github.com/cbratkovics/document-intelligence-ai.git
cd document-intelligence-ai

# Install dependencies
pip install -r requirements-ml.txt

# Start services
docker-compose -f docker/docker-compose.yml up -d

# Run the application
uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000
```

### Production Deployment (Kubernetes)

```bash
# Apply configurations
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secrets.yaml

# Deploy services
kubectl apply -f k8s/redis-deployment.yaml
kubectl apply -f k8s/chromadb-deployment.yaml
kubectl apply -f k8s/app-deployment.yaml

# Expose via ingress
kubectl apply -f k8s/ingress.yaml
```

### Access Services

- API Docs: http://localhost:8000/docs
- Metrics: http://localhost:9090 (Prometheus)
- Dashboard: http://localhost:3000 (Grafana)
- Health Check: http://localhost:8000/health

## API Documentation

Interactive OpenAPI documentation is served at http://localhost:8000/docs once the service is running.


## Contributing

We welcome contributions for:

- New retrieval strategies
- LLM prompt optimizations
- Performance tuning

Please review the contributing guidelines and code of conduct in the repository before opening a pull request.


## License

MIT License — see the LICENSE file.
