DocMind AI - Hybrid RAG Intelligence System

Enterprise-Grade Document Intelligence Platform

DocMind AI is a privacy-first Retrieval-Augmented Generation (RAG) system that bridges secure local inference and high-performance cloud LLMs. Users chat with large PDF collections through a Hybrid Inference Bridge that preserves data sovereignty and returns citation-backed answers.

🚀 Quick Start

Launch the platform in 2 steps:

# 1. Start Backend (API + Vector DB)
docker-compose up -d --build

# 2. Start Frontend Dashboard
cd frontend && npm install && npm run dev

Detailed Setup: See GETTING_STARTED.md.

📸 Demo & Architecture

Smart Document Interface

High-fidelity chat UI with real-time neural indexing telemetry.

System Architecture

Hybrid Inference Gateway routing between OpenAI (Cloud) and Ollama (Local).

Neural Inspector

Deep observability into the vector store and semantic document chunks.

Deep Dive: See ARCHITECTURE.md for Chunking Logic and Decision Logs.

✨ Key Features

🧠 Hybrid Brain: Switch between GPT-4o and Llama 3 instantly.
📚 RAG Pipeline: Recursive text splitting (1000-character chunks, 200-character overlap).
🔍 High-Precision Search: Hybrid semantic + metadata filtering.
🔒 Air-Gapped Ready: Fully local vector storage using ChromaDB.
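The provider switch behind the Hybrid Brain feature could be sketched roughly as below. This is a minimal illustration, not the project's actual code: `InferenceGateway`, `cloud_stub`, and `local_stub` are hypothetical names, and the stubs stand in for real OpenAI and Ollama clients so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A provider is modelled as any callable that maps a prompt to a completion.
Generate = Callable[[str], str]


@dataclass
class InferenceGateway:
    """Hypothetical gateway that routes prompts to the active provider."""
    providers: Dict[str, Generate]
    active: str

    def switch(self, name: str) -> None:
        # Reject unknown provider names instead of failing later at call time.
        if name not in self.providers:
            raise ValueError(f"unknown provider: {name}")
        self.active = name

    def generate(self, prompt: str) -> str:
        return self.providers[self.active](prompt)


# Stub backends (assumptions): replace with real cloud and local clients.
def cloud_stub(prompt: str) -> str:
    return f"[cloud] {prompt}"


def local_stub(prompt: str) -> str:
    return f"[local] {prompt}"


gateway = InferenceGateway(
    providers={"cloud": cloud_stub, "local": local_stub},
    active="cloud",
)
```

Calling `gateway.switch("local")` redirects later `gateway.generate(...)` calls to the local backend. Keeping providers behind one callable signature means the rest of the pipeline does not need to know which backend is active.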

🏗️ The Intelligence Journey

Understanding how a PDF becomes a conversational agent:

Ingest: Document parsed and cleaned via pypdf.
Chunk: Segmented into 1000-character blocks with a 200-character overlap.
Embed: Converted to high-dimensional vectors.
Index: Stored in ChromaDB with page-level metadata.
Query: System retrieves top chunks to ground LLM responses.
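The Chunk and Query steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions: it uses a fixed sliding window rather than the recursive splitter the project describes, the record layout (`page`, `vector`, `text`) is hypothetical, and a brute-force cosine ranking stands in for the ChromaDB query.

```python
import math
from typing import Dict, List, Optional, Sequence


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(
    query_vec: Sequence[float],
    records: List[Dict],
    k: int = 3,
    page: Optional[int] = None,
) -> List[Dict]:
    """Apply an optional page filter, then rank by similarity to the query."""
    candidates = [r for r in records if page is None or r["page"] == page]
    ranked = sorted(
        candidates,
        key=lambda r: cosine(query_vec, r["vector"]),
        reverse=True,
    )
    return ranked[:k]
```

With the default settings, each window starts 800 characters after the previous one, so consecutive chunks share 200 characters. The overlap reduces the chance that a sentence split at a chunk boundary is lost to retrieval.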

📚 Documentation

Document	Description
System Architecture	Vectors, Chunking, and Provider Abstraction.
Getting Started	Environment setup (Cloud vs Local mode).
Failure Scenarios	Hallucination mitigation and grounding logic.
Interview Q&A	RAG strategy and technical justifications.

🔧 Tech Stack

Component	Technology	Role
Brain	FastAPI (Python)	LangChain Orchestrator.
Memory	ChromaDB	Local Vector Store.
Intelligence	OpenAI / Ollama	LLM Inference Backends.
Interface	Next.js 14	Enterprise Dashboard.

👤 Author

Harshan Aiyappa
Senior Full-Stack Hybrid Engineer
GitHub Profile

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.
