An advanced, production-style Retrieval-Augmented Generation (RAG) system built using LangChain, LangGraph, FastAPI, and Streamlit, featuring hybrid retrieval, persistent conversational memory, and a self-correcting hallucination-reduction pipeline with web search fallback.
This project implements a Corrective RAG (CRAG) architecture designed to deliver grounded, reliable, and context-aware responses over private document collections — while gracefully falling back to real-time web search when internal knowledge is insufficient.
- 🔍 Hybrid Retrieval — BM25 + Vector Search ensemble retriever
- 🧠 LangGraph Agentic Workflow — multi-stage reasoning pipeline
- ♻️ Corrective RAG — hallucination detection with retries
- ✍️ Query Rewriting for improved retrieval recall
- 🌐 Web Search Fallback using Tavily API
- 💾 Persistent Conversation Memory using LangGraph + SQLite
- ⚙️ FastAPI Lifespan Initialization for efficient resource loading
- 💬 Streamlit Chat Interface
Large Language Models are powerful, but they are prone to hallucination when the required knowledge is missing or retrieval fails.
Corrective RAG (CRAG) solves this by introducing evaluation, correction, and fallback loops inside the retrieval-generation pipeline.
This project demonstrates a fully agentic CRAG pipeline, where the system:
- Retrieves relevant documents using hybrid retrieval
- Judges retrieval relevance
- Generates an answer
- Evaluates groundedness and usefulness
- Retries generation or retrieval if needed
- Rewrites queries to improve recall
- Falls back to live web search when internal knowledge fails
The result is a robust, hallucination-resistant knowledge assistant.
| Layer | Technology |
|---|---|
| LLMs | Google Gemini, LLaMA (HF Endpoint) |
| Orchestration | LangChain + LangGraph |
| Retrieval | ChromaDB Vector Store + BM25 |
| Retriever Ensemble | Weighted Hybrid Retriever |
| Embeddings | Gemini Embedding Model |
| Web Search | Tavily API |
| Backend | FastAPI with Lifespan Events |
| Persistence | LangGraph SQLite Checkpointer |
| Frontend | Streamlit Chat UI |
| Document Loading | Unstructured PDF Loader |
The system is built as a Corrective Retrieval-Augmented Generation (CRAG) workflow orchestrated with LangGraph.
Unlike standard RAG pipelines that perform only retrieval → generation, CRAG introduces self-evaluation and correction loops to minimize hallucinations and improve answer reliability.
Below is the high-level workflow:
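Every stage below reads and writes a shared graph state. A minimal sketch of what that state might look like (the field names here are illustrative, not the project's actual schema):

```python
from typing import List, TypedDict

from langchain_core.messages import BaseMessage


class GraphState(TypedDict):
    """Shared state passed between LangGraph nodes (illustrative schema)."""
    question: str                    # current, possibly rewritten, user query
    documents: List[str]             # retrieved context passages
    generation: str                  # latest draft answer
    chat_history: List[BaseMessage]  # recent turns restored by the checkpointer
    retries: int                     # counter for the correction loops
```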
The user query first passes through a hybrid retriever that combines:
- BM25 lexical retrieval for keyword matching
- Chroma vector retrieval (MMR search) for semantic similarity
An EnsembleRetriever fuses the two ranked lists with weighted Reciprocal Rank Fusion, combining BM25's exact keyword precision with the vector store's semantic recall.
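A minimal sketch of assembling such a hybrid retriever with LangChain (the weights and `k` values are illustrative, and `chunks` stands for the document chunks produced at ingestion time):

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Lexical side: BM25 over the same chunks that were embedded.
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 4

# Semantic side: Chroma with MMR search for diverse, non-redundant hits.
vectorstore = Chroma.from_documents(
    chunks,
    embedding=GoogleGenerativeAIEmbeddings(model="models/embedding-001"),
)
vector_retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 4})

# Weighted fusion of the two ranked lists.
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6],
)
```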
A lightweight LLM evaluates whether the retrieved context is relevant to the query.
- If relevant → proceed to answer generation
- If not relevant → trigger web search fallback
This prevents the model from generating answers using weak or unrelated context.
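One common way to implement such a grader is an LLM call with structured output. A sketch, assuming a Gemini chat model and a hypothetical `GradeDocuments` schema (`doc` and `question` stand for one retrieved document and the current query):

```python
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI


class GradeDocuments(BaseModel):
    """Binary relevance verdict for a single retrieved passage."""
    binary_score: str = Field(
        description="'yes' if the document is relevant to the question, else 'no'"
    )


llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)

grade_prompt = ChatPromptTemplate.from_messages([
    ("system", "Grade whether the document is relevant to the user's question. Answer 'yes' or 'no'."),
    ("human", "Document:\n{document}\n\nQuestion: {question}"),
])
retrieval_grader = grade_prompt | llm.with_structured_output(GradeDocuments)

verdict = retrieval_grader.invoke({"document": doc.page_content, "question": question})
# If every document scores "no", the graph routes to the web search fallback.
```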
When internal document retrieval fails, the system queries the Tavily Web Search API to fetch external knowledge.
The retrieved web content is inserted as context for answer generation.
This ensures graceful degradation instead of hallucination.
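A sketch of the fallback node using LangChain's Tavily tool (requires a `TAVILY_API_KEY` environment variable; `GraphState` is the illustrative state schema from the sketch above):

```python
from langchain_community.tools.tavily_search import TavilySearchResults

web_search_tool = TavilySearchResults(max_results=3)


def web_search(state: GraphState) -> GraphState:
    """Replace weak internal context with fresh web results."""
    results = web_search_tool.invoke({"query": state["question"]})
    state["documents"] = ["\n\n".join(r["content"] for r in results)]
    return state
```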
The answer is generated using an LLM conditioned on:
- The retrieved context
- The current user query
- Recent chat history
The result is appended to persistent conversation memory.
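A sketch of the generation node, reusing `llm` and `GraphState` from the earlier sketches (the prompt wording is illustrative):

```python
from langchain_core.prompts import ChatPromptTemplate

generate_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only the provided context. Say so if the context is insufficient."),
    ("placeholder", "{chat_history}"),  # recent turns restored by the checkpointer
    ("human", "Context:\n{context}\n\nQuestion: {question}"),
])


def generate(state: GraphState) -> GraphState:
    """Draft an answer grounded in the current context."""
    answer = (generate_prompt | llm).invoke({
        "chat_history": state["chat_history"],
        "context": "\n\n".join(state["documents"]),
        "question": state["question"],
    })
    state["generation"] = answer.content
    return state
```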
A second LLM evaluates the generated answer on two criteria:
- Groundedness → Is the answer supported by the provided context?
- Usefulness → Does the answer actually address the user’s query?
Based on evaluation:
- If hallucinated (not grounded) → retry answer generation
- If not useful → rewrite the query → retrieve again
- If still failing after retries → fallback response
These loops form the corrective core of CRAG.
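In LangGraph, these loops map naturally onto a conditional edge. A sketch, again reusing `llm` and `GraphState` (the verdict schema, node names, and retry cap are illustrative):

```python
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langgraph.graph import END

MAX_RETRIES = 2  # illustrative cap before giving up


class GradeAnswer(BaseModel):
    """Evaluator verdicts for a draft answer."""
    grounded: str = Field(description="'yes' if every claim is supported by the context")
    useful: str = Field(description="'yes' if the answer addresses the question")


answer_grader = ChatPromptTemplate.from_messages([
    ("system", "Judge the answer: is it grounded in the context, and does it address the question? Reply 'yes' or 'no' for each."),
    ("human", "Context:\n{context}\n\nQuestion: {question}\n\nAnswer: {generation}"),
]) | llm.with_structured_output(GradeAnswer)


def route_after_generation(state: GraphState) -> str:
    """Conditional-edge function implementing the corrective loops."""
    if state["retries"] >= MAX_RETRIES:
        return "fallback"                  # stop looping, return a safe response
    verdict = answer_grader.invoke({
        "context": "\n\n".join(state["documents"]),
        "question": state["question"],
        "generation": state["generation"],
    })
    if verdict.grounded != "yes":
        return "generate"                  # hallucinated -> retry generation
    if verdict.useful != "yes":
        return "rewrite_query"             # off-target -> rewrite, then retrieve again
    return END
```

The returned labels are wired up with `workflow.add_conditional_edges("generate", route_after_generation)`, where `"generate"`, `"rewrite_query"`, and `"fallback"` are node names in the graph.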
The entire LangGraph state is stored using a SQLite checkpointer, enabling:
- Multi-turn conversational continuity
- Thread-based session tracking
- Memory persistence across backend restarts
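A sketch of wiring the checkpointer in, assuming `workflow` is the `StateGraph` built above (the database path and thread id are illustrative):

```python
import sqlite3

from langgraph.checkpoint.sqlite import SqliteSaver

# A single connection shared across requests; check_same_thread=False
# lets FastAPI's worker threads reuse it.
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
graph = workflow.compile(checkpointer=SqliteSaver(conn))

# Each conversation is addressed by a thread_id; LangGraph restores that
# thread's saved state before every run, even after a backend restart.
config = {"configurable": {"thread_id": "session-42"}}
result = graph.invoke({"question": "Summarize the renewal clause."}, config)
```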
✔ Reduces LLM hallucinations
✔ Improves retrieval robustness
✔ Handles missing knowledge gracefully
✔ Enables persistent multi-turn chat
✔ Mirrors production-grade agentic RAG systems
This makes the system more reliable than standard RAG pipelines while remaining modular and extensible.
- Enterprise knowledge-base assistants
- Research paper and academic assistants
- Legal and regulatory document assistants
- Technical and developer documentation copilots
- Hybrid private + web knowledge assistants
