An advanced, production-style Retrieval-Augmented Generation (RAG) system built using LangChain, LangGraph, FastAPI, and Streamlit, featuring hybrid retrieval, persistent conversational memory, and a self-correcting hallucination-reduction pipeline with web search fallback.
This project implements a Corrective RAG (CRAG) architecture designed to deliver grounded, reliable, and context-aware responses over private document collections — while gracefully falling back to real-time web search when internal knowledge is insufficient.
- 🔍 Hybrid Retrieval — BM25 + Vector Search ensemble retriever
- 🧠 LangGraph Agentic Workflow — multi-stage reasoning pipeline
- ♻️ Corrective RAG — hallucination detection with retries
- ✍️ Query Rewriting for improved retrieval recall
- 🌐 Web Search Fallback using Tavily API
- 💾 Persistent Conversation Memory using LangGraph + SQLite
- ⚙️ FastAPI Lifespan Initialization for efficient resource loading
- 💬 Streamlit Chat Interface
Large Language Models are powerful, but they are prone to hallucination when the required knowledge is missing or retrieval fails.
Corrective RAG (CRAG) solves this by introducing evaluation, correction, and fallback loops inside the retrieval-generation pipeline.
This project demonstrates a fully agentic CRAG pipeline, where the system:
- Retrieves relevant documents using hybrid retrieval
- Judges retrieval relevance
- Generates an answer
- Evaluates groundedness and usefulness
- Retries generation or retrieval if needed
- Rewrites queries to improve recall
- Falls back to live web search when internal knowledge fails
The result is a robust, hallucination-resistant knowledge assistant.
| Layer | Technology |
|---|---|
| LLMs | Google Gemini, LLaMA (HF Endpoint) |
| Orchestration | LangChain + LangGraph |
| Retrieval | ChromaDB Vector Store + BM25 |
| Retriever Ensemble | Weighted Hybrid Retriever |
| Embeddings | Gemini Embedding Model |
| Web Search | Tavily API |
| Backend | FastAPI with Lifespan Events |
| Persistence | LangGraph SQLite Checkpointer |
| Frontend | Streamlit Chat UI |
| Document Loading | Unstructured PDF Loader |
The system is built as a Corrective Retrieval-Augmented Generation (CRAG) workflow orchestrated with LangGraph.
Unlike standard RAG pipelines that perform only retrieval → generation, CRAG introduces self-evaluation and correction loops to minimize hallucinations and improve answer reliability.
Below is the high-level workflow:
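Every stage below reads and writes a shared graph state. A minimal sketch of what that state might look like (the field names here are illustrative, not the project's actual schema):

```python
from typing import List, TypedDict

from langchain_core.messages import BaseMessage


class GraphState(TypedDict):
    """Shared state passed between LangGraph nodes (illustrative schema)."""
    question: str                    # current, possibly rewritten, user query
    documents: List[str]             # retrieved context passages
    generation: str                  # latest draft answer
    chat_history: List[BaseMessage]  # recent turns restored by the checkpointer
    retries: int                     # counter for the correction loops
```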
The user query first passes through a hybrid retriever that combines:
- BM25 lexical retrieval for keyword matching
- Chroma vector retrieval (MMR search) for semantic similarity
An EnsembleRetriever fuses the two ranked lists with weighted Reciprocal Rank Fusion, combining BM25's exact keyword precision with the vector store's semantic recall.
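A minimal sketch of assembling such a hybrid retriever with LangChain (the weights and `k` values are illustrative, and `chunks` stands for the document chunks produced at ingestion time):

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Lexical side: BM25 over the same chunks that were embedded.
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 4

# Semantic side: Chroma with MMR search for diverse, non-redundant hits.
vectorstore = Chroma.from_documents(
    chunks,
    embedding=GoogleGenerativeAIEmbeddings(model="models/embedding-001"),
)
vector_retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 4})

# Weighted fusion of the two ranked lists.
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6],
)
```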
A lightweight LLM evaluates whether the retrieved context is relevant to the query.
- If relevant → proceed to answer generation
- If not relevant → trigger web search fallback
This prevents the model from generating answers using weak or unrelated context.
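One common way to implement such a grader is an LLM call with structured output. A sketch, assuming a Gemini chat model and a hypothetical `GradeDocuments` schema (`doc` and `question` stand for one retrieved document and the current query):

```python
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI


class GradeDocuments(BaseModel):
    """Binary relevance verdict for a single retrieved passage."""
    binary_score: str = Field(
        description="'yes' if the document is relevant to the question, else 'no'"
    )


llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)

grade_prompt = ChatPromptTemplate.from_messages([
    ("system", "Grade whether the document is relevant to the user's question. Answer 'yes' or 'no'."),
    ("human", "Document:\n{document}\n\nQuestion: {question}"),
])
retrieval_grader = grade_prompt | llm.with_structured_output(GradeDocuments)

verdict = retrieval_grader.invoke({"document": doc.page_content, "question": question})
# If every document scores "no", the graph routes to the web search fallback.
```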
When internal document retrieval fails, the system queries the Tavily Web Search API to fetch external knowledge.
The retrieved web content is inserted as context for answer generation.
This ensures graceful degradation instead of hallucination.
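A sketch of the fallback node using LangChain's Tavily tool (requires a `TAVILY_API_KEY` environment variable; `GraphState` is the illustrative state schema from the sketch above):

```python
from langchain_community.tools.tavily_search import TavilySearchResults

web_search_tool = TavilySearchResults(max_results=3)


def web_search(state: GraphState) -> GraphState:
    """Replace weak internal context with fresh web results."""
    results = web_search_tool.invoke({"query": state["question"]})
    state["documents"] = ["\n\n".join(r["content"] for r in results)]
    return state
```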
The answer is generated using an LLM conditioned on:
- The retrieved context
- The current user query
- Recent chat history
The result is appended to persistent conversation memory.
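A sketch of the generation node, reusing `llm` and `GraphState` from the earlier sketches (the prompt wording is illustrative):

```python
from langchain_core.prompts import ChatPromptTemplate

generate_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only the provided context. Say so if the context is insufficient."),
    ("placeholder", "{chat_history}"),  # recent turns restored by the checkpointer
    ("human", "Context:\n{context}\n\nQuestion: {question}"),
])


def generate(state: GraphState) -> GraphState:
    """Draft an answer grounded in the current context."""
    answer = (generate_prompt | llm).invoke({
        "chat_history": state["chat_history"],
        "context": "\n\n".join(state["documents"]),
        "question": state["question"],
    })
    state["generation"] = answer.content
    return state
```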
A second LLM evaluates the generated answer on two criteria:
- Groundedness → Is the answer supported by the provided context?
- Usefulness → Does the answer actually address the user’s query?
Based on evaluation:
- If hallucinated (not grounded) → retry answer generation
- If not useful → rewrite the query → retrieve again
- If still failing after retries → fallback response
These loops form the corrective core of CRAG.
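In LangGraph, these loops map naturally onto a conditional edge. A sketch, again reusing `llm` and `GraphState` (the verdict schema, node names, and retry cap are illustrative):

```python
from pydantic import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langgraph.graph import END

MAX_RETRIES = 2  # illustrative cap before giving up


class GradeAnswer(BaseModel):
    """Evaluator verdicts for a draft answer."""
    grounded: str = Field(description="'yes' if every claim is supported by the context")
    useful: str = Field(description="'yes' if the answer addresses the question")


answer_grader = ChatPromptTemplate.from_messages([
    ("system", "Judge the answer: is it grounded in the context, and does it address the question? Reply 'yes' or 'no' for each."),
    ("human", "Context:\n{context}\n\nQuestion: {question}\n\nAnswer: {generation}"),
]) | llm.with_structured_output(GradeAnswer)


def route_after_generation(state: GraphState) -> str:
    """Conditional-edge function implementing the corrective loops."""
    if state["retries"] >= MAX_RETRIES:
        return "fallback"                  # stop looping, return a safe response
    verdict = answer_grader.invoke({
        "context": "\n\n".join(state["documents"]),
        "question": state["question"],
        "generation": state["generation"],
    })
    if verdict.grounded != "yes":
        return "generate"                  # hallucinated -> retry generation
    if verdict.useful != "yes":
        return "rewrite_query"             # off-target -> rewrite, then retrieve again
    return END
```

The returned labels are wired up with `workflow.add_conditional_edges("generate", route_after_generation)`, where `"generate"`, `"rewrite_query"`, and `"fallback"` are node names in the graph.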
The entire LangGraph state is stored using a SQLite checkpointer, enabling:
- Multi-turn conversational continuity
- Thread-based session tracking
- Memory persistence across backend restarts
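A sketch of wiring the checkpointer in, assuming `workflow` is the `StateGraph` built above (the database path and thread id are illustrative):

```python
import sqlite3

from langgraph.checkpoint.sqlite import SqliteSaver

# A single connection shared across requests; check_same_thread=False
# lets FastAPI's worker threads reuse it.
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
graph = workflow.compile(checkpointer=SqliteSaver(conn))

# Each conversation is addressed by a thread_id; LangGraph restores that
# thread's saved state before every run, even after a backend restart.
config = {"configurable": {"thread_id": "session-42"}}
result = graph.invoke({"question": "Summarize the renewal clause."}, config)
```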
✔ Reduces LLM hallucinations
✔ Improves retrieval robustness
✔ Handles missing knowledge gracefully
✔ Enables persistent multi-turn chat
✔ Mirrors production-grade agentic RAG systems
This makes the system more reliable than standard RAG pipelines while remaining modular and extensible.
- Enterprise knowledge-base assistants
- Research paper and academic assistants
- Legal and regulatory document assistants
- Technical and developer documentation copilots
- Hybrid private + web knowledge assistants
