HalluciGuard-CRAG — A hybrid-retrieval, self-correcting RAG chatbot built with LangChain, LangGraph, FastAPI, and Streamlit. Combines BM25 + vector search, hallucination detection, query rewriting, and web fallback for grounded, reliable AI answers.

Pavan-220405/HalluciGuard-CRAG

CRAG: Corrective Hybrid Retrieval-Augmented Generation Chatbot

An advanced, production-style Retrieval-Augmented Generation (RAG) system built using LangChain, LangGraph, FastAPI, and Streamlit, featuring hybrid retrieval, persistent conversational memory, and a self-correcting hallucination-reduction pipeline with web search fallback.

This project implements a Corrective RAG (CRAG) architecture designed to deliver grounded, reliable, and context-aware responses over private document collections — while gracefully falling back to real-time web search when internal knowledge is insufficient.


✨ Highlights

  • 🔍 Hybrid Retrieval — BM25 + Vector Search ensemble retriever
  • 🧠 LangGraph Agentic Workflow — multi-stage reasoning pipeline
  • ♻️ Corrective RAG — hallucination detection with retries
  • ✍️ Query Rewriting for improved retrieval recall
  • 🌐 Web Search Fallback using Tavily API
  • 💾 Persistent Conversation Memory using LangGraph + SQLite
  • ⚙️ FastAPI Lifespan Initialization for efficient resource loading
  • 💬 Streamlit Chat Interface

Introduction

Large Language Models are powerful, but they are prone to hallucination when relevant knowledge is missing or retrieval fails.
Corrective RAG (CRAG) addresses this by introducing evaluation, correction, and fallback loops inside the retrieval-generation pipeline.

This project demonstrates a fully agentic CRAG pipeline, where the system:

  1. Retrieves relevant documents using hybrid retrieval
  2. Judges retrieval relevance
  3. Generates an answer
  4. Evaluates groundedness and usefulness
  5. Retries generation or retrieval if needed
  6. Rewrites queries to improve recall
  7. Falls back to live web search when internal knowledge fails

The result is a robust, hallucination-resistant knowledge assistant.
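The seven steps above can be sketched as a plain-Python control loop. In the actual project these stages are LangGraph nodes connected by conditional edges; the injected callables and the retry limit below are illustrative stand-ins, not the project's real function names.

```python
# Minimal sketch of the CRAG control flow. Each stage is injected as a
# callable so the loop is testable without any LLM or retriever backend.
MAX_RETRIES = 2

def run_crag(query, retrieve, grade_relevance, generate,
             grade_answer, rewrite_query, web_search):
    """Drive one query through the corrective loop."""
    for _attempt in range(MAX_RETRIES + 1):
        docs = retrieve(query)                        # 1. hybrid retrieval
        if not grade_relevance(query, docs):          # 2. judge relevance
            docs = web_search(query)                  # 7. web fallback
        answer = generate(query, docs)                # 3. generate answer
        grounded, useful = grade_answer(query, docs, answer)  # 4. evaluate
        if grounded and useful:
            return answer
        if not useful:
            query = rewrite_query(query)              # 6. rewrite, retrieve again
        # not grounded -> retry generation on the next loop iteration (5.)
    return "I could not find a reliable answer."      # fallback response
```

The design choice worth noting: because every stage is a parameter, the corrective loop itself carries no model-specific code and can be unit-tested with stubs.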


🏗️ Tech Stack

| Layer | Technology |
| --- | --- |
| LLMs | Google Gemini, LLaMA (HF Endpoint) |
| Orchestration | LangChain + LangGraph |
| Retrieval | ChromaDB Vector Store + BM25 |
| Retriever Ensemble | Weighted Hybrid Retriever |
| Embeddings | Gemini Embedding Model |
| Web Search | Tavily API |
| Backend | FastAPI with Lifespan Events |
| Persistence | LangGraph SQLite Checkpointer |
| Frontend | Streamlit Chat UI |
| Document Loading | Unstructured PDF Loader |
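The "Lifespan Events" entry refers to loading heavy resources (retriever, compiled graph) once at startup rather than per request. FastAPI accepts an async context manager via `FastAPI(lifespan=...)`; the sketch below shows the same pattern with only the standard library so the shape is clear. The resource names in `state` are illustrative.

```python
# The lifespan pattern: initialize expensive resources once at startup,
# release them at shutdown. FastAPI consumes an async context manager
# like this one; here it is driven directly with asyncio for illustration.
import asyncio
from contextlib import asynccontextmanager

state = {}

@asynccontextmanager
async def lifespan(app=None):
    state["retriever"] = "loaded-hybrid-retriever"   # e.g. BM25 + Chroma
    state["graph"] = "compiled-langgraph"            # compiled CRAG graph
    try:
        yield state          # the application serves requests here
    finally:
        state.clear()        # cleanup on shutdown

async def main():
    async with lifespan() as resources:
        return resources["graph"]

result = asyncio.run(main())
```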

🧠 CRAG Architecture

This project implements a Corrective Retrieval-Augmented Generation (CRAG) architecture using LangGraph.
Unlike standard RAG pipelines that perform only retrieval → generation, CRAG introduces self-evaluation and correction loops to minimize hallucinations and improve answer reliability.

Below is the high-level workflow:

(Diagram: CRAG architecture)


🔄 Workflow Explanation

1️⃣ Retrieve (Hybrid Retrieval)

The user query first passes through a hybrid retriever that combines:

  • BM25 lexical retrieval for keyword matching
  • Chroma vector retrieval (MMR search) for semantic similarity

An Ensemble Retriever merges both results, improving recall and precision.
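The merge step can be sketched with weighted Reciprocal Rank Fusion (RRF), the scheme LangChain's `EnsembleRetriever` uses to combine ranked lists. The document IDs and the 0.5/0.5 weights below are illustrative.

```python
# Weighted Reciprocal Rank Fusion: each retriever contributes
# weight / (k + rank) per document; documents found by both
# retrievers accumulate score and rise in the fused ranking.
def weighted_rrf(ranked_lists, weights, k=60):
    """Merge ranked doc-id lists; higher fused score ranks first."""
    scores = {}
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["doc_a", "doc_b", "doc_c"]   # lexical ranking
vector_hits = ["doc_b", "doc_d", "doc_a"]   # semantic (MMR) ranking
fused = weighted_rrf([bm25_hits, vector_hits], weights=[0.5, 0.5])
# doc_b appears high in both lists, so it tops the fused ranking
```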


2️⃣ Relevance Grader

A lightweight LLM evaluates whether the retrieved context is relevant to the query.

  • If relevant → proceed to answer generation
  • If not relevant → trigger web search fallback

This prevents the model from generating answers using weak or unrelated context.
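The grading step amounts to a binary classification followed by a routing decision. A minimal sketch, with the LLM call injected as a callable and an illustrative prompt (not the project's actual wording):

```python
# Binary relevance grader: ask a lightweight LLM for a yes/no verdict,
# then route to generation or web search. The prompt text is illustrative.
GRADER_PROMPT = (
    "You are a grader. Does the context below help answer the question?\n"
    "Question: {question}\nContext: {context}\n"
    "Reply with exactly 'yes' or 'no'."
)

def route_after_grading(question, context, llm):
    verdict = llm(GRADER_PROMPT.format(question=question, context=context))
    if verdict.strip().lower().startswith("yes"):
        return "generate"      # relevant -> answer generation
    return "web_search"        # irrelevant -> Tavily fallback
```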


3️⃣ Web Search Fallback

When internal document retrieval fails, the system queries the Tavily Web Search API to fetch external knowledge.
The retrieved web content is inserted as context for answer generation.

This ensures graceful degradation instead of hallucination.
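A sketch of the fallback node, assuming the search client returns a list of result dicts with a `content` field (as Tavily's API does); the client is injected as a parameter so the node can be exercised without an API key, and the state keys are illustrative:

```python
# Web-search fallback node: query an external search client and splice
# the returned passages into the state as generation context.
def web_search_node(state, search, max_results=3):
    results = search(state["question"], max_results=max_results)
    web_context = "\n\n".join(r["content"] for r in results)
    return {**state, "context": web_context, "source": "web"}
```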


4️⃣ Answer Generation

The answer is generated using an LLM conditioned on:

  • The retrieved context
  • The current user query
  • Recent chat history

The result is appended to persistent conversation memory.
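Assembling those three inputs into a single prompt might look like the following sketch; the template wording and the four-turn history window are illustrative choices, not the project's actual prompt:

```python
# Build the generation prompt from retrieved context, the current query,
# and a truncated window of recent chat history.
def build_generation_prompt(context, question, history, max_turns=4):
    recent = history[-max_turns:]                     # keep only recent turns
    history_block = "\n".join(f"{role}: {text}" for role, text in recent)
    return (
        "Answer using ONLY the context below. If the context is "
        "insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Chat history:\n{history_block}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_generation_prompt(
    context="CRAG adds correction loops to RAG.",
    question="What does CRAG add?",
    history=[("user", "hi"), ("assistant", "hello")],
)
```

Windowing the history keeps the prompt within the model's context budget while preserving multi-turn continuity.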


5️⃣ Answer Evaluation (Groundedness & Usefulness)

A second LLM evaluates the generated answer on two criteria:

  • Groundedness → Is the answer supported by the provided context?
  • Usefulness → Does the answer actually address the user’s query?

6️⃣ Corrective Loops

Based on evaluation:

  • If hallucinated (not grounded) → retry answer generation
  • If not useful → rewrite the query → retrieve again
  • If still failing after retries → fallback response

These loops form the corrective core of CRAG.
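The routing logic described above reduces to a small decision function. The edge names (`"done"`, `"retry_generation"`, `"rewrite_query"`, `"fallback"`) are illustrative stand-ins for the project's LangGraph conditional edges:

```python
# Corrective routing: map the two evaluation verdicts plus a retry
# counter to the next node in the graph.
def route_after_evaluation(grounded, useful, retries, max_retries=2):
    if grounded and useful:
        return "done"               # answer accepted
    if retries >= max_retries:
        return "fallback"           # give up gracefully
    if not grounded:
        return "retry_generation"   # hallucinated: regenerate
    return "rewrite_query"          # grounded but unhelpful: retrieve again
```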


7️⃣ Persistent Memory

The entire LangGraph state is stored using a SQLite checkpointer, enabling:

  • Multi-turn conversational continuity
  • Thread-based session tracking
  • Memory persistence across backend restarts
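A simplified stand-in for what the SQLite checkpointer does (LangGraph's `SqliteSaver` has its own schema; the table and column names below are illustrative): serialize the per-thread state into a SQLite table keyed by `thread_id`, so a backend restart can reload it.

```python
# Persist per-thread conversation state in SQLite. In production this
# would use a file path (e.g. "memory.db"); ":memory:" is for the demo.
import json
import sqlite3

def save_state(conn, thread_id, state):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(thread_id TEXT PRIMARY KEY, state TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints VALUES (?, ?)",
        (thread_id, json.dumps(state)),
    )
    conn.commit()

def load_state(conn, thread_id):
    row = conn.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None

conn = sqlite3.connect(":memory:")
save_state(conn, "thread-1", {"messages": ["hi", "hello"]})
restored = load_state(conn, "thread-1")
```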

Why This Architecture Matters

✔ Reduces LLM hallucinations
✔ Improves retrieval robustness
✔ Handles missing knowledge gracefully
✔ Enables persistent multi-turn chat
✔ Mirrors production-grade agentic RAG systems

This makes the system more reliable than standard RAG pipelines while remaining modular and extensible.


Key Applications

  • Enterprise knowledge-base assistants
  • Research paper and academic assistants
  • Legal and regulatory document assistants
  • Technical and developer documentation copilots
  • Hybrid private + web knowledge assistants
