RAG-based Question-Answering Bot

An AI-powered Question-Answering bot that leverages Large Language Models (LLMs) to answer questions based on document content. Built with FastAPI, LangChain, Docling, and FAISS.

Features

  • Multi-format Support: Process both PDF and JSON documents
  • RAG-based QA: Uses Retrieval-Augmented Generation (RAG) for accurate, context-aware answers
  • Vector Search: Uses LangChain's FAISS vector store (saved to disk) for efficient semantic search
  • LangChain Integration: Built on LangChain for robust LLM orchestration
  • Web Interface: User-friendly Streamlit frontend for easy interaction
  • RESTful API: Clean FastAPI endpoints for programmatic access
  • Production Ready: Includes tests, error handling, and comprehensive documentation

Technology Stack

  • Python 3.x: Core programming language
  • FastAPI: Modern, fast web framework for building APIs
  • Streamlit: Interactive web interface for easy user interaction
  • LangChain: Framework for building LLM applications
  • OpenAI GPT-4o-mini: Language model for generating answers
  • Docling: Document parsing and chunking (for PDFs)
  • FAISS: Vector store for semantic search, used via LangChain's FAISS integration and saved to disk
  • Pydantic: Data validation

Installation

Prerequisites

  • Python 3.8 or higher
  • OpenAI API key

Setup

  1. Clone the repository:
git clone https://github.com/Frostday/RAG-QA.git
cd RAG-QA
  2. Create and activate a conda environment (recommended):
conda create -n rag_qa python=3.12
conda activate rag_qa
pip install -r requirements.txt

Alternatively, use a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

If pip install -r requirements.txt doesn't work, try these alternatives:

Option 1: Use pinned versions file

pip install -r req.txt

Option 2: Use conda with environment.yml

conda env create -f environment.yml
conda activate rag_qa

Option 3: Install packages manually

pip install fastapi "uvicorn[standard]" python-multipart langchain langchain-openai langchain-core langchain-community langchain-text-splitters openai faiss-cpu docling python-dotenv pydantic pandas pytest pytest-asyncio httpx requests streamlit
  3. Set up environment variables: Create a .env file in the project root directory (same level as src/ and README.md):
OPENAI_API_KEY=your_openai_api_key_here

Important:

  • Get your API key from OpenAI Platform
  • The .env file is automatically loaded by src/config.py at startup
  • Never commit your .env file to version control (it's already in .gitignore)

Quick Start

Using Streamlit (Easiest)

  1. Start the API server:
cd RAG-QA
uvicorn src.app:app --reload
  2. In a new terminal, start Streamlit:
cd RAG-QA
streamlit run src/streamlit_app.py
  3. Open http://localhost:8501 in your browser and upload your files!

Using the API Directly

  1. Start the API server:
cd RAG-QA
uvicorn src.app:app --reload
  2. Use the interactive docs at http://localhost:8000/docs or make API calls programmatically.

Docker Deployment (Recommended for Production)

Get the RAG-QA application running in under 5 minutes using Docker.

Prerequisites

  • Docker and Docker Compose installed
  • OpenAI API key

Quick Start with Docker

1. Clone and Setup

# Clone the repository
git clone https://github.com/Frostday/RAG-QA.git
cd RAG-QA

# Create .env file with your API key
echo "OPENAI_API_KEY=your_openai_api_key_here" > .env

2. Start the Application

docker-compose up -d

This will:

  • Build both backend and frontend images
  • Start the FastAPI backend on port 8000
  • Start the Streamlit frontend on port 8501
  • Set up health checks and auto-restart

3. Access the Application

  • Frontend (Streamlit): http://localhost:8501
  • Backend API: http://localhost:8000
  • Interactive API docs: http://localhost:8000/docs

Common Docker Commands

View Logs

# All services
docker-compose logs -f

# Backend only
docker-compose logs -f backend

# Frontend only
docker-compose logs -f frontend

Check Status

docker-compose ps

Stop the Application

docker-compose down

Restart Services

# Restart all
docker-compose restart

# Restart specific service
docker-compose restart backend

Rebuild After Changes

docker-compose build
docker-compose up -d

Docker Services

The docker-compose.yml file defines two services:

  1. backend (FastAPI API server)

    • Port: 8000
    • Health checks every 30s
    • Simple logging with timestamps
    • Metrics at /metrics
  2. frontend (Streamlit web interface)

    • Port: 8501
    • Depends on backend being healthy
    • Automatically connects to backend

Docker Features

  • Multi-stage builds: Optimized image sizes
  • Health checks: Automatic service health monitoring
  • Auto-restart: Services restart on failure
  • Volume persistence: Data persists across restarts
  • Environment variables: Easy configuration via .env
  • Isolated networking: Services communicate securely

Development with Docker

For local development with hot-reload:

# Start with logs visible
docker-compose up

# Code changes in ./src will be reflected automatically
# No need to rebuild for code changes

Rebuild after dependency changes:

docker-compose build
docker-compose up

Production Deployment

For production, edit docker-compose.yml and remove volume mounts:

# Comment out these lines in both services:
# volumes:
#   - ./src:/app/src

Then build and deploy:

docker-compose build
docker-compose up -d

Troubleshooting

Container Won't Start

# Check logs
docker-compose logs backend

# Common issue: Invalid API key
# Solution: Check your .env file
cat .env

Port Already in Use

# Check what's using the port
lsof -i :8000  # Backend
lsof -i :8501  # Frontend

# Kill the process or change ports in docker-compose.yml

Out of Memory

# Increase Docker memory limit in Docker Desktop settings
# Preferences > Resources > Memory

Can't Connect to Backend

# Check backend health
curl http://localhost:8000/

# Check if backend is running
docker-compose ps backend

# Restart backend
docker-compose restart backend

Docker Monitoring

View Logs and Metrics

# View logs
docker-compose logs -f backend

# View metrics
curl http://localhost:8000/metrics | jq

# Monitor continuously
watch -n 5 'curl -s http://localhost:8000/metrics | jq'

Check Health

# Backend health
curl http://localhost:8000/

# Frontend health
curl http://localhost:8501/_stcore/health

Docker Cleanup

Remove Containers and Networks

docker-compose down

Remove Volumes (Data)

docker-compose down -v

Remove Images

docker-compose down --rmi all

Complete Cleanup

# Remove everything
docker-compose down -v --rmi all

# Remove any orphaned data
rm -rf data/uploads/* data/vector_stores/*

Docker Support

For issues:

  1. Check logs: docker-compose logs -f
  2. Check metrics: curl http://localhost:8000/metrics
  3. Verify API key: cat .env
  4. Restart services: docker-compose restart

Still having issues? Open an issue on GitHub with:

  • Output of docker-compose logs
  • Output of docker-compose ps
  • Your Docker version: docker --version

Usage

There are two ways to use the Question-Answering Bot:

  1. Streamlit Web Interface (Recommended for beginners)
  2. REST API (For programmatic access and integration)

Option 1: Streamlit Web Interface

Starting the Streamlit App

  1. Start the FastAPI server (required backend):
cd RAG-QA
uvicorn src.app:app --reload
  2. In a new terminal, start the Streamlit app:
cd RAG-QA
streamlit run src/streamlit_app.py
  3. Open your browser to http://localhost:8501

Using the Streamlit Interface

  1. Upload Document: Click "Choose document file" and select a PDF or JSON file
  2. Upload Questions: Click "Choose questions file" and select a JSON file with questions
  3. Process: Click the "🚀 Process Documents" button
  4. View Results: Answers will be displayed below, and you can download them as JSON

Questions File Format:

  • List format: ["question1", "question2", ...]
  • Object format: {"questions": ["question1", "question2", ...]}

Features:

  • Limits displayed upfront: File size (50 MB), questions (100 max), timeout (5 min)
  • Real-time validation: File size and question count checked before submission
  • Error handling tips: Expandable section with common issues and solutions
  • File validation and preview: Shows file size, question count, and preview
  • Real-time processing status: Loading spinner with progress indication
  • Download answers as JSON: Export results for later use
  • Friendly error messages: Clear explanations when something goes wrong
  • Configurable API URL: Easy to point to different backend instances

Option 2: REST API

Starting the API Server

Run the FastAPI application:

cd RAG-QA
uvicorn src.app:app --reload

The API will be available at http://localhost:8000

API Documentation

Once the server is running, you can access:

  • Interactive API docs: http://localhost:8000/docs
  • Alternative docs: http://localhost:8000/redoc

API Endpoints

Root / Health Check

GET /

Returns API information, status, and configuration limits.

Response:

{
  "name": "Question-Answering Bot API",
  "version": "1.0.0",
  "status": "operational",
  "limits": {
    "max_file_size_mb": 50,
    "max_questions_per_request": 100
  },
  "supported_formats": {
    "documents": ["PDF", "JSON"],
    "questions": ["JSON"]
  }
}

Process Documents (Main Endpoint)

POST /process-documents

Upload both document and questions file, get answers in one request.

Request:

  • document: File (PDF or JSON) - Required
  • questions_file: JSON file containing list of questions - Required

Questions File Format:

Option 1 - List format:

[
  "Question 1?",
  "Question 2?",
  "Question 3?"
]

Option 2 - Object format:

{
  "questions": [
    "Question 1?",
    "Question 2?"
  ]
}

Response:

{
  "Question 1?": "Answer 1",
  "Question 2?": "Answer 2"
}

cURL Example:

curl -X POST "http://localhost:8000/process-documents" \
  -F "document=@path/to/document.pdf" \
  -F "questions_file=@path/to/questions.json"

Python Example:

import requests

# Open both files and send them as multipart form data
with open("document.pdf", "rb") as doc, open("questions.json", "rb") as qs:
    files = {
        "document": ("document.pdf", doc, "application/pdf"),
        "questions_file": ("questions.json", qs, "application/json"),
    }
    response = requests.post("http://localhost:8000/process-documents", files=files)

answers = response.json()
print(answers)

JavaScript/Node.js Example:

const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');

const form = new FormData();
form.append('document', fs.createReadStream('document.pdf'));
form.append('questions_file', fs.createReadStream('questions.json'));

axios.post('http://localhost:8000/process-documents', form, {
  headers: form.getHeaders()
})
.then(response => {
  console.log(response.data);
})
.catch(error => {
  console.error(error);
});

Project Structure

RAG-QA/
├── src/                   # Source code
│   ├── __init__.py
│   ├── config.py         # Centralized configuration settings
│   ├── app.py            # FastAPI application and endpoints
│   ├── streamlit_app.py  # Streamlit web interface
│   ├── document_indexer.py  # Document indexing service (PDF/JSON)
│   └── qa_service.py     # QA service using LangChain
├── tests/                 # Test files (48 tests total)
│   ├── __init__.py
│   ├── test_api.py       # Integration tests (19 tests)
│   ├── test_document_indexer.py  # Document processing unit tests (16 tests)
│   └── test_qa_service.py        # QA service unit tests (13 tests)
├── data/                  # Data directory
│   ├── uploads/          # Temporary upload storage
│   └── vector_stores/    # FAISS vector store storage
├── requirements.txt      # Python dependencies
├── README.md             # This file
└── .env                  # Environment variables (create this)

Architecture Highlights

The application follows clean architecture principles with clear separation of concerns:

  • Configuration Layer: Centralized settings in config.py
  • API Layer: FastAPI endpoints in app.py for request handling and orchestration
  • Business Logic Layer: Separate services for document indexing and QA
  • Presentation Layer: Streamlit web interface and FastAPI docs
  • Infrastructure Layer: OpenAI, FAISS, and file system integrations

How It Works

1. Document Upload and Processing

When a document is uploaded, it's processed based on its type:

PDF Documents

  • Parsing: Uses Docling's DocumentConverter to parse PDF into a structured document representation
  • Chunking: Uses Docling's HybridChunker with merge_peers=True for intelligent, structure-aware chunking
    • Preserves document layout, tables, headings, and formatting
    • Chunks are created based on semantic and structural boundaries (not arbitrary text splits)
    • Each chunk maintains metadata including:
      • Page numbers where content appears
      • Section headings/titles
      • Chunk index for ordering
  • Advantages: Better context preservation, especially for documents with tables, multi-column layouts, and complex structures
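As a rough illustration of this flow, here is a minimal sketch using Docling's publicly documented DocumentConverter and HybridChunker APIs (the actual indexing code in src/document_indexer.py may differ):

from docling.document_converter import DocumentConverter
from docling.chunking import HybridChunker

# Parse the PDF into Docling's structured document representation
converter = DocumentConverter()
doc = converter.convert("document.pdf").document

# Structure-aware chunking; merge_peers=True merges small sibling elements
chunker = HybridChunker(merge_peers=True)
for index, chunk in enumerate(chunker.chunk(doc)):
    print(index, chunk.text[:80])   # chunk text, ordered by chunk index
    print(chunk.meta.headings)      # section headings carried as chunk metadata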

JSON Documents

  • Parsing: JSON is parsed and converted to a readable text representation
  • Chunking: Uses structure-aware chunking to preserve JSON semantics:
    • For JSON arrays: Each list item becomes a potential chunk (if small enough)
    • For JSON objects: Entire object is kept as a single chunk if small, or split if large
    • Large chunks: If a chunk exceeds 1000 characters, it's further split using LangChain's RecursiveCharacterTextSplitter with:
      • chunk_size=1000 characters
      • chunk_overlap=200 characters (to preserve context across chunk boundaries)
    • Each chunk includes metadata:
      • File type identifier
      • Chunk index
      • List index (for array items)
  • Advantages: Preserves JSON structure and relationships while ensuring chunks are appropriately sized for embedding
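A minimal sketch of the oversized-chunk handling described above, using LangChain's RecursiveCharacterTextSplitter with the documented chunk_size and chunk_overlap values (the chunk_json helper below is illustrative, not the repository's actual function):

import json
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

def chunk_json(path):
    """Turn a JSON file into text chunks, splitting only pieces over 1000 characters."""
    with open(path) as f:
        data = json.load(f)
    items = data if isinstance(data, list) else [data]  # arrays: one item per element
    chunks = []
    for list_index, item in enumerate(items):
        text = json.dumps(item, indent=2)
        if len(text) <= 1000:
            chunks.append({"text": text, "list_index": list_index})
        else:
            # Large items are further split with overlap to preserve context
            for piece in splitter.split_text(text):
                chunks.append({"text": piece, "list_index": list_index})
    return chunks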

2. Embedding and Vector Storage

  • Embedding Model: Both PDF and JSON chunks are embedded using OpenAI's text-embedding-3-small model
    • This model converts text chunks into high-dimensional vectors (embeddings)
    • Embeddings capture semantic meaning, allowing similar content to be found even with different wording
  • Vector Store: Embeddings are stored in a FAISS (Facebook AI Similarity Search) vector store
    • FAISS enables fast similarity search across all document chunks
    • Vector stores are saved to disk and can be loaded for subsequent queries
    • Each vector store is uniquely identified by a session ID
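The following sketch shows how chunks could be embedded and persisted with LangChain's OpenAIEmbeddings and FAISS integration; the session ID and paths are illustrative:

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Wrap each chunk (plus its metadata) as a LangChain Document
docs = [
    Document(page_content="Example chunk text from the indexer...",
             metadata={"chunk_index": 0}),
]

# Build the FAISS index and persist it under a per-session directory
store = FAISS.from_documents(docs, embeddings)
store.save_local("data/vector_stores/session-123")

# Subsequent queries can reload the same index from disk
store = FAISS.load_local("data/vector_stores/session-123", embeddings,
                         allow_dangerous_deserialization=True)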

3. Question Answering (RAG Process)

When questions are submitted:

  1. Semantic Search:

    • The question is embedded using the same text-embedding-3-small model
    • FAISS performs similarity search to find the most relevant document chunks
    • By default, retrieves the top k=5 most relevant chunks (configurable)
  2. Context Assembly:

    • Retrieved chunks are combined into a context string
    • This context is passed to the LLM along with the question
  3. Answer Generation:

    • GPT-4o-mini generates the answer based on the retrieved context
    • The LLM can infer information from context while being transparent about what's directly stated vs. inferred
    • If context is incomplete, the model indicates this in the response

This RAG (Retrieval-Augmented Generation) approach ensures answers are grounded in the actual document content while leveraging the LLM's reasoning capabilities.
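Put together, the retrieve-assemble-generate loop looks roughly like the sketch below (prompt wording and helper names are illustrative; the real logic lives in src/qa_service.py):

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
store = FAISS.load_local("data/vector_stores/session-123", embeddings,
                         allow_dangerous_deserialization=True)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def answer(question, k=5):
    # 1. Semantic search: embed the question and retrieve the top-k chunks
    hits = store.similarity_search(question, k=k)
    # 2. Context assembly: concatenate the retrieved chunks into one context string
    context = "\n\n".join(doc.page_content for doc in hits)
    prompt = (
        "Answer the question using only the context below. "
        "If the context is incomplete, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    # 3. Answer generation grounded in the retrieved context
    return llm.invoke(prompt).content

print(answer("What is the document about?"))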

Testing

The application has comprehensive test coverage with 48 tests across unit and integration testing.

Quick Start

cd RAG-QA

# Run all tests
pytest tests/ -v

# Run specific test files
pytest tests/test_api.py -v              # Integration tests (19 tests)
pytest tests/test_document_indexer.py -v # Document processing unit tests (16 tests)
pytest tests/test_qa_service.py -v       # QA service unit tests (13 tests)

# Run with coverage report
pytest tests/ --cov=src --cov-report=html

Test Coverage

Test File                  Tests   Type          Coverage
test_api.py                19      Integration   API endpoints with mocked LLM
test_document_indexer.py   16      Unit          Document processing & chunking (JSON + PDF)
test_qa_service.py         13      Unit          Question answering & retrieval
Total                      48      Mixed         Comprehensive

Key Features:

  • ✅ All external dependencies mocked (OpenAI, FAISS)
  • ✅ No real API calls required
  • ✅ Fast execution (<10 seconds)
  • ✅ Edge cases and error scenarios covered
  • ✅ Async operations tested
  • ✅ Integration tests with complete workflow
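For orientation, a minimal sketch of what an integration test against the API can look like (the real tests in tests/test_api.py mock OpenAI and FAISS and cover many more cases):

from fastapi.testclient import TestClient
from src.app import app

client = TestClient(app)

def test_root_reports_operational():
    # The root endpoint reports status and limits without calling OpenAI
    response = client.get("/")
    assert response.status_code == 200
    assert response.json()["status"] == "operational"

def test_rejects_unsupported_document_type():
    # Validation errors (400) are raised before any LLM call is made
    files = {
        "document": ("notes.txt", b"plain text", "text/plain"),
        "questions_file": ("q.json", b'["What is this?"]', "application/json"),
    }
    response = client.post("/process-documents", files=files)
    assert response.status_code == 400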

Configuration

Centralized Configuration File

All application settings are now centralized in src/config.py for easy maintenance and consistency across all components.

How it works:

  1. src/config.py loads environment variables from .env file using load_dotenv()
  2. All other modules (app.py, streamlit_app.py, document_indexer.py, qa_service.py) import settings from config.py
  3. This ensures consistent configuration across all components
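A minimal sketch of this pattern (constant names follow the settings documented below; actual values and structure in src/config.py may differ):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root at import time

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise RuntimeError("OPENAI_API_KEY is missing - add it to your .env file")

LLM_MODEL = "gpt-4o-mini"
LLM_TEMPERATURE = 0
EMBEDDING_MODEL = "text-embedding-3-small"
MAX_FILE_SIZE_MB = 50
MAX_QUESTIONS = 100
RETRIEVAL_K = 5

Other modules then import what they need, for example: from src.config import LLM_MODEL, RETRIEVAL_K.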

Environment Variables

Required Setup:

Create a .env file in the project root directory (RAG-QA/.env):

OPENAI_API_KEY=your_openai_api_key_here

Where to get your OpenAI API key:

  • Sign up or log in at OpenAI Platform
  • Navigate to API Keys
  • Click "Create new secret key"
  • Copy the key and paste it in your .env file

Security Notes:

  • The .env file is automatically ignored by Git (listed in .gitignore)
  • Never share or commit your API key
  • The API key is loaded once at startup by config.py

Verify Configuration:

The application will automatically validate the API key on startup. If the API key is missing, you'll see a clear error message with instructions.

You can also run the configuration tests:

pytest tests/test_config.py -v

If you encounter issues, check:

  1. .env file exists in the project root (not in src/)
  2. .env file contains: OPENAI_API_KEY=your_actual_key_here
  3. No spaces around the = sign
  4. No quotes around the API key value

Adjustable Parameters in src/config.py

Limits & Constraints:

  • MAX_FILE_SIZE_MB: Maximum document file size (default: 50 MB)
  • MAX_QUESTIONS: Maximum questions per request (default: 100)
  • REQUEST_TIMEOUT_SECONDS: Client timeout for API requests (default: 300 seconds / 5 minutes)

OpenAI Settings:

  • LLM_MODEL: Language model name (default: "gpt-4o-mini")
  • LLM_TEMPERATURE: LLM temperature for response generation (default: 0)
  • EMBEDDING_MODEL: Embedding model for vectorization (default: "text-embedding-3-small")

Document Processing:

  • RETRIEVAL_K: Number of chunks to retrieve for context (default: 5)
  • SIMILARITY_THRESHOLD: Minimum similarity score for retrieved chunks (default: 0.4)
    • Filters out irrelevant content to avoid unnecessary LLM calls
    • Lower values (0.3) are more lenient and include marginal matches
    • Moderate values (0.4-0.5) provide balanced filtering
    • Higher values (0.6-0.7) are stricter and only include highly relevant content
  • JSON_CHUNK_SIZE: Maximum size for text chunks (default: 1000 characters)
  • JSON_CHUNK_OVERLAP: Overlap between chunks (default: 200 characters)
  • PDF_MERGE_PEERS: Merge peer elements in PDF chunking (default: True)

Supported Formats:

  • SUPPORTED_DOCUMENT_FORMATS: List of accepted document types (default: [".pdf", ".json"])
  • SUPPORTED_QUESTIONS_FORMAT: List of accepted question file types (default: [".json"])

API Configuration:

  • API_HOST: API server host (default: "localhost")
  • API_PORT: API server port (default: 8000)
  • API_URL: Full API URL (default: "http://localhost:8000")

Error Messages:

  • ERROR_MESSAGES: Dictionary of all error message templates for consistent messaging

All services (app.py, streamlit_app.py, document_indexer.py, qa_service.py) import their settings from this centralized config file, ensuring consistency across the application.
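To illustrate how RETRIEVAL_K and SIMILARITY_THRESHOLD interact, here is a hedged sketch using LangChain's score-threshold retriever (threshold semantics assume LangChain's normalized relevance scores; the session path is illustrative):

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
store = FAISS.load_local("data/vector_stores/session-123", embeddings,
                         allow_dangerous_deserialization=True)

# Only chunks scoring at or above the threshold are kept; if nothing passes,
# the service can skip the LLM call entirely.
retriever = store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 5, "score_threshold": 0.4},
)
relevant_chunks = retriever.invoke("What is the warranty period?")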

Error Handling

The API includes comprehensive error handling with friendly, actionable error messages:

Validation Errors (400 Bad Request)

  • Invalid file types (must be PDF or JSON for documents, JSON for questions)
  • Invalid JSON format or structure
  • Empty files or empty questions list
  • Files exceeding size limits (50 MB maximum)
  • Too many questions (100 maximum per request)
  • Missing required fields

Resource Errors

  • 413 Payload Too Large: File exceeds 50 MB limit
  • 504 Gateway Timeout: Processing took too long (document too complex)
  • 507 Insufficient Storage: Out of memory (document too large)

Service Errors

  • 500 Internal Server Error: OpenAI API errors, corrupted PDFs, or other processing errors
  • 503 Service Unavailable: Network connection issues

All errors include descriptive messages to help diagnose and resolve issues.
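For programmatic clients, a small sketch of how these statuses might be handled with the requests library (error bodies are assumed to use FastAPI's usual detail field):

import requests

with open("document.pdf", "rb") as doc, open("questions.json", "rb") as qs:
    response = requests.post(
        "http://localhost:8000/process-documents",
        files={"document": doc, "questions_file": qs},
        timeout=300,  # mirrors the 5-minute request timeout
    )

if response.status_code == 200:
    print(response.json())
elif response.status_code in (400, 413):
    print("Rejected by validation:", response.json().get("detail"))
elif response.status_code == 504:
    print("Processing timed out - try a smaller or simpler document")
else:
    print("Server error:", response.status_code, response.text)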

Observability

The application includes simple logging and metrics tracking for monitoring and debugging:

Logging

Simple console logging with timestamps for key events:

2026-01-05 11:44:02 - INFO - app - Application starting - Question-Answering Bot API v1.0.0
2026-01-05 11:44:02 - INFO - app - Limits: max_file_size=50MB, max_questions=100
2026-01-05 11:45:12 - INFO - app - Processing request: doc=report.pdf, size=2.5MB, questions=10, session=abc-123
2026-01-05 11:45:14 - INFO - app - Document indexed: 45 chunks in 2.341s
2026-01-05 11:45:18 - INFO - app - Request completed: 10 questions answered in 3.2s (total: 5.541s)

Log Levels:

  • INFO: Normal operations (startup, requests, completions)
  • WARNING: Non-critical issues (cleanup failures)
  • ERROR: Processing errors, API failures, exceptions

Viewing Logs:

# Local development
uvicorn src.app:app --reload

# Docker
docker-compose logs -f backend

Metrics Endpoint

Access real-time metrics at /metrics:

curl http://localhost:8000/metrics

Example Response:

{
  "requests_total": 42,
  "requests_success": 40,
  "requests_failed": 2,
  "documents_processed": 42,
  "documents_processed_pdf": 30,
  "documents_processed_json": 12,
  "questions_answered": 315,
  "total_tokens_used": 0,
  "total_latency_seconds": 245.678,
  "total_chunks_created": 1250,
  "avg_latency_seconds": 5.849,
  "success_rate": 0.952,
  "timestamp": "2026-01-05T12:34:56.789Z"
}

Tracked Metrics:

  • Request counts (total, success, failed)
  • Success rate
  • Average and total latency
  • Documents processed by type
  • Questions answered
  • Token usage (if available from OpenAI)
  • Chunk counts

Monitoring

View logs and metrics:

# View logs in real-time
docker-compose logs -f backend

# Monitor metrics
watch -n 5 'curl -s http://localhost:8000/metrics | jq'

# Check success rate
curl -s http://localhost:8000/metrics | jq '.success_rate'

# Check average latency
curl -s http://localhost:8000/metrics | jq '.avg_latency_seconds'

Integration with monitoring systems:

  • Stream logs with docker-compose logs or redirect them to log files
  • Scrape /metrics with Prometheus or Datadog
  • Create dashboards for visualization
  • Set up alerts on error rates or latency thresholds

Limitations & Safeguards

Built-in Limits

  • Maximum file size: 50 MB per document (configurable)
  • Maximum questions: 100 questions per request (configurable)
  • Request timeout: 300 seconds (5 minutes) for Streamlit client
  • Empty files: Rejected with clear error message

Performance Considerations

  • Processing large PDFs may take time (especially 30+ MB files)
  • Complex PDFs with many images or tables require more processing time
  • Concurrent question answering improves throughput for multiple questions

Requirements

  • Requires OpenAI API key and internet connection
  • FAISS vector stores are stored locally on disk
  • Sufficient disk space for temporary files and vector stores
