An AI-powered Question-Answering bot that leverages Large Language Models (LLMs) to answer questions based on document content. Built with FastAPI, LangChain, Docling, and FAISS.
- Multi-format Support: Process both PDF and JSON documents
- RAG-based QA: Uses Retrieval-Augmented Generation (RAG) for accurate, context-aware answers
- Vector Search: Uses LangChain's FAISS vector store (saved to disk) for efficient semantic search
- LangChain Integration: Built on LangChain for robust LLM orchestration
- Web Interface: User-friendly Streamlit frontend for easy interaction
- RESTful API: Clean FastAPI endpoints for programmatic access
- Production Ready: Includes tests, error handling, and comprehensive documentation
- Python 3.x: Core programming language
- FastAPI: Modern, fast web framework for building APIs
- Streamlit: Interactive web interface for easy user interaction
- LangChain: Framework for building LLM applications
- OpenAI GPT-4o-mini: Language model for generating answers
- Docling: Document parsing and chunking (for PDFs)
- FAISS: Vector similarity search library (used via LangChain) for semantic search; indexes are saved to disk
- Pydantic: Data validation
- Python 3.8 or higher
- OpenAI API key
- Clone the repository:
git clone https://github.com/Frostday/RAG-QA.git
cd RAG-QA
- Create and activate a conda environment (recommended):
conda create -n rag_qa python=3.12
conda activate rag_qa
pip install -r requirements.txt
Alternatively, use a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
If `pip install -r requirements.txt` doesn't work, try these alternatives:
Option 1: Use pinned versions file
pip install -r req.txt
Option 2: Use conda with environment.yml
conda env create -f environment.yml
conda activate rag_qa
Option 3: Install packages manually
pip install fastapi "uvicorn[standard]" python-multipart langchain langchain-openai langchain-core langchain-community langchain-text-splitters openai faiss-cpu docling python-dotenv pydantic pandas pytest pytest-asyncio httpx requests streamlit
- Set up environment variables:
Create a `.env` file in the project root directory (same level as `src/` and `README.md`):
OPENAI_API_KEY=your_openai_api_key_here
Important:
- Get your API key from OpenAI Platform
- The `.env` file is automatically loaded by `src/config.py` at startup
- Never commit your `.env` file to version control (it's already in `.gitignore`)
- Start the API server:
cd RAG-QA
uvicorn src.app:app --reload
- In a new terminal, start Streamlit:
cd RAG-QA
streamlit run src/streamlit_app.py
- Open http://localhost:8501 in your browser and upload your files!
- Start the API server:
cd RAG-QA
uvicorn src.app:app --reload
- Use the interactive docs at http://localhost:8000/docs or make API calls programmatically.
Get the RAG-QA application running in under 5 minutes using Docker.
- Docker Desktop installed
- Docker Compose (usually included with Docker Desktop)
- OpenAI API key (available from the OpenAI Platform)
1. Clone and Setup
# Clone the repository
git clone https://github.com/Frostday/RAG-QA.git
cd RAG-QA
# Create .env file with your API key
echo "OPENAI_API_KEY=your_openai_api_key_here" > .env2. Start the Application
docker-compose up -dThis will:
- Build both backend and frontend images
- Start the FastAPI backend on port 8000
- Start the Streamlit frontend on port 8501
- Set up health checks and auto-restart
3. Access the Application
- Web Interface: http://localhost:8501
- API Documentation: http://localhost:8000/docs
- Metrics: http://localhost:8000/metrics
View Logs
# All services
docker-compose logs -f
# Backend only
docker-compose logs -f backend
# Frontend only
docker-compose logs -f frontend
Check Status
docker-compose ps
Stop the Application
docker-compose down
Restart Services
# Restart all
docker-compose restart
# Restart specific service
docker-compose restart backend
Rebuild After Changes
docker-compose build
docker-compose up -d
The `docker-compose.yml` file defines two services:
- `backend` (FastAPI API server)
  - Port: 8000
  - Health checks every 30s
  - Simple logging with timestamps
  - Metrics at `/metrics`
- `frontend` (Streamlit web interface)
  - Port: 8501
  - Depends on the backend being healthy
  - Automatically connects to the backend
- ✅ Multi-stage builds: Optimized image sizes
- ✅ Health checks: Automatic service health monitoring
- ✅ Auto-restart: Services restart on failure
- ✅ Volume persistence: Data persists across restarts
- ✅ Environment variables: Easy configuration via `.env`
- ✅ Isolated networking: Services communicate securely
For local development with hot-reload:
# Start with logs visible
docker-compose up
# Code changes in ./src will be reflected automatically
# No need to rebuild for code changes
Rebuild after dependency changes:
docker-compose build
docker-compose up
For production, edit docker-compose.yml and remove the volume mounts:
# Comment out these lines in both services:
# volumes:
#   - ./src:/app/src
Then build and deploy:
docker-compose build
docker-compose up -d
Container Won't Start
# Check logs
docker-compose logs backend
# Common issue: Invalid API key
# Solution: Check your .env file
cat .env
Port Already in Use
# Check what's using the port
lsof -i :8000 # Backend
lsof -i :8501 # Frontend
# Kill the process or change ports in docker-compose.yml
Out of Memory
# Increase Docker memory limit in Docker Desktop settings
# Preferences > Resources > Memory
Can't Connect to Backend
# Check backend health
curl http://localhost:8000/
# Check if backend is running
docker-compose ps backend
# Restart backend
docker-compose restart backend
View Logs and Metrics
# View logs
docker-compose logs -f backend
# View metrics
curl http://localhost:8000/metrics | jq
# Monitor continuously
watch -n 5 'curl -s http://localhost:8000/metrics | jq'
Check Health
# Backend health
curl http://localhost:8000/
# Frontend health
curl http://localhost:8501/_stcore/health
Remove Containers and Networks
docker-compose down
Remove Volumes (Data)
docker-compose down -v
Remove Images
docker-compose down --rmi all
Complete Cleanup
# Remove everything
docker-compose down -v --rmi all
# Remove any orphaned data
rm -rf data/uploads/* data/vector_stores/*
For issues:
- Check logs: `docker-compose logs -f`
- Check metrics: `curl http://localhost:8000/metrics`
- Verify API key: `cat .env`
- Restart services: `docker-compose restart`
Still having issues? Open an issue on GitHub with:
- Output of `docker-compose logs`
- Output of `docker-compose ps`
- Your Docker version: `docker --version`
There are two ways to use the Question-Answering Bot:
- Streamlit Web Interface (Recommended for beginners)
- REST API (For programmatic access and integration)
- Start the FastAPI server (required backend):
cd RAG-QA
uvicorn src.app:app --reload
- In a new terminal, start the Streamlit app:
cd RAG-QA
streamlit run src/streamlit_app.py
- Open your browser to http://localhost:8501
- Upload Document: Click "Choose document file" and select a PDF or JSON file
- Upload Questions: Click "Choose questions file" and select a JSON file with questions
- Process: Click the "🚀 Process Documents" button
- View Results: Answers will be displayed below, and you can download them as JSON
Questions File Format:
- List format: `["question1", "question2", ...]`
- Object format: `{"questions": ["question1", "question2", ...]}`
Features:
- ✅ Limits displayed upfront: File size (50 MB), questions (100 max), timeout (5 min)
- ✅ Real-time validation: File size and question count checked before submission
- ✅ Error handling tips: Expandable section with common issues and solutions
- ✅ File validation and preview: Shows file size, question count, and preview
- ✅ Real-time processing status: Loading spinner with progress indication
- ✅ Download answers as JSON: Export results for later use
- ✅ Friendly error messages: Clear explanations when something goes wrong
- ✅ Configurable API URL: Easy to point to different backend instances
Run the FastAPI application:
cd RAG-QA
uvicorn src.app:app --reload
The API will be available at http://localhost:8000
Once the server is running, you can access:
- Interactive API docs: http://localhost:8000/docs
- Alternative docs: http://localhost:8000/redoc
GET /
Returns API information, status, and configuration limits.
Response:
{
"name": "Question-Answering Bot API",
"version": "1.0.0",
"status": "operational",
"limits": {
"max_file_size_mb": 50,
"max_questions_per_request": 100
},
"supported_formats": {
"documents": ["PDF", "JSON"],
"questions": ["JSON"]
}
}
POST /process-documents
Upload both the document and questions file, and get answers back in one request.
Request:
- `document`: File (PDF or JSON) - Required
- `questions_file`: JSON file containing a list of questions - Required
Questions File Format:
Option 1 - List format:
[
"Question 1?",
"Question 2?",
"Question 3?"
]
Option 2 - Object format:
{
"questions": [
"Question 1?",
"Question 2?"
]
}
Response:
{
"Question 1?": "Answer 1",
"Question 2?": "Answer 2"
}
cURL Example:
curl -X POST "http://localhost:8000/process-documents" \
-F "document=@path/to/document.pdf" \
-F "questions_file=@path/to/questions.json"Python Example:
import requests
files = {
"document": ("document.pdf", open("document.pdf", "rb"), "application/pdf"),
"questions_file": ("questions.json", open("questions.json", "rb"), "application/json")
}
response = requests.post("http://localhost:8000/process-documents", files=files)
answers = response.json()
print(answers)
JavaScript/Node.js Example:
const FormData = require('form-data');
const fs = require('fs');
const axios = require('axios');
const form = new FormData();
form.append('document', fs.createReadStream('document.pdf'));
form.append('questions_file', fs.createReadStream('questions.json'));
axios.post('http://localhost:8000/process-documents', form, {
headers: form.getHeaders()
})
.then(response => {
console.log(response.data);
})
.catch(error => {
console.error(error);
});
RAG-QA/
├── src/ # Source code
│ ├── __init__.py
│ ├── config.py # Centralized configuration settings
│ ├── app.py # FastAPI application and endpoints
│ ├── streamlit_app.py # Streamlit web interface
│ ├── document_indexer.py # Document indexing service (PDF/JSON)
│ └── qa_service.py # QA service using LangChain
├── tests/ # Test files (48 tests total)
│ ├── __init__.py
│ ├── test_api.py # Integration tests (19 tests)
│ ├── test_document_indexer.py # Document processing unit tests (16 tests)
│ └── test_qa_service.py # QA service unit tests (13 tests)
├── data/ # Data directory
│ ├── uploads/ # Temporary upload storage
│ └── vector_stores/ # FAISS vector store storage
├── requirements.txt # Python dependencies
├── README.md # This file
└── .env # Environment variables (create this)
The application follows clean architecture principles with clear separation of concerns:
- Configuration Layer: Centralized settings in `config.py`
- API Layer: FastAPI endpoints in `app.py` for request handling and orchestration
- Business Logic Layer: Separate services for document indexing and QA
- Presentation Layer: Streamlit web interface and FastAPI docs
- Infrastructure Layer: OpenAI, FAISS, and file system integrations
When a document is uploaded, it's processed based on its type:
- Parsing: Uses Docling's `DocumentConverter` to parse the PDF into a structured document representation
- Chunking: Uses Docling's `HybridChunker` with `merge_peers=True` for intelligent, structure-aware chunking (sketched below)
  - Preserves document layout, tables, headings, and formatting
  - Chunks are created along semantic and structural boundaries (not arbitrary text splits)
  - Each chunk maintains metadata including:
    - Page numbers where the content appears
    - Section headings/titles
    - Chunk index for ordering
- Advantages: Better context preservation, especially for documents with tables, multi-column layouts, and complex structures
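For reference, a minimal sketch of this PDF pipeline, assuming Docling's `DocumentConverter` and `HybridChunker` as described above; the project's `src/document_indexer.py` may differ in detail:

```python
from docling.document_converter import DocumentConverter
from docling.chunking import HybridChunker

def chunk_pdf(path: str) -> list[dict]:
    # Parse the PDF into Docling's structured document representation
    document = DocumentConverter().convert(path).document

    # Structure-aware chunking; merge_peers=True merges small neighboring elements
    chunker = HybridChunker(merge_peers=True)
    chunks = []
    for index, chunk in enumerate(chunker.chunk(dl_doc=document)):
        chunks.append({
            "text": chunk.text,
            "metadata": {
                "chunk_index": index,
                # Section headings, when Docling provides them for this chunk
                "headings": getattr(chunk.meta, "headings", None),
            },
        })
    return chunks
```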
- Parsing: JSON is parsed and converted to a readable text representation
- Chunking: Uses structure-aware chunking to preserve JSON semantics:
  - For JSON arrays: Each list item becomes a potential chunk (if small enough)
  - For JSON objects: The entire object is kept as a single chunk if small, or split if large
- Large chunks: If a chunk exceeds 1000 characters, it's further split using LangChain's `RecursiveCharacterTextSplitter` (sketched below) with:
  - `chunk_size=1000` characters
  - `chunk_overlap=200` characters (to preserve context across chunk boundaries)
- Each chunk includes metadata:
  - File type identifier
  - Chunk index
  - List index (for array items)
- Advantages: Preserves JSON structure and relationships while ensuring chunks are appropriately sized for embedding
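A simplified sketch of this JSON strategy, using the chunk size and overlap defaults above (again, `src/document_indexer.py` is the authoritative implementation):

```python
import json
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

def chunk_json(path: str) -> list[dict]:
    with open(path) as f:
        data = json.load(f)

    # Arrays: each item is a candidate chunk; objects: start from the whole object
    items = data if isinstance(data, list) else [data]

    chunks = []
    for list_index, item in enumerate(items):
        text = json.dumps(item, indent=2)
        # Oversized items are split further, with overlap to preserve context
        pieces = splitter.split_text(text) if len(text) > 1000 else [text]
        for chunk_index, piece in enumerate(pieces):
            chunks.append({
                "text": piece,
                "metadata": {
                    "file_type": "json",
                    "chunk_index": chunk_index,
                    "list_index": list_index,
                },
            })
    return chunks
```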
- Embedding Model: Both PDF and JSON chunks are embedded using OpenAI's `text-embedding-3-small` model
  - This model converts text chunks into high-dimensional vectors (embeddings)
  - Embeddings capture semantic meaning, allowing similar content to be found even with different wording
- Vector Store: Embeddings are stored in a FAISS (Facebook AI Similarity Search) vector store (see the sketch below)
  - FAISS enables fast similarity search across all document chunks
  - Vector stores are saved to disk and can be loaded for subsequent queries
  - Each vector store is uniquely identified by a session ID
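In LangChain terms, the indexing step looks roughly like this (a sketch only; the storage path under `data/vector_stores/` follows the project layout shown above):

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

def build_vector_store(chunks: list[dict], session_id: str) -> FAISS:
    """Embed chunk texts and persist a FAISS index keyed by session ID."""
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    store = FAISS.from_texts(
        [c["text"] for c in chunks],
        embeddings,
        metadatas=[c["metadata"] for c in chunks],
    )
    store.save_local(f"data/vector_stores/{session_id}")
    return store

def load_vector_store(session_id: str) -> FAISS:
    """Reload a previously saved index for later questions in the same session."""
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    return FAISS.load_local(
        f"data/vector_stores/{session_id}",
        embeddings,
        allow_dangerous_deserialization=True,  # needed to load pickled metadata
    )
```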
When questions are submitted:
- Semantic Search:
  - The question is embedded using the same `text-embedding-3-small` model
  - FAISS performs a similarity search to find the most relevant document chunks
  - By default, the top `k=5` most relevant chunks are retrieved (configurable)
- Context Assembly:
  - Retrieved chunks are combined into a context string
  - This context is passed to the LLM along with the question
- Answer Generation:
  - GPT-4o-mini generates the answer based on the retrieved context
  - The LLM can infer information from context while being transparent about what's directly stated vs. inferred
  - If the context is incomplete, the model indicates this in the response
This RAG (Retrieval-Augmented Generation) approach ensures answers are grounded in the actual document content while leveraging the LLM's reasoning capabilities.
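Put together, the answering step can be sketched as follows. The prompt wording and threshold handling here are illustrative assumptions; `src/qa_service.py` defines the real behavior:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def answer_question(store, question: str, k: int = 5, threshold: float = 0.4) -> str:
    # Retrieve the k most similar chunks along with normalized relevance scores
    scored = store.similarity_search_with_relevance_scores(question, k=k)
    relevant = [doc for doc, score in scored if score >= threshold]
    if not relevant:
        return "No relevant content was found in the document."

    # Assemble the retrieved chunks into a single context string
    context = "\n\n".join(doc.page_content for doc in relevant)
    prompt = (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.invoke(prompt).content
```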
The application has comprehensive test coverage with 48 tests across unit and integration testing.
cd RAG-QA
# Run all tests
pytest tests/ -v
# Run specific test files
pytest tests/test_api.py -v # Integration tests (19 tests)
pytest tests/test_document_indexer.py -v # Document processing unit tests (16 tests)
pytest tests/test_qa_service.py -v # QA service unit tests (13 tests)
# Run with coverage report
pytest tests/ --cov=src --cov-report=html| Test File | Tests | Type | Coverage |
|---|---|---|---|
| `test_api.py` | 19 | Integration | API endpoints with mocked LLM |
| `test_document_indexer.py` | 16 | Unit | Document processing & chunking (JSON + PDF) |
| `test_qa_service.py` | 13 | Unit | Question answering & retrieval |
| Total | 48 | Mixed | Comprehensive |
Key Features:
- ✅ All external dependencies mocked (OpenAI, FAISS)
- ✅ No real API calls required
- ✅ Fast execution (<10 seconds)
- ✅ Edge cases and error scenarios covered
- ✅ Async operations tested
- ✅ Integration tests with complete workflow
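For a sense of the testing style, here are two illustrative integration-style tests against the FastAPI app. They are not taken from the repository's 48 tests, and the exact status codes depend on the app's validation logic:

```python
from fastapi.testclient import TestClient
from src.app import app

client = TestClient(app)

def test_root_reports_limits():
    # The root endpoint needs no external services, so nothing is mocked here
    response = client.get("/")
    assert response.status_code == 200
    assert response.json()["limits"]["max_questions_per_request"] == 100

def test_rejects_unsupported_document_type():
    # Validation should fail before any OpenAI call is made
    files = {
        "document": ("notes.txt", b"plain text", "text/plain"),
        "questions_file": ("questions.json", b'["What is this?"]', "application/json"),
    }
    response = client.post("/process-documents", files=files)
    assert response.status_code in (400, 422)
```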
All application settings are centralized in `src/config.py` for easy maintenance and consistency across all components.
How it works:
- `src/config.py` loads environment variables from the `.env` file using `load_dotenv()` (see the sketch below)
- All other modules (`app.py`, `streamlit_app.py`, `document_indexer.py`, `qa_service.py`) import settings from `config.py`
- This ensures consistent configuration across all components
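A condensed sketch of this pattern (the setting names follow those documented below; the real `src/config.py` is authoritative):

```python
# src/config.py (condensed sketch)
import os
from dotenv import load_dotenv

# Load .env from the project root once, at import time
load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise RuntimeError(
        "OPENAI_API_KEY is not set. Create a .env file in the project root "
        "containing: OPENAI_API_KEY=your_openai_api_key_here"
    )

LLM_MODEL = "gpt-4o-mini"
LLM_TEMPERATURE = 0
EMBEDDING_MODEL = "text-embedding-3-small"
MAX_FILE_SIZE_MB = 50
MAX_QUESTIONS = 100
RETRIEVAL_K = 5
```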
Required Setup:
Create a .env file in the project root directory (RAG-QA/.env):
OPENAI_API_KEY=your_openai_api_key_here
Where to get your OpenAI API key:
- Sign up or log in at OpenAI Platform
- Navigate to API Keys
- Click "Create new secret key"
- Copy the key and paste it into your `.env` file
Security Notes:
- The `.env` file is automatically ignored by Git (listed in `.gitignore`)
- Never share or commit your API key
- The API key is loaded once at startup by `config.py`
Verify Configuration:
The application will automatically validate the API key on startup. If the API key is missing, you'll see a clear error message with instructions.
You can also run the configuration tests:
pytest tests/test_config.py -v
If you encounter issues, check:
- The `.env` file exists in the project root (not in `src/`)
- The `.env` file contains: `OPENAI_API_KEY=your_actual_key_here`
- No spaces around the `=` sign
- No quotes around the API key value
Limits & Constraints:
- `MAX_FILE_SIZE_MB`: Maximum document file size (default: 50 MB)
- `MAX_QUESTIONS`: Maximum questions per request (default: 100)
- `REQUEST_TIMEOUT_SECONDS`: Client timeout for API requests (default: 300 seconds / 5 minutes)
OpenAI Settings:
- `LLM_MODEL`: Language model name (default: "gpt-4o-mini")
- `LLM_TEMPERATURE`: LLM temperature for response generation (default: 0)
- `EMBEDDING_MODEL`: Embedding model for vectorization (default: "text-embedding-3-small")
Document Processing:
- `RETRIEVAL_K`: Number of chunks to retrieve for context (default: 5)
- `SIMILARITY_THRESHOLD`: Minimum similarity score for retrieved chunks (default: 0.4)
  - Filters out irrelevant content to avoid unnecessary LLM calls
  - Lower values (0.3) are more lenient and include marginal matches
  - Moderate values (0.4-0.5) provide balanced filtering
  - Higher values (0.6-0.7) are stricter and only include highly relevant content
- `JSON_CHUNK_SIZE`: Maximum size for text chunks (default: 1000 characters)
- `JSON_CHUNK_OVERLAP`: Overlap between chunks (default: 200 characters)
- `PDF_MERGE_PEERS`: Merge peer elements in PDF chunking (default: True)
Supported Formats:
- `SUPPORTED_DOCUMENT_FORMATS`: List of accepted document types (default: [".pdf", ".json"])
- `SUPPORTED_QUESTIONS_FORMAT`: List of accepted question file types (default: [".json"])
API Configuration:
- `API_HOST`: API server host (default: "localhost")
- `API_PORT`: API server port (default: 8000)
- `API_URL`: Full API URL (default: "http://localhost:8000")
Error Messages:
- `ERROR_MESSAGES`: Dictionary of all error message templates for consistent messaging
All services (app.py, streamlit_app.py, document_indexer.py, qa_service.py) import their settings from this centralized config file, ensuring consistency across the application.
The API includes comprehensive error handling with friendly, actionable error messages:
- Invalid file types (must be PDF or JSON for documents, JSON for questions)
- Invalid JSON format or structure
- Empty files or empty questions list
- Files exceeding size limits (50 MB maximum)
- Too many questions (100 maximum per request)
- Missing required fields
- 413 Payload Too Large: File exceeds 50 MB limit
- 504 Gateway Timeout: Processing took too long (document too complex)
- 507 Insufficient Storage: Out of memory (document too large)
- 500 Internal Server Error: OpenAI API errors, corrupted PDFs, or other processing errors
- 503 Service Unavailable: Network connection issues
All errors include descriptive messages to help diagnose and resolve issues.
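From the client side, these status codes can be handled explicitly. A minimal sketch using `requests`, with the endpoint and limits as documented above:

```python
import requests

def ask(document_path: str, questions_path: str) -> dict:
    with open(document_path, "rb") as doc, open(questions_path, "rb") as qs:
        response = requests.post(
            "http://localhost:8000/process-documents",
            files={"document": doc, "questions_file": qs},
            timeout=300,  # matches the 5-minute client timeout used by Streamlit
        )

    if response.status_code == 413:
        raise ValueError("Document exceeds the 50 MB limit")
    if response.status_code == 504:
        raise TimeoutError("Processing took too long; try a smaller document")
    response.raise_for_status()  # surfaces 500/503/507 with the API's error message
    return response.json()
```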
The application includes simple logging and metrics tracking for monitoring and debugging:
Simple console logging with timestamps for key events:
2026-01-05 11:44:02 - INFO - app - Application starting - Question-Answering Bot API v1.0.0
2026-01-05 11:44:02 - INFO - app - Limits: max_file_size=50MB, max_questions=100
2026-01-05 11:45:12 - INFO - app - Processing request: doc=report.pdf, size=2.5MB, questions=10, session=abc-123
2026-01-05 11:45:14 - INFO - app - Document indexed: 45 chunks in 2.341s
2026-01-05 11:45:18 - INFO - app - Request completed: 10 questions answered in 3.2s (total: 5.541s)
Log Levels:
- INFO: Normal operations (startup, requests, completions)
- WARNING: Non-critical issues (cleanup failures)
- ERROR: Processing errors, API failures, exceptions
Viewing Logs:
# Local development
uvicorn src.app:app --reload
# Docker
docker-compose logs -f backend
Access real-time metrics at `/metrics`:
curl http://localhost:8000/metrics
Example Response:
{
"requests_total": 42,
"requests_success": 40,
"requests_failed": 2,
"documents_processed": 42,
"documents_processed_pdf": 30,
"documents_processed_json": 12,
"questions_answered": 315,
"total_tokens_used": 0,
"total_latency_seconds": 245.678,
"total_chunks_created": 1250,
"avg_latency_seconds": 5.849,
"success_rate": 0.952,
"timestamp": "2026-01-05T12:34:56.789Z"
}
Tracked Metrics:
- Request counts (total, success, failed)
- Success rate
- Average and total latency
- Documents processed by type
- Questions answered
- Token usage (if available from OpenAI)
- Chunk counts
View logs and metrics:
# View logs in real-time
docker-compose logs -f backend
# Monitor metrics
watch -n 5 'curl -s http://localhost:8000/metrics | jq'
# Check success rate
curl -s http://localhost:8000/metrics | jq '.success_rate'
# Check average latency
curl -s http://localhost:8000/metrics | jq '.avg_latency_seconds'
Integration with monitoring systems:
- View logs in Docker logs or redirect to log files
- Scrape `/metrics` with Prometheus or Datadog
- Set up alerts on error rates or latency thresholds
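As a starting point for such alerting, a small polling sketch based on the `/metrics` fields shown above (the thresholds are arbitrary examples):

```python
import time
import requests

METRICS_URL = "http://localhost:8000/metrics"

def watch_metrics(poll_seconds: int = 30, min_success_rate: float = 0.95) -> None:
    """Poll /metrics and flag a degraded success rate."""
    while True:
        metrics = requests.get(METRICS_URL, timeout=5).json()
        print(
            f"requests={metrics['requests_total']} "
            f"success_rate={metrics['success_rate']:.2%} "
            f"avg_latency={metrics['avg_latency_seconds']:.2f}s"
        )
        if metrics["requests_total"] and metrics["success_rate"] < min_success_rate:
            print("ALERT: success rate below threshold")
        time.sleep(poll_seconds)
```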
- Maximum file size: 50 MB per document (configurable)
- Maximum questions: 100 questions per request (configurable)
- Request timeout: 300 seconds (5 minutes) for Streamlit client
- Empty files: Rejected with clear error message
- Processing large PDFs may take time (especially 30+ MB files)
- Complex PDFs with many images or tables require more processing time
- Concurrent question answering improves throughput for multiple questions
- Requires OpenAI API key and internet connection
- FAISS vector stores are stored locally on disk
- Sufficient disk space for temporary files and vector stores