RAG Pipeline API with Conversation Memory

A production-ready Retrieval-Augmented Generation (RAG) system with conversation memory, built with LangChain, FastAPI, and modern MLOps practices.

🌟 Features

Core Features

  • Multi-Provider LLM Support: OpenAI, Google Gemini, Groq, and Ollama
  • Advanced Retrieval: Hybrid search (vector + BM25) with reranking
  • Conversation Memory: SQLite-based persistent conversation history
  • Document Processing: Support for TXT and PDF files
  • Semantic & Recursive Chunking: Flexible text splitting strategies
  • RESTful API: FastAPI with automatic OpenAPI documentation
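
The hybrid-search idea above can be illustrated without any retrieval library: merge a keyword (BM25-style) ranking and a vector-similarity ranking with reciprocal rank fusion. This is a minimal sketch of the technique only; the document IDs, scores, and the `rrf_merge` helper are illustrative, not the project's actual implementation (which uses LangChain retrievers).

```python
# Minimal reciprocal rank fusion (RRF): fuse a keyword ranking and a
# vector-similarity ranking into a single hybrid ranking.
# Illustrative sketch only -- not the project's LangChain-based code.

def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists; earlier positions contribute higher scores."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_b", "doc_a", "doc_c"]    # keyword matches
vector_ranking = ["doc_a", "doc_c", "doc_b"]  # semantic matches
hybrid = rrf_merge([bm25_ranking, vector_ranking])
```

Documents that rank well in both lists float to the top, which is the point of combining vector and keyword retrieval before reranking.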

Production Features

  • βœ… Comprehensive Logging: Structured logging with rotation
  • βœ… Error Handling: Graceful error handling and recovery
  • βœ… Health Checks: Kubernetes-ready health endpoints
  • βœ… Rate Limiting: API rate limiting to prevent abuse
  • βœ… CORS Support: Configurable CORS for web applications
  • βœ… Docker Support: Multi-stage builds with health checks
  • βœ… CI/CD Pipeline: Automated testing, security scanning, and deployment
  • βœ… Configuration Management: Environment-based configuration

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   User Query    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         v
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  FastAPI Server β”‚
β”‚  - Rate Limit   β”‚
β”‚  - CORS         β”‚
β”‚  - Logging      β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚
         v
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Conversation Memory       β”‚
β”‚   (SQLite)                  β”‚
β”‚   - Session Management      β”‚
β”‚   - History Retrieval       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              β”‚
              v
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Document Retrieval        β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚   β”‚ Vector Search (Chroma)β”‚  β”‚
β”‚   β”‚ + BM25 (Keyword)      β”‚  β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚   β”‚ Ensemble + Reranking  β”‚  β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              β”‚
              v
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   LLM (Gemini/OpenAI/etc)   β”‚
β”‚   - Context + History       β”‚
β”‚   - Answer Generation       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              β”‚
              v
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Response + Memory Update  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
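
The conversation-memory layer in the diagram can be sketched with the standard library's sqlite3 module. The table name, schema, and trimming policy below are assumptions for illustration; the project's actual memory.py may differ.

```python
import sqlite3

# Minimal SQLite-backed conversation memory mirroring the diagram:
# per-session message history, capped at `max_messages` per session.
# Illustrative schema only -- not the project's actual memory.py.

class ConversationMemory:
    def __init__(self, db_path: str = ":memory:", max_messages: int = 10):
        self.conn = sqlite3.connect(db_path)
        self.max_messages = max_messages
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " session_id TEXT NOT NULL,"
            " role TEXT NOT NULL,"
            " content TEXT NOT NULL)"
        )

    def add(self, session_id: str, role: str, content: str) -> None:
        self.conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )
        # Trim the oldest messages beyond the per-session cap.
        self.conn.execute(
            "DELETE FROM messages WHERE session_id = ? AND id NOT IN ("
            " SELECT id FROM messages WHERE session_id = ?"
            " ORDER BY id DESC LIMIT ?)",
            (session_id, session_id, self.max_messages),
        )
        self.conn.commit()

    def history(self, session_id: str) -> list[tuple[str, str]]:
        rows = self.conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
            (session_id,),
        )
        return list(rows)
```

On each query, the retrieved history is injected into the LLM prompt alongside the retrieved context, and the new question/answer pair is written back after generation.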

πŸ“¦ Installation

Prerequisites

  • Python 3.11+
  • Docker & Docker Compose (optional)
  • API keys for your chosen LLM provider

Local Setup

  1. Clone the repository
git clone https://github.com/gokhaneraslan/advanced_rag.git
cd advanced_rag
  2. Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies
pip install -r requirements.txt
  4. Configure environment variables
cp .env.example .env
# Edit .env with your API keys and configuration
  5. Create the necessary directories
mkdir -p data logs vector_store

Docker Setup

# Build and run with Docker Compose
docker-compose up -d

# Check logs
docker-compose logs -f

# Stop
docker-compose down

πŸš€ Usage

Starting the Server

Local:

python app.py

Docker:

docker-compose up

The API will be available at http://localhost:8000

API Endpoints

1. Health Check

curl http://localhost:8000/health

2. Create a Session

curl -X POST http://localhost:8000/session/create

Response:

{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Session created successfully"
}

3. Query the RAG System

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the main objective?",
    "session_id": "550e8400-e29b-41d4-a716-446655440000"
  }'

Response:

{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "input_query": "What is the main objective?",
  "answer": "The main objective is...",
  "context": [...],
  "message_count": 2
}
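
The same query can be issued from Python with only the standard library. The endpoint and payload fields match the curl example above; `build_query_payload` and `post_query` are helper names invented for this sketch.

```python
import json
import urllib.request

# Send a /query request to the RAG API using only the stdlib.
# Payload fields match the curl example above; helper names are illustrative.

def build_query_payload(query: str, session_id: str) -> bytes:
    return json.dumps({"query": query, "session_id": session_id}).encode()

def post_query(base_url: str, query: str, session_id: str) -> dict:
    req = urllib.request.Request(
        f"{base_url}/query",
        data=build_query_payload(query, session_id),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # requires the server running
        return json.load(resp)
```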

4. Add Documents

curl -X POST http://localhost:8000/add-documents \
  -F "files=@document1.pdf" \
  -F "files=@document2.txt"

5. Clear Session

curl -X DELETE http://localhost:8000/session/{session_id}

6. Get Session Info

curl http://localhost:8000/session/{session_id}/info

7. List All Sessions

curl http://localhost:8000/sessions

Interactive API Documentation

Visit http://localhost:8000/docs for Swagger UI documentation.

πŸ§ͺ Testing

Run All Tests

pytest -v

Run with Coverage

pytest --cov=src --cov-report=html

Run Specific Test

pytest tests/test_integration.py::test_memory_system -v

Docker Integration Test

# Automated script (recommended)
./scripts/test_docker.sh

# Or use Makefile
make docker-test

# Manual Docker test
make docker-build
make docker-up
curl http://localhost:8000/health
make docker-down

Run Integration Test Script

python tests/test.py

βš™οΈ Configuration

All configuration is managed through environment variables. See .env.example for all available options.

Key Configuration Options

| Variable | Default | Description |
| --- | --- | --- |
| LLM_PROVIDER | gemini | LLM provider (openai/gemini/groq/ollama) |
| LLM_MODEL | gemini-2.5-flash | Model name |
| MAX_MEMORY_MESSAGES | 10 | Max messages per session |
| RETRIEVAL_TOP_K | 5 | Documents to retrieve |
| RERANKER_TOP_N | 3 | Documents after reranking |
| SPLITTING_METHOD | semantic | Text splitting method (semantic/recursive) |
| LOG_LEVEL | INFO | Logging level |

πŸ“Š MLOps & CI/CD

CI/CD Pipeline

The project includes a comprehensive GitHub Actions pipeline:

  1. Code Quality: Black, isort, flake8, pylint
  2. Security Scanning: Bandit, Safety
  3. Testing: pytest with coverage
  4. Docker Build: Multi-stage builds
  5. Integration Tests: Docker-based E2E tests
  6. Deployment: Automated deployment on main branch

Monitoring

  • Health Checks: /health endpoint with detailed status
  • Logging: Structured logging with rotation
  • Metrics: Request count, latency, error rates (via logs)

Local Development Workflow

# 1. Create feature branch
git checkout -b feature/new-feature

# 2. Make changes and test
pytest -v

# 3. Check code quality
black .
isort .
flake8 .

# 4. Commit and push
git add .
git commit -m "Add new feature"
git push origin feature/new-feature

# 5. Create PR (CI will run automatically)

πŸ—‚οΈ Project Structure

advanced_rag/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ chains.py              # LLM and RAG chain logic
β”‚   β”œβ”€β”€ data_processing.py     # Document loading and splitting
β”‚   β”œβ”€β”€ retrieval.py           # Retrieval and reranking
β”‚   └── memory.py              # Conversation memory system
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ test_integration.py    # Integration tests
β”‚   └── test.py                # Manual test script
β”œβ”€β”€ .github/
β”‚   └── workflows/
β”‚       └── CI.yml             # CI/CD pipeline
β”œβ”€β”€ app.py                     # FastAPI application
β”œβ”€β”€ config.py                  # Configuration management
β”œβ”€β”€ logging_config.py          # Logging setup
β”œβ”€β”€ main.py                    # CLI entry point
β”œβ”€β”€ requirements.txt           # Python dependencies
β”œβ”€β”€ Dockerfile                 # Docker image
β”œβ”€β”€ docker-compose.yml         # Docker Compose config
β”œβ”€β”€ .env.example               # Environment template
β”œβ”€β”€ .gitignore                 # Git ignore rules
└── README.md                  # This file

πŸ”’ Security

  • API keys stored in environment variables
  • Rate limiting on all endpoints
  • Input validation with Pydantic
  • Security scanning in CI/CD pipeline
  • No sensitive data in logs

πŸ› Troubleshooting

Common Issues

1. Import errors

export PYTHONPATH=.

2. Memory database locked

rm conversation_memory.db

3. Vector store corruption

rm -rf vector_store/
# Restart server to rebuild

4. Docker health check failing

docker-compose logs rag-api
# Check for initialization errors

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests and code quality checks
  5. Submit a pull request

πŸ™ Acknowledgments

  • LangChain for the RAG framework
  • Hugging Face for embedding models
  • FastAPI for the web framework
  • The open-source community