A production-ready Retrieval-Augmented Generation (RAG) system with conversation memory, built with LangChain, FastAPI, and modern MLOps practices.
- Multi-Provider LLM Support: OpenAI, Google Gemini, Groq, and Ollama
- Advanced Retrieval: Hybrid search (vector + BM25) with reranking
- Conversation Memory: SQLite-based persistent conversation history
- Document Processing: Support for TXT and PDF files
- Semantic & Recursive Chunking: Flexible text splitting strategies
- RESTful API: FastAPI with automatic OpenAPI documentation
- Comprehensive Logging: Structured logging with rotation
- Error Handling: Graceful error handling and recovery
- Health Checks: Kubernetes-ready health endpoints
- Rate Limiting: API rate limiting to prevent abuse
- CORS Support: Configurable CORS for web applications
- Docker Support: Multi-stage builds with health checks
- CI/CD Pipeline: Automated testing, security scanning, and deployment
- Configuration Management: Environment-based configuration
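The two splitting strategies differ in how they choose split points: semantic chunking typically splits where embedding similarity between adjacent sentences drops, while recursive chunking tries a priority list of separators. A minimal sketch of the recursive variant (the project itself uses LangChain's splitters; this function and its parameters are illustrative):

```python
# Sketch of recursive chunking: try coarse separators first (paragraphs),
# then fall back to finer ones (lines, sentences, words).
# Illustrative only; the project uses LangChain's text splitters.

def recursive_split(text: str, max_len: int = 200,
                    separators=("\n\n", "\n", ". ", " ")) -> list[str]:
    if len(text) <= max_len:
        return [text] if text.strip() else []
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, current = [], ""
            for part in parts:
                candidate = (current + sep + part) if current else part
                if len(candidate) <= max_len:
                    current = candidate
                else:
                    if current:
                        chunks.extend(recursive_split(current, max_len, separators))
                    current = part
            if current:
                chunks.extend(recursive_split(current, max_len, separators))
            return chunks
    # No separator produced a split: hard-cut at max_len
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

chunks = recursive_split("First paragraph. " * 10 + "\n\n" + "Second paragraph. " * 10)
```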
```
┌─────────────────┐
│   User Query    │
└────────┬────────┘
         │
         v
┌─────────────────┐
│ FastAPI Server  │
│  - Rate Limit   │
│  - CORS         │
│  - Logging      │
└────────┬────────┘
         │
         v
┌─────────────────────────────┐
│    Conversation Memory      │
│         (SQLite)            │
│  - Session Management       │
│  - History Retrieval        │
└──────────────┬──────────────┘
               │
               v
┌─────────────────────────────┐
│     Document Retrieval      │
│  ┌───────────────────────┐  │
│  │ Vector Search (Chroma)│  │
│  │ + BM25 (Keyword)      │  │
│  └───────────────────────┘  │
│  ┌───────────────────────┐  │
│  │ Ensemble + Reranking  │  │
│  └───────────────────────┘  │
└──────────────┬──────────────┘
               │
               v
┌─────────────────────────────┐
│  LLM (Gemini/OpenAI/etc.)   │
│  - Context + History        │
│  - Answer Generation        │
└──────────────┬──────────────┘
               │
               v
┌─────────────────────────────┐
│  Response + Memory Update   │
└─────────────────────────────┘
```
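The conversation-memory stage above persists per-session history in SQLite. As a minimal stdlib sketch of the idea (the project's real implementation lives in src/memory.py; the table schema and function names here are assumptions):

```python
import sqlite3
import uuid

# Sketch of SQLite-backed conversation memory.
# Table and column names are illustrative; see src/memory.py for the real schema.

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        session_id TEXT,
        role       TEXT,   -- 'human' or 'ai'
        content    TEXT,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

def add_message(session_id: str, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content))
    conn.commit()

def get_history(session_id: str, max_messages: int = 10) -> list[tuple[str, str]]:
    # Most recent messages, returned oldest-first (mirrors MAX_MEMORY_MESSAGES).
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? "
        "ORDER BY rowid DESC LIMIT ?", (session_id, max_messages)).fetchall()
    return list(reversed(rows))

sid = str(uuid.uuid4())
add_message(sid, "human", "What is the main objective?")
add_message(sid, "ai", "The main objective is...")
history = get_history(sid)
```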
- Python 3.11+
- Docker & Docker Compose (optional)
- API keys for your chosen LLM provider
- Clone the repository

```bash
git clone https://github.com/gokhaneraslan/advanced_rag.git
cd advanced_rag
```

- Create virtual environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Configure environment variables

```bash
cp .env.example .env
# Edit .env with your API keys and configuration
```

- Create necessary directories

```bash
mkdir -p data logs vector_store
```

```bash
# Build and run with Docker Compose
docker-compose up -d

# Check logs
docker-compose logs -f

# Stop
docker-compose down
```

Local:

```bash
python app.py
```

Docker:

```bash
docker-compose up
```

The API will be available at http://localhost:8000.
```bash
curl http://localhost:8000/health
```

```bash
curl -X POST http://localhost:8000/session/create
```

Response:

```json
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Session created successfully"
}
```

```bash
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the main objective?",
    "session_id": "550e8400-e29b-41d4-a716-446655440000"
  }'
```

Response:

```json
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "input_query": "What is the main objective?",
  "answer": "The main objective is...",
  "context": [...],
  "message_count": 2
}
```

```bash
curl -X POST http://localhost:8000/add-documents \
  -F "files=@document1.pdf" \
  -F "files=@document2.txt"
```

```bash
curl -X DELETE http://localhost:8000/session/{session_id}
```

```bash
curl http://localhost:8000/session/{session_id}/info
```

```bash
curl http://localhost:8000/sessions
```

Visit http://localhost:8000/docs for Swagger UI documentation.
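The same /query call can be made from Python with the standard library alone. The request shape below mirrors the curl example; the session_id is the illustrative one from above:

```python
import json
import urllib.request

# Sketch of calling the /query endpoint from Python (stdlib only).
# Endpoint and payload shape match the curl examples in this README.

BASE_URL = "http://localhost:8000"

def build_query_request(query: str, session_id: str) -> urllib.request.Request:
    payload = json.dumps({"query": query, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("What is the main objective?",
                          "550e8400-e29b-41d4-a716-446655440000")
# With the server running: urllib.request.urlopen(req) returns the JSON response.
```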
```bash
pytest -v
```

```bash
pytest --cov=src --cov-report=html
```

```bash
pytest tests/test_integration.py::test_memory_system -v
```

```bash
# Automated script (recommended)
./scripts/test_docker.sh

# Or use Makefile
make docker-test

# Manual Docker test
make docker-build
make docker-up
curl http://localhost:8000/health
make docker-down
```

```bash
python tests/test.py
```

All configuration is managed through environment variables. See `.env.example` for all available options.
| Variable | Default | Description |
|---|---|---|
| `LLM_PROVIDER` | `gemini` | LLM provider (openai/gemini/groq/ollama) |
| `LLM_MODEL` | `gemini-2.5-flash` | Model name |
| `MAX_MEMORY_MESSAGES` | `10` | Max messages per session |
| `RETRIEVAL_TOP_K` | `5` | Documents to retrieve |
| `RERANKER_TOP_N` | `3` | Documents after reranking |
| `SPLITTING_METHOD` | `semantic` | Text splitting method |
| `LOG_LEVEL` | `INFO` | Logging level |
The project includes a comprehensive GitHub Actions pipeline:
- Code Quality: Black, isort, flake8, pylint
- Security Scanning: Bandit, Safety
- Testing: pytest with coverage
- Docker Build: Multi-stage builds
- Integration Tests: Docker-based E2E tests
- Deployment: Automated deployment on main branch
- Health Checks: `/health` endpoint with detailed status
- Logging: Structured logging with rotation
- Metrics: Request count, latency, error rates (via logs)
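Structured logging with rotation can be sketched with the stdlib alone (the project's actual setup lives in logging_config.py; file paths and field names here are assumptions):

```python
import json
import logging
import logging.handlers
import os
import tempfile

# Sketch of structured (JSON) logging with size-based rotation.
# Illustrative only; see logging_config.py for the project's setup.

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

log_path = os.path.join(tempfile.mkdtemp(), "app.log")
handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=1_000_000, backupCount=5)  # rotate at ~1 MB, keep 5 files
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("rag-api")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("query handled")
handler.flush()
```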
```bash
# 1. Create feature branch
git checkout -b feature/new-feature

# 2. Make changes and test
pytest -v

# 3. Check code quality
black .
isort .
flake8 .

# 4. Commit and push
git add .
git commit -m "Add new feature"
git push origin feature/new-feature

# 5. Create PR (CI will run automatically)
```

```
advanced_rag/
├── src/
│   ├── chains.py            # LLM and RAG chain logic
│   ├── data_processing.py   # Document loading and splitting
│   ├── retrieval.py         # Retrieval and reranking
│   └── memory.py            # Conversation memory system
├── tests/
│   ├── test_integration.py  # Integration tests
│   └── test.py              # Manual test script
├── .github/
│   └── workflows/
│       └── CI.yml           # CI/CD pipeline
├── app.py                   # FastAPI application
├── config.py                # Configuration management
├── logging_config.py        # Logging setup
├── main.py                  # CLI entry point
├── requirements.txt         # Python dependencies
├── Dockerfile               # Docker image
├── docker-compose.yml       # Docker Compose config
├── .env.example             # Environment template
├── .gitignore               # Git ignore rules
└── README.md                # This file
```
- API keys stored in environment variables
- Rate limiting on all endpoints
- Input validation with Pydantic
- Security scanning in CI/CD pipeline
- No sensitive data in logs
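The mechanism behind API rate limiting can be sketched as a token bucket: each client gets a budget that refills at a fixed rate and caps at a burst size. Illustrative only; the API's real limiter is middleware-based and the class below is not part of the project:

```python
import time

# Token-bucket sketch of per-client rate limiting.
# Illustrative only; not the project's actual limiter.

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)   # ~5 requests/sec, bursts of 10
results = [bucket.allow() for _ in range(12)]
```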
1. Import errors

```bash
export PYTHONPATH=.
```

2. Memory database locked

```bash
rm conversation_memory.db
```

3. Vector store corruption

```bash
rm -rf vector_store/
# Restart server to rebuild
```

4. Docker health check failing

```bash
docker-compose logs rag-api
# Check for initialization errors
```

- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and code quality checks
- Submit a pull request
- LangChain for the RAG framework
- Hugging Face for embedding models
- FastAPI for the web framework
- The open-source community