Try it now: chatbot-ai-system.vercel.app
Demo instance uses limited API quotas. For full features, deploy your own instance following the Quick Start below.
Production-ready multi-tenant AI chatbot platform with intelligent LLM orchestration, WebSocket streaming, and reliable failover patterns. Built for performance and cost efficiency through semantic caching and provider redundancy.
Built as a reusable template - easily customize it for different use cases (customer support, code assistant, education, etc.) using the pre-configured templates.
This project showcases production-grade LLMOps and AI engineering skills:
| Skill | Implementation | Location |
|---|---|---|
| Multi-Provider Orchestration | Unified interface for OpenAI, Anthropic, Llama, Gemini with intelligent routing | src/chatbot_ai_system/orchestration/ |
| Semantic Caching | Redis-backed semantic similarity caching (~73% hit rate) | src/chatbot_ai_system/cache/ |
| WebSocket Streaming | Real-time token streaming with ~186ms P95 latency | src/chatbot_ai_system/websocket/ |
| Multi-Tenancy & Auth | Tenant isolation, JWT authentication, rate limiting | src/chatbot_ai_system/middleware/ |
| Observability | Prometheus, Grafana, Jaeger distributed tracing | monitoring/ |
| Infrastructure as Code | Kubernetes manifests, Docker Compose, CI/CD | infrastructure/, k8s/ |
| Template Architecture | Reusable configurations for multiple use cases | use-cases/ |
View full architecture documentation →
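The orchestration row above refers to routing each request to a preferred provider and falling back to the next one on errors or timeouts. The snippet below is a minimal sketch of that pattern; the class and function names are illustrative, not the repository's actual API (see src/chatbot_ai_system/orchestration/ for the real implementation).

```python
# Illustrative failover pattern only - names are hypothetical, not the repo's API.
import asyncio
from typing import Protocol


class Provider(Protocol):
    name: str
    async def complete(self, prompt: str) -> str: ...


async def complete_with_failover(
    providers: list[Provider], prompt: str, timeout: float = 30.0
) -> str:
    """Try providers in priority order; fall back on error or timeout."""
    last_error: Exception | None = None
    for provider in providers:
        try:
            return await asyncio.wait_for(provider.complete(prompt), timeout=timeout)
        except Exception as exc:  # provider error or timeout: try the next one
            last_error = exc
    raise RuntimeError("All providers failed") from last_error
```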
- Multi-Provider Orchestration: Intelligent routing between OpenAI, Anthropic, Llama, and Gemini with automatic failover
- WebSocket Streaming: Token-by-token streaming with ~186ms P95 latency (local benchmarks)
- Cost Optimization: Semantic caching achieving ~73% hit rate and ~70% cost reduction (see the caching sketch after this list)
- Production Patterns: Circuit breakers, rate limiting, health monitoring, and comprehensive observability
- Multi-Tenancy Support: Complete tenant isolation with usage tracking and horizontal scaling
- Template-Ready: Pre-configured use cases (customer support, code assistant) for rapid deployment
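Semantic caching works by embedding the incoming prompt and reusing a cached response whenever a previous prompt is close enough in embedding space. The sketch below shows the core idea with an in-memory store and a configurable similarity threshold; the real implementation in src/chatbot_ai_system/cache/ is Redis-backed, and the helper names here are hypothetical.

```python
# Conceptual sketch of semantic cache lookup - helper names are hypothetical.
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed, threshold: float = 0.85):  # see SEMANTIC_CACHE_THRESHOLD
        self.embed = embed                                # callable: str -> list[float]
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []

    def lookup(self, prompt: str) -> str | None:
        query = self.embed(prompt)
        best_score, best_response = 0.0, None
        for vec, response in self.entries:
            score = cosine_similarity(query, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))
```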
| Metric | Target | Achieved | Evidence |
|---|---|---|---|
| P95 Latency | < 200ms | ~186ms | benchmark_summary.json |
| P99 Latency | < 300ms | ~245ms | benchmark_summary.json |
| Throughput | 400+ RPS | ~250 RPS | benchmark_summary.json |
| Cache Hit Rate | ≥ 60% | ~73% | cache_metrics_latest.json |
| Cost Reduction | ≥ 30% | ~70-73% | cache_metrics_latest.json |
| Provider Failover | < 500ms | ~463ms | benchmark_summary.json |
| WebSocket Sessions | 100+ | ~100 | benchmark_summary.json |
Note: Results are from local synthetic benchmarks on developer hardware, not production SLAs.
Run benchmarks yourself: python benchmarks/run_all_benchmarks.py
The fastest way to get started:
# 1. Clone and configure
git clone https://github.com/cbratkovics/chatbot-ai-system.git
cd chatbot-ai-system
cp .env.example .env
# Add your API keys to .env
# 2. Start all services
docker compose up -d
# 3. Access the application
# Frontend: http://localhost:3000
# API Docs: http://localhost:8000/docs
# Health: http://localhost:8000/health
Alternative: Local Development (Poetry + npm)
For active development with hot reload:
# Backend
poetry install
cp .env.example .env
# Add your API keys to .env
poetry run uvicorn chatbot_ai_system.server.main:app --reload
# Frontend (new terminal)
cd frontend
npm ci
cp .env.example .env.local
# Configure API URLs in .env.local
npm run dev
Access:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
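Once the backend is running, the streaming endpoint can also be exercised outside the frontend. The snippet below is a rough client sketch using the websockets package against the ws://localhost:8000/ws URL from the configuration section; the message schema shown is an assumption, so check the API docs for the actual payload format.

```python
# Rough streaming client sketch - the message schema here is assumed, not guaranteed.
import asyncio
import json

import websockets  # pip install websockets


async def main() -> None:
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        await ws.send(json.dumps({"message": "Hello, what can you do?"}))
        async for frame in ws:  # streamed tokens arrive frame by frame
            print(frame, end="", flush=True)


asyncio.run(main())
```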
Template Mode: Use Case Quick Start
Deploy a pre-configured chatbot for specific use cases:
# Example: Customer Support Template
cp use-cases/customer-support/.env.example .env
cp use-cases/customer-support/system-prompt.txt src/chatbot_ai_system/config/
# Customize branding in .env
# Then start with docker compose up -d
Available Templates:
- customer-support/ - Professional customer service assistant
- More templates coming soon!
See use-cases/ for template documentation.
flowchart TB
subgraph "Client Layer"
UI[Next.js UI]
WS[WebSocket Client]
REST[REST Client]
end
subgraph "API Gateway"
LB[Load Balancer]
ASGI[FastAPI Server]
end
subgraph "Core Services"
MW[Middleware Stack]
ORCH[Provider Orchestrator]
CACHE[Semantic Cache]
end
subgraph "Providers"
OAI[OpenAI API]
ANTH[Anthropic API]
LLAMA[Meta Llama]
GEM[Google Gemini]
end
subgraph "Storage"
REDIS[(Redis Cache)]
PG[(PostgreSQL)]
end
subgraph "Observability"
PROM[Prometheus]
GRAF[Grafana]
TRACE[Jaeger]
end
UI --> LB
WS --> LB
REST --> LB
LB --> ASGI
ASGI --> MW
MW --> ORCH
MW --> CACHE
ORCH --> OAI
ORCH --> ANTH
ORCH --> LLAMA
ORCH --> GEM
CACHE --> REDIS
MW --> PG
ASGI --> PROM
PROM --> GRAF
ASGI --> TRACE
style UI fill:#e1f5fe
style ASGI fill:#c8e6c9
style ORCH fill:#ffccbc
style REDIS fill:#ffecb3
style PROM fill:#f8bbd0
# Required API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Infrastructure
REDIS_URL=redis://localhost:6379/0
DATABASE_URL=postgresql://user:pass@localhost/chatbot
# Performance Tuning
RATE_LIMIT_REQUESTS=100
CACHE_TTL_SECONDS=3600
SEMANTIC_CACHE_THRESHOLD=0.85
REQUEST_TIMEOUT=30
# Feature Flags
ENABLE_STREAMING=true
ENABLE_FAILOVER=true
ENABLE_SEMANTIC_CACHE=true
# Frontend Configuration (in frontend/.env.local)
NEXT_PUBLIC_API_URL=http://localhost:8000
NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws
NEXT_PUBLIC_APP_NAME="AI Chat System"
Full configuration guide: docs/CONFIGURATION.md (if present)
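These variables are read at application startup through the config module. Below is a minimal sketch of how such settings can be loaded with pydantic-settings; the actual Settings class in src/chatbot_ai_system/config/ may use different field names and structure.

```python
# Minimal settings sketch using pydantic-settings - field names mirror the
# variables above, but the project's real Settings class may differ.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    openai_api_key: str
    anthropic_api_key: str
    redis_url: str = "redis://localhost:6379/0"
    rate_limit_requests: int = 100
    cache_ttl_seconds: int = 3600
    semantic_cache_threshold: float = 0.85
    enable_streaming: bool = True


settings = Settings()  # values come from the environment / .env
```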
This project is production-ready and can be deployed to Vercel + Render in under 30 minutes.
Infrastructure:
- Vercel: Next.js frontend hosting (Free tier)
- Render: FastAPI backend + Redis cache ($14/month)
- Total Cost: $14/month + AI API usage
Steps:
1. Deploy Backend to Render:
   - Connect your GitHub repository to Render
   - Render auto-detects the render.yaml configuration
   - Set environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY)
   - Deploy a Redis instance ($7/month)
2. Deploy Frontend to Vercel:
   - Run: cd frontend && vercel --prod
   - Set environment variables in the Vercel dashboard:
     - NEXT_PUBLIC_API_URL: your Render backend URL
     - NEXT_PUBLIC_WS_URL: your Render WebSocket URL
3. Update CORS:
   - Add your Vercel domain to CORS_ORIGINS in the Render dashboard (see the sketch below)
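Step 3 only changes an environment variable; on the FastAPI side, allowed origins are typically applied with the standard CORSMiddleware, roughly as sketched below (the actual middleware wiring in this repository may differ).

```python
# How a CORS_ORIGINS-style setting is typically applied in FastAPI -
# a sketch, not necessarily this repository's exact wiring.
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

origins = os.getenv("CORS_ORIGINS", "http://localhost:3000").split(",")

app.add_middleware(
    CORSMiddleware,
    allow_origins=[o.strip() for o in origins],  # e.g. https://your-app.vercel.app
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```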
Documentation:
- Full deployment guide: docs/PRODUCTION_DEPLOYMENT.md
- Production checklist: docs/DEPLOYMENT_CHECKLIST.md

Production URLs (after deployment):
- Frontend: https://your-app.vercel.app
- Backend API: https://your-backend.onrender.com
- API Docs: https://your-backend.onrender.com/docs
Alternative: Docker Deployment
# Build production image
docker build -f docker/dockerfiles/Dockerfile.production -t chatbot-ai-system:latest .
# Run with production compose
docker compose -f docker-compose.prod.yml up -d
Alternative: Kubernetes Deployment
# Apply Kubernetes configurations
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml
Kubernetes documentation: docs/kubernetes/README.md (if present)
- Horizontal Scaling: Stateless design supports multiple replicas
- Database: PostgreSQL with read replicas for high availability
- Cache: Redis Cluster for distributed caching
- Load Balancing: Nginx or cloud load balancer
- Monitoring: Prometheus + Grafana dashboards included
# Run all quality checks
make lint # Code linting with ruff
make type-check # Type checking with mypy
make test # Unit tests with pytest
make test-cov # Tests with coverage report
# Individual test suites
poetry run pytest tests/unit -v # Unit tests
poetry run pytest tests/integration -v # Integration tests
poetry run pytest tests/e2e -v # End-to-end tests
# Load testing
k6 run benchmarks/load_tests/k6_api_test.js
k6 run benchmarks/load_tests/k6_websocket_test.js
# Verify benchmark claims
python benchmarks/verify_metrics.py
CI/CD: All tests run automatically on pull requests via GitHub Actions.
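For a quick sanity check outside the full suites, a smoke test against the health endpoint can look like the sketch below (illustrative only; it assumes the app object imports as in the uvicorn command shown earlier).

```python
# Illustrative smoke test - not part of the repository's test suites.
from fastapi.testclient import TestClient

from chatbot_ai_system.server.main import app


def test_health_endpoint_is_up() -> None:
    client = TestClient(app)
    response = client.get("/health")
    assert response.status_code == 200
```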
- Prometheus: Application and system metrics
- Grafana: Real-time dashboards and alerts
- Jaeger: Distributed tracing for request flows
- Request latency (P50, P95, P99)
- Provider availability and failover events
- Cache hit rates and cost savings
- Token usage and rate limiting
- WebSocket connection metrics
Access monitoring:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3001
- Jaeger: http://localhost:16686
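The latency and cache metrics listed above are exported in the usual prometheus_client style; a simplified instrumentation sketch is shown below (metric names are illustrative, not the dashboards' actual series).

```python
# Simplified instrumentation sketch with prometheus_client - metric names are illustrative.
import time

from prometheus_client import Counter, Histogram

REQUEST_LATENCY = Histogram(
    "chat_request_latency_seconds", "Chat request latency", ["provider"]
)
CACHE_HITS = Counter("semantic_cache_hits_total", "Semantic cache hits")  # incremented on cache hits


def timed_completion(provider_name: str, call):
    """Run a provider call and record its latency under the provider label."""
    start = time.perf_counter()
    try:
        return call()
    finally:
        REQUEST_LATENCY.labels(provider=provider_name).observe(
            time.perf_counter() - start
        )
```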
- Authentication: JWT-based with refresh tokens
- Rate Limiting: Token bucket algorithm per tenant
- Input Validation: Pydantic models with strict validation
- Secrets Management: Environment-based configuration
- CORS Protection: Configurable origin restrictions
- Content Filtering: Optional content moderation
Security documentation: docs/security/SECURITY.md
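The per-tenant rate limiting above follows the token bucket pattern: each tenant gets a bucket that refills at a fixed rate, and each request consumes one token. A compact sketch follows; the refill window is an assumption, and the real middleware in src/chatbot_ai_system/middleware/ is the authoritative version.

```python
# Compact token bucket sketch - illustrative, not the middleware's actual code.
import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    capacity: float = 100.0                  # mirrors RATE_LIMIT_REQUESTS above
    refill_per_second: float = 100.0 / 60.0  # assumed window: 100 requests per minute
    tokens: float = 100.0
    updated: float = field(default_factory=time.monotonic)

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


buckets: dict[str, TokenBucket] = {}  # one bucket per tenant id


def is_allowed(tenant_id: str) -> bool:
    return buckets.setdefault(tenant_id, TokenBucket()).allow()
```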
- Framework: FastAPI 0.104+ (async Python 3.12+)
- LLM Providers: OpenAI, Anthropic, Meta Llama, Google Gemini
- Caching: Redis with semantic similarity
- Database: PostgreSQL with SQLAlchemy ORM
- Message Queue: Redis Streams
- Framework: Next.js 14 (App Router)
- Language: TypeScript
- UI: Tailwind CSS + shadcn/ui components
- State Management: React Context + Hooks
- WebSocket: Native WebSocket API
- Containerization: Docker, Docker Compose
- Orchestration: Kubernetes-ready
- CI/CD: GitHub Actions
- Monitoring: Prometheus, Grafana, Jaeger
- Deployment: Vercel (frontend) + Render (backend)
├── src/chatbot_ai_system/ # Backend application
│ ├── server/ # FastAPI app and routes
│ ├── providers/ # LLM provider implementations
│ ├── orchestration/ # Routing and failover logic
│ ├── cache/ # Semantic caching system
│ ├── middleware/ # Auth, rate limiting, tracing
│ ├── websocket/ # WebSocket handlers
│ └── config/ # Configuration management
├── frontend/ # Next.js frontend
│ ├── app/ # Next.js 14 app directory
│ ├── components/ # React components
│ └── config/ # Frontend configuration
├── use-cases/ # Pre-configured templates
│ └── customer-support/ # Customer support template
├── benchmarks/ # Performance testing suite
│ ├── results/ # Benchmark results
│ └── load_tests/ # k6 load tests
├── tests/ # Test suites
│ ├── unit/ # Unit tests
│ ├── integration/ # Integration tests
│ └── e2e/ # End-to-end tests
├── docs/ # Documentation
│ ├── architecture/ # Architecture docs
│ ├── security/ # Security docs
│ └── deployment/ # Deployment guides
├── docker/ # Docker configurations
│ ├── dockerfiles/ # Dockerfile variants
│ └── compose/ # Docker Compose files
├── k8s/ # Kubernetes manifests
├── infrastructure/ # IaC and deployment configs
└── monitoring/ # Monitoring configurations
We welcome contributions! Please read our Contributing Guide for details on our code of conduct, development process, and how to submit pull requests.
Key areas for contribution:
- New AI provider integrations
- Additional use-case templates
- Performance optimizations
- Documentation improvements
- Bug fixes and feature requests
Community standards:
Built with excellent open-source tools:
- FastAPI - Modern Python web framework
- Next.js - React framework for production
- Redis - In-memory data structure store
- PostgreSQL - Robust relational database
- Prometheus & Grafana - Monitoring stack
- OpenAI & Anthropic for powerful LLM APIs
This project is licensed under the MIT License - see the LICENSE file for details.
Christopher J. Bratkovics
- LinkedIn: linkedin.com/in/cbratkovics
- Portfolio: cbratkovics.dev
- GitHub: @cbratkovics
- Lines of Code: ~15,000+
- Test Coverage: 85%+
- Docker Images: Backend, Frontend, Monitoring Stack
- Supported Providers: OpenAI, Anthropic, Meta Llama, Google Gemini
- Performance: ~186ms P95 latency and 100+ concurrent WebSocket connections (local benchmarks)
- Production-Ready: Deployed and tested in production environments
⭐ Star this repo if you find it useful!
Built with ❤️ for production AI systems