Multi-Tenant AI Chat Platform

License: MIT Python 3.12+ Node 20+ FastAPI Code style: ruff CI Pipeline codecov PRs Welcome

🌐 Live Demo

Try it now: chatbot-ai-system.vercel.app

Demo instance uses limited API quotas. For full features, deploy your own instance following the Quick Start below.


Overview

Production-ready multi-tenant AI chatbot platform with intelligent LLM orchestration, WebSocket streaming, and reliable failover patterns. Built for performance and cost efficiency through semantic caching and provider redundancy.

Built as a reusable template: easily customize it for different use cases (customer support, code assistant, education, etc.) using the pre-configured templates.


What This Project Demonstrates

This project showcases production-grade LLMOps and AI engineering skills:

| Skill | Implementation | Location |
|---|---|---|
| Multi-Provider Orchestration | Unified interface for OpenAI, Anthropic, Llama, Gemini with intelligent routing | src/chatbot_ai_system/orchestration/ |
| Semantic Caching | Redis-backed semantic similarity caching (~73% hit rate) | src/chatbot_ai_system/cache/ |
| WebSocket Streaming | Real-time token streaming with ~186ms P95 latency | src/chatbot_ai_system/websocket/ |
| Multi-Tenancy & Auth | Tenant isolation, JWT authentication, rate limiting | src/chatbot_ai_system/middleware/ |
| Observability | Prometheus, Grafana, Jaeger distributed tracing | monitoring/ |
| Infrastructure as Code | Kubernetes manifests, Docker Compose, CI/CD | infrastructure/, k8s/ |
| Template Architecture | Reusable configurations for multiple use cases | use-cases/ |

View full architecture documentation →


Key Features

  • Multi-Provider Orchestration: Intelligent routing between OpenAI, Anthropic, Llama, and Gemini with automatic failover (sketched after this list)
  • WebSocket Streaming: Token-by-token streaming with ~186ms P95 latency (local benchmarks)
  • Cost Optimization: Semantic caching achieving ~73% hit rate and ~70% cost reduction
  • Production Patterns: Circuit breakers, rate limiting, health monitoring, and comprehensive observability
  • Multi-Tenancy Support: Complete tenant isolation with usage tracking and horizontal scaling
  • Template-Ready: Pre-configured use cases (customer support, code assistant) for rapid deployment
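
The failover mentioned above can be pictured roughly as the sketch below. The adapter interface and names are hypothetical; the real routing logic lives in src/chatbot_ai_system/orchestration/ and differs in detail.

# Hypothetical sketch of automatic failover across providers; the real routing
# logic lives in src/chatbot_ai_system/orchestration/ and its interfaces differ.
import asyncio

class ProviderError(Exception):
    """Raised by a provider adapter when a single upstream call fails."""

async def complete_with_failover(prompt: str, providers: list, timeout: float = 30.0) -> str:
    """Try providers in priority order, falling through on error or timeout."""
    last_error: Exception | None = None
    for provider in providers:
        try:
            # Each provider adapter is assumed to expose an async `complete` method.
            return await asyncio.wait_for(provider.complete(prompt), timeout=timeout)
        except (ProviderError, asyncio.TimeoutError) as exc:
            last_error = exc  # remember the failure and try the next provider
    raise RuntimeError("all providers failed") from last_error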

Verified Performance Metrics (Local Synthetic Benchmarks)

| Metric | Target | Achieved | Evidence |
|---|---|---|---|
| P95 Latency | < 200ms | ~186ms | benchmark_summary.json |
| P99 Latency | < 300ms | ~245ms | benchmark_summary.json |
| Throughput | 400+ RPS | ~250 RPS | benchmark_summary.json |
| Cache Hit Rate | ≥ 60% | ~73% | cache_metrics_latest.json |
| Cost Reduction | ≥ 30% | ~70-73% | cache_metrics_latest.json |
| Provider Failover | < 500ms | ~463ms | benchmark_summary.json |
| WebSocket Sessions | 100+ | ~100 | benchmark_summary.json |

Note: Results are from local synthetic benchmarks on developer hardware, not production SLAs.

Run benchmarks yourself: python benchmarks/run_all_benchmarks.py


🚀 Quick Start

Docker Compose (Recommended)

The fastest way to get started:

# 1. Clone and configure
git clone https://github.com/cbratkovics/chatbot-ai-system.git
cd chatbot-ai-system
cp .env.example .env
# Add your API keys to .env

# 2. Start all services
docker compose up -d

# 3. Access the application
# Frontend:  http://localhost:3000
# API Docs:  http://localhost:8000/docs
# Health:    http://localhost:8000/health

Alternative: Local Development (Poetry + npm)

For active development with hot reload:

# Backend
poetry install
cp .env.example .env
# Add your API keys to .env
poetry run uvicorn chatbot_ai_system.server.main:app --reload

# Frontend (new terminal)
cd frontend
npm ci
cp .env.example .env.local
# Configure API URLs in .env.local
npm run dev

Access:

  • Frontend: http://localhost:3000
  • API Docs: http://localhost:8000/docs

Template Mode: Use Case Quick Start

Deploy a pre-configured chatbot for specific use cases:

# Example: Customer Support Template
cp use-cases/customer-support/.env.example .env
cp use-cases/customer-support/system-prompt.txt src/chatbot_ai_system/config/

# Customize branding in .env
# Then start with docker compose up -d

Available Templates:

  • customer-support/ - Professional customer service assistant
  • More templates coming soon!

See use-cases/ for template documentation.
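
As an illustration of how the copied system-prompt.txt might be picked up at startup, here is a hedged sketch with a hypothetical function name; the project's real configuration code lives in src/chatbot_ai_system/config/ and may differ.

# Hypothetical loader for the copied system-prompt.txt; the project's real
# configuration code lives in src/chatbot_ai_system/config/ and may differ.
from pathlib import Path

def load_system_prompt(config_dir: str = "src/chatbot_ai_system/config") -> str:
    """Return the template's system prompt, or an empty string if none was copied."""
    prompt_path = Path(config_dir) / "system-prompt.txt"
    return prompt_path.read_text(encoding="utf-8").strip() if prompt_path.exists() else ""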


Architecture

flowchart TB
    subgraph "Client Layer"
        UI[Next.js UI]
        WS[WebSocket Client]
        REST[REST Client]
    end

    subgraph "API Gateway"
        LB[Load Balancer]
        ASGI[FastAPI Server]
    end

    subgraph "Core Services"
        MW[Middleware Stack]
        ORCH[Provider Orchestrator]
        CACHE[Semantic Cache]
    end

    subgraph "Providers"
        OAI[OpenAI API]
        ANTH[Anthropic API]
        LLAMA[Meta Llama]
        GEM[Google Gemini]
    end

    subgraph "Storage"
        REDIS[(Redis Cache)]
        PG[(PostgreSQL)]
    end

    subgraph "Observability"
        PROM[Prometheus]
        GRAF[Grafana]
        TRACE[Jaeger]
    end

    UI --> LB
    WS --> LB
    REST --> LB
    LB --> ASGI
    ASGI --> MW
    MW --> ORCH
    MW --> CACHE
    ORCH --> OAI
    ORCH --> ANTH
    ORCH --> LLAMA
    ORCH --> GEM
    CACHE --> REDIS
    MW --> PG
    ASGI --> PROM
    PROM --> GRAF
    ASGI --> TRACE

    style UI fill:#e1f5fe
    style ASGI fill:#c8e6c9
    style ORCH fill:#ffccbc
    style REDIS fill:#ffecb3
    style PROM fill:#f8bbd0
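
In the diagram, the WebSocket path (WS → Load Balancer → FastAPI → Orchestrator) is what delivers token-by-token streaming. Below is a minimal sketch of such an endpoint; the route path, the stream_completion helper, and the end-of-response marker are illustrative only, and the real handlers live in src/chatbot_ai_system/websocket/.

# Minimal sketch of a token-streaming WebSocket endpoint. The route path,
# stream_completion helper, and end-of-response marker are illustrative;
# the project's real handlers live in src/chatbot_ai_system/websocket/.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def stream_completion(prompt: str):
    """Stand-in for the provider orchestrator, which yields tokens as they arrive."""
    for token in ("Hello", ", ", "world"):
        yield token

@app.websocket("/ws/chat")
async def chat_stream(websocket: WebSocket) -> None:
    await websocket.accept()
    try:
        while True:
            prompt = await websocket.receive_text()
            async for token in stream_completion(prompt):
                await websocket.send_text(token)  # push each token as it is produced
            await websocket.send_text("[DONE]")   # signal end of this response
    except WebSocketDisconnect:
        pass  # client closed the connection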

Configuration

Environment Variables

# Required API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Infrastructure
REDIS_URL=redis://localhost:6379/0
DATABASE_URL=postgresql://user:pass@localhost/chatbot

# Performance Tuning
RATE_LIMIT_REQUESTS=100
CACHE_TTL_SECONDS=3600
SEMANTIC_CACHE_THRESHOLD=0.85
REQUEST_TIMEOUT=30

# Feature Flags
ENABLE_STREAMING=true
ENABLE_FAILOVER=true
ENABLE_SEMANTIC_CACHE=true

# Frontend Configuration (in frontend/.env.local)
NEXT_PUBLIC_API_URL=http://localhost:8000
NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws
NEXT_PUBLIC_APP_NAME="AI Chat System"
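
To make SEMANTIC_CACHE_THRESHOLD concrete, here is a toy sketch of a similarity-based lookup. Names and data structures are illustrative only; the real cache lives in src/chatbot_ai_system/cache/ and is Redis-backed.

# Toy sketch of semantic cache lookup: reuse a cached answer when a new prompt's
# embedding is close enough to a previously seen prompt. Names are hypothetical;
# the real implementation lives in src/chatbot_ai_system/cache/.
import math

SEMANTIC_CACHE_THRESHOLD = 0.85  # mirrors the env var above

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def lookup(prompt_embedding: list[float], cache: dict[str, tuple[list[float], str]]) -> str | None:
    """Return a cached response if any stored embedding clears the threshold."""
    best_score, best_response = 0.0, None
    for _, (embedding, response) in cache.items():
        score = cosine_similarity(prompt_embedding, embedding)
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= SEMANTIC_CACHE_THRESHOLD else None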

Full configuration guide: docs/CONFIGURATION.md (if it exists)


Production Deployment

This project is production-ready and can be deployed to Vercel + Render in under 30 minutes.

Quick Deploy to Vercel + Render (Recommended)

Infrastructure:

  • Vercel: Next.js frontend hosting (Free tier)
  • Render: FastAPI backend + Redis cache ($14/month)
  • Total Cost: $14/month + AI API usage

Steps:

  1. Deploy Backend to Render:

    • Connect your GitHub repository to Render
    • Render auto-detects render.yaml configuration
    • Set environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY)
    • Deploy Redis instance ($7/month)
  2. Deploy Frontend to Vercel:

    cd frontend
    vercel --prod
    • Set environment variables in Vercel dashboard:
      • NEXT_PUBLIC_API_URL: Your Render backend URL
      • NEXT_PUBLIC_WS_URL: Your Render WebSocket URL
  3. Update CORS:

    • Add your Vercel domain to CORS_ORIGINS in the Render dashboard (a sketch of how the backend can apply this setting follows below)
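
How the backend could consume CORS_ORIGINS is sketched below. The environment variable name is taken from the step above; the actual wiring in the server may differ.

# Sketch: reading CORS_ORIGINS from the environment and applying it via FastAPI's
# CORSMiddleware. The actual wiring in the server may differ.
import os
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
origins = [o.strip() for o in os.getenv("CORS_ORIGINS", "").split(",") if o.strip()]
app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,  # e.g. ["https://your-app.vercel.app"]
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)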

Documentation: see docs/deployment/ for detailed deployment guides.

Production URLs (after deployment):

  • Frontend: https://your-app.vercel.app
  • Backend API: https://your-backend.onrender.com
  • API Docs: https://your-backend.onrender.com/docs

Alternative: Docker Deployment

# Build production image
docker build -f docker/dockerfiles/Dockerfile.production -t chatbot-ai-system:latest .

# Run with production compose
docker compose -f docker-compose.prod.yml up -d

Alternative: Kubernetes Deployment

# Apply Kubernetes configurations
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml

Kubernetes documentation: docs/kubernetes/README.md (if it exists)

Scaling Considerations

  • Horizontal Scaling: Stateless design supports multiple replicas
  • Database: PostgreSQL with read replicas for high availability
  • Cache: Redis Cluster for distributed caching
  • Load Balancing: Nginx or cloud load balancer
  • Monitoring: Prometheus + Grafana dashboards included

Testing & Validation

# Run all quality checks
make lint          # Code linting with ruff
make type-check    # Type checking with mypy
make test          # Unit tests with pytest
make test-cov      # Tests with coverage report

# Individual test suites
poetry run pytest tests/unit -v           # Unit tests
poetry run pytest tests/integration -v    # Integration tests
poetry run pytest tests/e2e -v           # End-to-end tests

# Load testing
k6 run benchmarks/load_tests/k6_api_test.js
k6 run benchmarks/load_tests/k6_websocket_test.js

# Verify benchmark claims
python benchmarks/verify_metrics.py
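
As an illustration of the style of unit test the suite might contain, here is a hedged sketch of a health-check test. The app import path follows the uvicorn target shown in the Quick Start; the actual tests live in tests/unit/.

# Sketch of a health-check unit test using FastAPI's TestClient. The app import
# path mirrors the uvicorn target from the Quick Start; actual tests live in tests/unit/.
from fastapi.testclient import TestClient
from chatbot_ai_system.server.main import app

def test_health_endpoint_returns_ok() -> None:
    client = TestClient(app)
    response = client.get("/health")
    assert response.status_code == 200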

CI/CD: All tests run automatically on pull requests via GitHub Actions


Monitoring & Observability

Metrics Collection

  • Prometheus: Application and system metrics
  • Grafana: Real-time dashboards and alerts
  • Jaeger: Distributed tracing for request flows

Key Metrics Tracked

  • Request latency (P50, P95, P99), with an instrumentation sketch after this list
  • Provider availability and failover events
  • Cache hit rates and cost savings
  • Token usage and rate limiting
  • WebSocket connection metrics
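
A hedged sketch of how a latency histogram can be instrumented with prometheus_client follows; the metric name and labels are placeholders, not necessarily those exported by this service.

# Illustrative latency instrumentation with prometheus_client; the metric name and
# labels are placeholders, not necessarily those exported by the service.
import time
from prometheus_client import Histogram

REQUEST_LATENCY = Histogram(
    "chat_request_latency_seconds",
    "Latency of chat completions",
    ["provider", "cached"],
)

def timed_completion(provider: str, cached: bool, fn, *args, **kwargs):
    """Run fn and observe its wall-clock duration under the given labels."""
    start = time.perf_counter()
    try:
        return fn(*args, **kwargs)
    finally:
        REQUEST_LATENCY.labels(provider=provider, cached=str(cached)).observe(
            time.perf_counter() - start
        )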

Access monitoring:

  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3001
  • Jaeger: http://localhost:16686

Security Features

  • Authentication: JWT-based with refresh tokens
  • Rate Limiting: Token bucket algorithm per tenant (see the sketch after this list)
  • Input Validation: Pydantic models with strict validation
  • Secrets Management: Environment-based configuration
  • CORS Protection: Configurable origin restrictions
  • Content Filtering: Optional content moderation
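
A minimal sketch of the token bucket idea referenced above; this is illustrative only, and the real per-tenant limiter lives in src/chatbot_ai_system/middleware/.

# Minimal token bucket: each tenant holds up to `capacity` tokens that refill at
# `rate` per second; a request is allowed only if a token is available.
# Illustrative only; the real limiter lives in src/chatbot_ai_system/middleware/.
import time

class TokenBucket:
    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity       # maximum tokens a tenant can hold
        self.rate = rate               # tokens refilled per second
        self.tokens = capacity         # start full
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per tenant, e.g. 100 requests refilled over a minute:
buckets: dict[str, TokenBucket] = {}

def allow_request(tenant_id: str) -> bool:
    bucket = buckets.setdefault(tenant_id, TokenBucket(capacity=100, rate=100 / 60))
    return bucket.allow()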

Security documentation: docs/security/SECURITY.md


Technology Stack

Backend

  • Framework: FastAPI 0.104+ (async Python 3.12+)
  • LLM Providers: OpenAI, Anthropic, Meta Llama, Google Gemini
  • Caching: Redis with semantic similarity
  • Database: PostgreSQL with SQLAlchemy ORM
  • Message Queue: Redis Streams

Frontend

  • Framework: Next.js 14 (App Router)
  • Language: TypeScript
  • UI: Tailwind CSS + shadcn/ui components
  • State Management: React Context + Hooks
  • WebSocket: Native WebSocket API

Infrastructure

  • Containerization: Docker, Docker Compose
  • Orchestration: Kubernetes-ready
  • CI/CD: GitHub Actions
  • Monitoring: Prometheus, Grafana, Jaeger
  • Deployment: Vercel (frontend) + Render (backend)

Project Structure

├── src/chatbot_ai_system/    # Backend application
│   ├── server/               # FastAPI app and routes
│   ├── providers/            # LLM provider implementations
│   ├── orchestration/        # Routing and failover logic
│   ├── cache/                # Semantic caching system
│   ├── middleware/           # Auth, rate limiting, tracing
│   ├── websocket/            # WebSocket handlers
│   └── config/               # Configuration management
├── frontend/                 # Next.js frontend
│   ├── app/                  # Next.js 14 app directory
│   ├── components/           # React components
│   └── config/               # Frontend configuration
├── use-cases/                # Pre-configured templates
│   └── customer-support/     # Customer support template
├── benchmarks/               # Performance testing suite
│   ├── results/              # Benchmark results
│   └── load_tests/           # k6 load tests
├── tests/                    # Test suites
│   ├── unit/                 # Unit tests
│   ├── integration/          # Integration tests
│   └── e2e/                  # End-to-end tests
├── docs/                     # Documentation
│   ├── architecture/         # Architecture docs
│   ├── security/             # Security docs
│   └── deployment/           # Deployment guides
├── docker/                   # Docker configurations
│   ├── dockerfiles/          # Dockerfile variants
│   └── compose/              # Docker Compose files
├── k8s/                      # Kubernetes manifests
├── infrastructure/           # IaC and deployment configs
└── monitoring/               # Monitoring configurations

Contributing

We welcome contributions! Please read our Contributing Guide for details on our code of conduct, development process, and how to submit pull requests.

Key areas for contribution:

  • New AI provider integrations
  • Additional use-case templates
  • Performance optimizations
  • Documentation improvements
  • Bug fixes and feature requests



Acknowledgments

Built with excellent open-source tools:

  • FastAPI - Modern Python web framework
  • Next.js - React framework for production
  • Redis - In-memory data structure store
  • PostgreSQL - Robust relational database
  • Prometheus & Grafana - Monitoring stack
  • OpenAI & Anthropic - Powerful LLM APIs

License

This project is licensed under the MIT License - see the LICENSE file for details.


Contact

Christopher J. Bratkovics


Project Stats

  • Lines of Code: ~15,000+
  • Test Coverage: 85%+
  • Docker Images: Backend, Frontend, Monitoring Stack
  • Supported Providers: OpenAI, Anthropic, Meta Llama, Google Gemini
  • Performance: <200ms P95 latency, 100+ concurrent WebSocket connections
  • Production-Ready: Deployed and tested in production environments

Star this repo if you find it useful!

Built with ❤️ for production AI systems
