Multi-Tenant AI Chat Platform

License: MIT Python 3.12+ Node 20+ FastAPI Code style: ruff CI Pipeline codecov PRs Welcome

🌐 Live Demo

Try it now: chatbot-ai-system.vercel.app

Demo instance uses limited API quotas. For full features, deploy your own instance following the Quick Start below.


Overview

Production-ready multi-tenant AI chatbot platform with intelligent LLM orchestration, WebSocket streaming, and reliable failover patterns. Built for performance and cost efficiency through semantic caching and provider redundancy.

Built as a reusable template: easily customize it for different use cases (customer support, code assistant, education, etc.) using the pre-configured templates.


What This Project Demonstrates

This project showcases production-grade LLMOps and AI engineering skills:

| Skill | Implementation | Location |
|---|---|---|
| Multi-Provider Orchestration | Unified interface for OpenAI, Anthropic, Llama, Gemini with intelligent routing | src/chatbot_ai_system/orchestration/ |
| Semantic Caching | Redis-backed semantic similarity caching (~73% hit rate) | src/chatbot_ai_system/cache/ |
| WebSocket Streaming | Real-time token streaming with ~186ms P95 latency | src/chatbot_ai_system/websocket/ |
| Multi-Tenancy & Auth | Tenant isolation, JWT authentication, rate limiting | src/chatbot_ai_system/middleware/ |
| Observability | Prometheus, Grafana, Jaeger distributed tracing | monitoring/ |
| Infrastructure as Code | Kubernetes manifests, Docker Compose, CI/CD | infrastructure/, k8s/ |
| Template Architecture | Reusable configurations for multiple use cases | use-cases/ |

View full architecture documentation →


Key Features

  • Multi-Provider Orchestration: Intelligent routing between OpenAI, Anthropic, Llama, and Gemini with automatic failover (sketched after this list)
  • WebSocket Streaming: Token-by-token streaming with ~186ms P95 latency (local benchmarks)
  • Cost Optimization: Semantic caching achieving ~73% hit rate and ~70% cost reduction
  • Production Patterns: Circuit breakers, rate limiting, health monitoring, and comprehensive observability
  • Multi-Tenancy Support: Complete tenant isolation with usage tracking and horizontal scaling
  • Template-Ready: Pre-configured use cases (customer support, code assistant) for rapid deployment
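
The failover mentioned above can be pictured roughly as the sketch below. The adapter interface and names are hypothetical; the real routing logic lives in src/chatbot_ai_system/orchestration/ and differs in detail.

# Hypothetical sketch of automatic failover across providers; the real routing
# logic lives in src/chatbot_ai_system/orchestration/ and its interfaces differ.
import asyncio

class ProviderError(Exception):
    """Raised by a provider adapter when a single upstream call fails."""

async def complete_with_failover(prompt: str, providers: list, timeout: float = 30.0) -> str:
    """Try providers in priority order, falling through on error or timeout."""
    last_error: Exception | None = None
    for provider in providers:
        try:
            # Each provider adapter is assumed to expose an async `complete` method.
            return await asyncio.wait_for(provider.complete(prompt), timeout=timeout)
        except (ProviderError, asyncio.TimeoutError) as exc:
            last_error = exc  # remember the failure and try the next provider
    raise RuntimeError("all providers failed") from last_error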

Verified Performance Metrics (Local Synthetic Benchmarks)

| Metric | Target | Achieved | Evidence |
|---|---|---|---|
| P95 Latency | < 200ms | ~186ms | benchmark_summary.json |
| P99 Latency | < 300ms | ~245ms | benchmark_summary.json |
| Throughput | 400+ RPS | ~250 RPS | benchmark_summary.json |
| Cache Hit Rate | ≥ 60% | ~73% | cache_metrics_latest.json |
| Cost Reduction | ≥ 30% | ~70-73% | cache_metrics_latest.json |
| Provider Failover | < 500ms | ~463ms | benchmark_summary.json |
| WebSocket Sessions | 100+ | ~100 | benchmark_summary.json |

Note: Results are from local synthetic benchmarks on developer hardware, not production SLAs.

Run benchmarks yourself: python benchmarks/run_all_benchmarks.py


🚀 Quick Start

Docker Compose (Recommended)

The fastest way to get started:

# 1. Clone and configure
git clone https://github.com/cbratkovics/chatbot-ai-system.git
cd chatbot-ai-system
cp .env.example .env
# Add your API keys to .env

# 2. Start all services
docker compose up -d

# 3. Access the application
# Frontend:  http://localhost:3000
# API Docs:  http://localhost:8000/docs
# Health:    http://localhost:8000/health

Alternative: Local Development (Poetry + npm)

For active development with hot reload:

# Backend
poetry install
cp .env.example .env
# Add your API keys to .env
poetry run uvicorn chatbot_ai_system.server.main:app --reload

# Frontend (new terminal)
cd frontend
npm ci
cp .env.example .env.local
# Configure API URLs in .env.local
npm run dev

Access:

  • Frontend: http://localhost:3000
  • API Docs: http://localhost:8000/docs

Template Mode: Use Case Quick Start

Deploy a pre-configured chatbot for specific use cases:

# Example: Customer Support Template
cp use-cases/customer-support/.env.example .env
cp use-cases/customer-support/system-prompt.txt src/chatbot_ai_system/config/

# Customize branding in .env
# Then start with docker compose up -d

Available Templates:

  • customer-support/ - Professional customer service assistant
  • More templates coming soon!

See use-cases/ for template documentation.
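
As an illustration of how the copied system-prompt.txt might be picked up at startup, here is a hedged sketch with a hypothetical function name; the project's real configuration code lives in src/chatbot_ai_system/config/ and may differ.

# Hypothetical loader for the copied system-prompt.txt; the project's real
# configuration code lives in src/chatbot_ai_system/config/ and may differ.
from pathlib import Path

def load_system_prompt(config_dir: str = "src/chatbot_ai_system/config") -> str:
    """Return the template's system prompt, or an empty string if none was copied."""
    prompt_path = Path(config_dir) / "system-prompt.txt"
    return prompt_path.read_text(encoding="utf-8").strip() if prompt_path.exists() else ""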


Architecture

flowchart TB
    subgraph "Client Layer"
        UI[Next.js UI]
        WS[WebSocket Client]
        REST[REST Client]
    end

    subgraph "API Gateway"
        LB[Load Balancer]
        ASGI[FastAPI Server]
    end

    subgraph "Core Services"
        MW[Middleware Stack]
        ORCH[Provider Orchestrator]
        CACHE[Semantic Cache]
    end

    subgraph "Providers"
        OAI[OpenAI API]
        ANTH[Anthropic API]
        LLAMA[Meta Llama]
        GEM[Google Gemini]
    end

    subgraph "Storage"
        REDIS[(Redis Cache)]
        PG[(PostgreSQL)]
    end

    subgraph "Observability"
        PROM[Prometheus]
        GRAF[Grafana]
        TRACE[Jaeger]
    end

    UI --> LB
    WS --> LB
    REST --> LB
    LB --> ASGI
    ASGI --> MW
    MW --> ORCH
    MW --> CACHE
    ORCH --> OAI
    ORCH --> ANTH
    ORCH --> LLAMA
    ORCH --> GEM
    CACHE --> REDIS
    MW --> PG
    ASGI --> PROM
    PROM --> GRAF
    ASGI --> TRACE

    style UI fill:#e1f5fe
    style ASGI fill:#c8e6c9
    style ORCH fill:#ffccbc
    style REDIS fill:#ffecb3
    style PROM fill:#f8bbd0
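
In the diagram, the WebSocket path (WS → Load Balancer → FastAPI → Orchestrator) is what delivers token-by-token streaming. Below is a minimal sketch of such an endpoint; the route path, the stream_completion helper, and the end-of-response marker are illustrative only, and the real handlers live in src/chatbot_ai_system/websocket/.

# Minimal sketch of a token-streaming WebSocket endpoint. The route path,
# stream_completion helper, and end-of-response marker are illustrative;
# the project's real handlers live in src/chatbot_ai_system/websocket/.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def stream_completion(prompt: str):
    """Stand-in for the provider orchestrator, which yields tokens as they arrive."""
    for token in ("Hello", ", ", "world"):
        yield token

@app.websocket("/ws/chat")
async def chat_stream(websocket: WebSocket) -> None:
    await websocket.accept()
    try:
        while True:
            prompt = await websocket.receive_text()
            async for token in stream_completion(prompt):
                await websocket.send_text(token)  # push each token as it is produced
            await websocket.send_text("[DONE]")   # signal end of this response
    except WebSocketDisconnect:
        pass  # client closed the connection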

Configuration

Environment Variables

# Required API Keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Infrastructure
REDIS_URL=redis://localhost:6379/0
DATABASE_URL=postgresql://user:pass@localhost/chatbot

# Performance Tuning
RATE_LIMIT_REQUESTS=100
CACHE_TTL_SECONDS=3600
SEMANTIC_CACHE_THRESHOLD=0.85
REQUEST_TIMEOUT=30

# Feature Flags
ENABLE_STREAMING=true
ENABLE_FAILOVER=true
ENABLE_SEMANTIC_CACHE=true

# Frontend Configuration (in frontend/.env.local)
NEXT_PUBLIC_API_URL=http://localhost:8000
NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws
NEXT_PUBLIC_APP_NAME="AI Chat System"
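
To make SEMANTIC_CACHE_THRESHOLD concrete, here is a toy sketch of a similarity-based lookup. Names and data structures are illustrative only; the real cache lives in src/chatbot_ai_system/cache/ and is Redis-backed.

# Toy sketch of semantic cache lookup: reuse a cached answer when a new prompt's
# embedding is close enough to a previously seen prompt. Names are hypothetical;
# the real implementation lives in src/chatbot_ai_system/cache/.
import math

SEMANTIC_CACHE_THRESHOLD = 0.85  # mirrors the env var above

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def lookup(prompt_embedding: list[float], cache: dict[str, tuple[list[float], str]]) -> str | None:
    """Return a cached response if any stored embedding clears the threshold."""
    best_score, best_response = 0.0, None
    for _, (embedding, response) in cache.items():
        score = cosine_similarity(prompt_embedding, embedding)
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= SEMANTIC_CACHE_THRESHOLD else None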

Full configuration guide: docs/CONFIGURATION.md (if it exists)


Production Deployment

This project is production-ready and can be deployed to Vercel + Render in under 30 minutes.

Quick Deploy to Vercel + Render (Recommended)

Infrastructure:

  • Vercel: Next.js frontend hosting (Free tier)
  • Render: FastAPI backend + Redis cache ($14/month)
  • Total Cost: $14/month + AI API usage

Steps:

  1. Deploy Backend to Render:

    • Connect your GitHub repository to Render
    • Render auto-detects render.yaml configuration
    • Set environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY)
    • Deploy Redis instance ($7/month)
  2. Deploy Frontend to Vercel:

    cd frontend
    vercel --prod
    • Set environment variables in Vercel dashboard:
      • NEXT_PUBLIC_API_URL: Your Render backend URL
      • NEXT_PUBLIC_WS_URL: Your Render WebSocket URL
  3. Update CORS:

    • Add your Vercel domain to CORS_ORIGINS in the Render dashboard (a sketch of how the backend can apply this setting follows below)
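
How the backend could consume CORS_ORIGINS is sketched below. The environment variable name is taken from the step above; the actual wiring in the server may differ.

# Sketch: reading CORS_ORIGINS from the environment and applying it via FastAPI's
# CORSMiddleware. The actual wiring in the server may differ.
import os
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
origins = [o.strip() for o in os.getenv("CORS_ORIGINS", "").split(",") if o.strip()]
app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,  # e.g. ["https://your-app.vercel.app"]
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)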

Documentation: see docs/deployment/ for detailed deployment guides.

Production URLs (after deployment):

  • Frontend: https://your-app.vercel.app
  • Backend API: https://your-backend.onrender.com
  • API Docs: https://your-backend.onrender.com/docs

Alternative: Docker Deployment

# Build production image
docker build -f docker/dockerfiles/Dockerfile.production -t chatbot-ai-system:latest .

# Run with production compose
docker compose -f docker-compose.prod.yml up -d

Alternative: Kubernetes Deployment

# Apply Kubernetes configurations
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml

Kubernetes documentation: docs/kubernetes/README.md (if it exists)

Scaling Considerations

  • Horizontal Scaling: Stateless design supports multiple replicas
  • Database: PostgreSQL with read replicas for high availability
  • Cache: Redis Cluster for distributed caching
  • Load Balancing: Nginx or cloud load balancer
  • Monitoring: Prometheus + Grafana dashboards included

Testing & Validation

# Run all quality checks
make lint          # Code linting with ruff
make type-check    # Type checking with mypy
make test          # Unit tests with pytest
make test-cov      # Tests with coverage report

# Individual test suites
poetry run pytest tests/unit -v           # Unit tests
poetry run pytest tests/integration -v    # Integration tests
poetry run pytest tests/e2e -v           # End-to-end tests

# Load testing
k6 run benchmarks/load_tests/k6_api_test.js
k6 run benchmarks/load_tests/k6_websocket_test.js

# Verify benchmark claims
python benchmarks/verify_metrics.py
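
As an illustration of the style of unit test the suite might contain, here is a hedged sketch of a health-check test. The app import path follows the uvicorn target shown in the Quick Start; the actual tests live in tests/unit/.

# Sketch of a health-check unit test using FastAPI's TestClient. The app import
# path mirrors the uvicorn target from the Quick Start; actual tests live in tests/unit/.
from fastapi.testclient import TestClient
from chatbot_ai_system.server.main import app

def test_health_endpoint_returns_ok() -> None:
    client = TestClient(app)
    response = client.get("/health")
    assert response.status_code == 200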

CI/CD: All tests run automatically on pull requests via GitHub Actions


Monitoring & Observability

Metrics Collection

  • Prometheus: Application and system metrics
  • Grafana: Real-time dashboards and alerts
  • Jaeger: Distributed tracing for request flows

Key Metrics Tracked

  • Request latency (P50, P95, P99), with an instrumentation sketch after this list
  • Provider availability and failover events
  • Cache hit rates and cost savings
  • Token usage and rate limiting
  • WebSocket connection metrics
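
A hedged sketch of how a latency histogram can be instrumented with prometheus_client follows; the metric name and labels are placeholders, not necessarily those exported by this service.

# Illustrative latency instrumentation with prometheus_client; the metric name and
# labels are placeholders, not necessarily those exported by the service.
import time
from prometheus_client import Histogram

REQUEST_LATENCY = Histogram(
    "chat_request_latency_seconds",
    "Latency of chat completions",
    ["provider", "cached"],
)

def timed_completion(provider: str, cached: bool, fn, *args, **kwargs):
    """Run fn and observe its wall-clock duration under the given labels."""
    start = time.perf_counter()
    try:
        return fn(*args, **kwargs)
    finally:
        REQUEST_LATENCY.labels(provider=provider, cached=str(cached)).observe(
            time.perf_counter() - start
        )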

Access monitoring:

  • Prometheus: http://localhost:9090
  • Grafana: http://localhost:3001
  • Jaeger: http://localhost:16686

Security Features

  • Authentication: JWT-based with refresh tokens
  • Rate Limiting: Token bucket algorithm per tenant (see the sketch after this list)
  • Input Validation: Pydantic models with strict validation
  • Secrets Management: Environment-based configuration
  • CORS Protection: Configurable origin restrictions
  • Content Filtering: Optional content moderation
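
A minimal sketch of the token bucket idea referenced above; this is illustrative only, and the real per-tenant limiter lives in src/chatbot_ai_system/middleware/.

# Minimal token bucket: each tenant holds up to `capacity` tokens that refill at
# `rate` per second; a request is allowed only if a token is available.
# Illustrative only; the real limiter lives in src/chatbot_ai_system/middleware/.
import time

class TokenBucket:
    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity       # maximum tokens a tenant can hold
        self.rate = rate               # tokens refilled per second
        self.tokens = capacity         # start full
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per tenant, e.g. 100 requests refilled over a minute:
buckets: dict[str, TokenBucket] = {}

def allow_request(tenant_id: str) -> bool:
    bucket = buckets.setdefault(tenant_id, TokenBucket(capacity=100, rate=100 / 60))
    return bucket.allow()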

Security documentation: docs/security/SECURITY.md


Technology Stack

Backend

  • Framework: FastAPI 0.104+ (async Python 3.12+)
  • LLM Providers: OpenAI, Anthropic, Meta Llama, Google Gemini
  • Caching: Redis with semantic similarity
  • Database: PostgreSQL with SQLAlchemy ORM
  • Message Queue: Redis Streams

Frontend

  • Framework: Next.js 14 (App Router)
  • Language: TypeScript
  • UI: Tailwind CSS + shadcn/ui components
  • State Management: React Context + Hooks
  • WebSocket: Native WebSocket API

Infrastructure

  • Containerization: Docker, Docker Compose
  • Orchestration: Kubernetes-ready
  • CI/CD: GitHub Actions
  • Monitoring: Prometheus, Grafana, Jaeger
  • Deployment: Vercel (frontend) + Render (backend)

Project Structure

├── src/chatbot_ai_system/    # Backend application
│   ├── server/               # FastAPI app and routes
│   ├── providers/            # LLM provider implementations
│   ├── orchestration/        # Routing and failover logic
│   ├── cache/                # Semantic caching system
│   ├── middleware/           # Auth, rate limiting, tracing
│   ├── websocket/            # WebSocket handlers
│   └── config/               # Configuration management
├── frontend/                 # Next.js frontend
│   ├── app/                  # Next.js 14 app directory
│   ├── components/           # React components
│   └── config/               # Frontend configuration
├── use-cases/                # Pre-configured templates
│   └── customer-support/     # Customer support template
├── benchmarks/               # Performance testing suite
│   ├── results/              # Benchmark results
│   └── load_tests/           # k6 load tests
├── tests/                    # Test suites
│   ├── unit/                 # Unit tests
│   ├── integration/          # Integration tests
│   └── e2e/                  # End-to-end tests
├── docs/                     # Documentation
│   ├── architecture/         # Architecture docs
│   ├── security/             # Security docs
│   └── deployment/           # Deployment guides
├── docker/                   # Docker configurations
│   ├── dockerfiles/          # Dockerfile variants
│   └── compose/              # Docker Compose files
├── k8s/                      # Kubernetes manifests
├── infrastructure/           # IaC and deployment configs
└── monitoring/               # Monitoring configurations

Contributing

We welcome contributions! Please read our Contributing Guide for details on our code of conduct, development process, and how to submit pull requests.

Key areas for contribution:

  • New AI provider integrations
  • Additional use-case templates
  • Performance optimizations
  • Documentation improvements
  • Bug fixes and feature requests



Acknowledgments

Built with excellent open-source tools:

  • FastAPI - Modern Python web framework
  • Next.js - React framework for production
  • Redis - In-memory data structure store
  • PostgreSQL - Robust relational database
  • Prometheus & Grafana - Monitoring stack
  • OpenAI & Anthropic - Powerful LLM APIs

License

This project is licensed under the MIT License - see the LICENSE file for details.


Contact

Christopher J. Bratkovics


Project Stats

  • Lines of Code: ~15,000+
  • Test Coverage: 85%+
  • Docker Images: Backend, Frontend, Monitoring Stack
  • Supported Providers: OpenAI, Anthropic, Meta Llama, Google Gemini
  • Performance: <200ms P95 latency, 100+ concurrent WebSocket connections
  • Production-Ready: Deployed and tested in production environments

Star this repo if you find it useful!

Built with ❤️ for production AI systems
