The Multi-Intent AI Chatbot Assistant helps service and analytics teams answer both product-related and account-specific questions quickly, accurately, and securely.
The project evolves through three practical stages:

- **Phase 1 - Pre-LLM (Deterministic Pilot)**: an offline, rule-based chatbot that uses FAISS for document search and keyword-to-SQL mapping.
- **Phase 2 - Full LLM (Production)**: a retrieval-augmented generation (RAG) platform with microservices, continuous feedback, and observability.
- **Phase 3 - Scaling and Orchestration (Kubernetes)**: expands Phase 2 into a self-healing, auto-scaling, cloud-native platform.
## Phase 1 - Pre-LLM (Deterministic Pilot)

### Goal
Prove the concept with an explainable system that runs entirely offline.
### Core Stack

- FastAPI backend
- FAISS vector search with SentenceTransformers embeddings
- Keyword-based SQL generation with validation guardrails
- SQLite mock contract database
- Docker for deployment and CI/CD
 
### What It Does

- Classifies intent (knowledge, contract, or unknown).
- Retrieves answers from local docs or SQL queries.
- Applies guardrails for SQL safety, PII protection, and prompt injection defense.
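The deterministic routing above can be sketched in a few lines. This is an illustrative, hedged example of keyword-driven intent classification plus a fixed keyword-to-SQL mapping; the names (`classify_intent`, `KEYWORD_SQL`, the `contracts` schema) are hypothetical, not the project's actual API.

```python
# Rule-based routing sketch: classify by keyword overlap, then map
# contract questions onto pre-approved SELECT statements only.
CONTRACT_KEYWORDS = {"contract", "renewal", "expiry", "account"}
KNOWLEDGE_KEYWORDS = {"how", "what", "guide", "setup", "configure"}

# Each contract keyword maps to a vetted, read-only query (hypothetical schema).
KEYWORD_SQL = {
    "renewal": "SELECT customer, renewal_date FROM contracts ORDER BY renewal_date",
    "expiry": "SELECT customer, end_date FROM contracts WHERE end_date < DATE('now', '+90 days')",
}

def classify_intent(question: str) -> str:
    """Return 'contract', 'knowledge', or 'unknown' from keyword overlap."""
    tokens = set(question.lower().split())
    if tokens & CONTRACT_KEYWORDS:
        return "contract"
    if tokens & KNOWLEDGE_KEYWORDS:
        return "knowledge"
    return "unknown"

def route(question: str) -> str:
    """Resolve a contract question to a pre-approved SQL string."""
    intent = classify_intent(question)
    if intent == "contract":
        for keyword, sql in KEYWORD_SQL.items():
            if keyword in question.lower():
                return sql
        return "SELECT customer FROM contracts"  # safe default query
    return intent  # knowledge/unknown are handled by other components

print(classify_intent("When is the contract renewal?"))  # contract
```

Because every SQL string is hand-written and read-only, this phase needs no SQL generation at all, which is what makes the pilot fully explainable and offline.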
 
### Key Metrics
| Objective | Metric | Target | Owner | 
|---|---|---|---|
| Accuracy | Intent Classification | ≥ 80% | Data Science | 
| Speed | Response Latency | < 3 s | Engineering | 
| Security | SQL Validation | 100% Safe | Security | 
| Experience | Positive Feedback | ≥ 70% | CX Team | 
### Outcome
A reliable, low-cost prototype that proves feasibility and governance readiness before introducing LLMs.
## Phase 2 - Full LLM (Production)

### Goal
Scale the pilot into a production-grade platform that combines LLMs with retrieval and structured data.
### Core Stack

- FastAPI microservices on Docker or managed containers (ECS)
- GPT-4 Turbo integrated with FAISS (RAG pattern)
- Natural-language-to-SQL via LLM
- RLHF feedback and retraining loop
- Prometheus, Grafana, and OpenTelemetry for monitoring
- Helm for deployment templating (preparing for Kubernetes)
- Role-based access control and guardrails
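The retrieval step of the RAG pattern can be sketched without any external dependencies. A real deployment would embed documents with SentenceTransformers and search a FAISS index; here a bag-of-words cosine similarity stands in for both, and the documents and function names are illustrative.

```python
# RAG retrieval sketch: find the most relevant document for a query,
# then ground the LLM prompt in it. Cosine over word counts stands in
# for embedding similarity so this runs with the stdlib alone.
import math
from collections import Counter

DOCS = [
    "Contracts renew automatically unless cancelled 30 days before expiry.",
    "The admin console lets you configure alert thresholds per service.",
]

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs=DOCS) -> str:
    """Return the best-matching document as grounding context."""
    qv = vectorize(query)
    return max(docs, key=lambda d: cosine(qv, vectorize(d)))

context = retrieve("when do contracts renew")
prompt = f"Answer using only this context:\n{context}\nQuestion: when do contracts renew"
print(context)
```

The key design point is the same regardless of the similarity backend: the LLM never answers from its own weights alone; it answers from retrieved context, which keeps responses traceable to source documents.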
 
### Helm Clarification
Helm is introduced in Phase 2 as a templating and deployment abstraction to prepare for Kubernetes.
It is fully adopted in Phase 3 as part of the orchestration stack.
### What It Adds

- LLM-assisted intent classification in the Router Service
- Contextual answers through RAG in the Knowledge Service
- LLM-generated SQL queries in the Contract Service
- Continuous learning via feedback loops
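Once SQL is LLM-generated rather than hand-written, a validation guardrail must sit between the model and the database. The sketch below shows one plausible shape for such a check, rejecting anything that is not a single read-only SELECT; the function name and rules are illustrative, not the project's exact validator.

```python
# SQL guardrail sketch: allow only a single, syntactically valid,
# read-only SELECT statement to reach the database.
import sqlite3

FORBIDDEN = ("insert", "update", "delete", "drop", "alter", "attach", "pragma")

def is_safe_sql(query: str) -> bool:
    stripped = query.strip().rstrip(";")
    lowered = stripped.lower()
    if ";" in stripped:                      # block multi-statement payloads
        return False
    if not lowered.startswith("select"):     # read-only statements only
        return False
    if any(word in lowered.split() for word in FORBIDDEN):
        return False
    # Parse against a throwaway in-memory DB to catch malformed SQL early.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE contracts (customer TEXT, end_date TEXT)")
        conn.execute("EXPLAIN " + stripped)  # compiles without running the query
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

print(is_safe_sql("SELECT customer FROM contracts"))  # True
print(is_safe_sql("DROP TABLE contracts"))            # False
```

A production validator would typically also enforce an allowlist of tables and columns and a row limit, but the principle is the same: the LLM proposes, the guardrail disposes.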
 
### Key Metrics
| Objective | Metric | Target | Owner | 
|---|---|---|---|
| Reliability | Uptime | ≥ 99.9% | DevOps | 
| Performance | Latency (P95 including LLM) | < 2 s | Engineering | 
| Governance | Drift Detection | Automated | Data Ops | 
| Cost Efficiency | Average Cost per Query | < $0.05 | Finance | 
| Learning Cycle | Model Update Cadence | Weekly Retraining | Data Science | 
### Outcome
An enterprise-ready AI assistant that combines structured data, documentation, and natural conversation with transparency and traceability.
## Phase 3 - Scaling and Orchestration (Kubernetes)

### Goal
Turn Phase 2 into a cloud-native, self-healing platform that scales automatically with demand.
### Core Stack Enhancements

- Kubernetes (GKE, EKS, or AKS) for orchestration
- Helm for automated deployments
- Horizontal Pod Autoscaler (HPA) for load scaling
- Ingress and Load Balancer for global routing
- GitOps (Argo CD or Flux) for continuous rollout
- Unified observability with Prometheus and Grafana
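The Horizontal Pod Autoscaler's core decision is documented in the Kubernetes docs as `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`. The sketch below reproduces that rule with min/max bounds as they would appear in an HPA spec; the function name and bounds are illustrative.

```python
# HPA scaling rule sketch, per the Kubernetes autoscaling documentation:
# desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
# clamped to the HPA's minReplicas/maxReplicas bounds.
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Compute the replica count the HPA would request."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 4 pods averaging 90% CPU against a 60% target -> scale out to 6 pods.
print(desired_replicas(4, 90, 60))  # 6
```

The "< 1 min reaction time" target in the metrics table below is governed by how quickly this loop observes the metric and reconciles the replica count.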
 
### What It Delivers

- Multi-node Kubernetes cluster with containerized services
- Rolling updates and zero-downtime deployments
- Centralized logs, metrics, and health monitoring
- Elastic scaling for varying workloads
 
### Key Metrics
| Objective | Metric | Target | Owner | 
|---|---|---|---|
| Scalability | Pod Expansion Under Load | < 1 min Reaction Time | DevOps | 
| Reliability | SLA Uptime | ≥ 99.95% | DevOps | 
| Efficiency | Node Utilization | ≥ 80% | Finance | 
| Deployment | Rollout Downtime | 0% (zero-downtime rollouts) | Platform Team |
### Outcome

A global, cloud-native chatbot platform that scales intelligently and recovers automatically, ready for enterprise traffic and future model integrations.
## Architecture

### Phase 1 Architecture

```mermaid
flowchart TD
    A[User Interface] --> B[Intent Classifier]
    B --> C{Router}
    C -->|Knowledge Query| D[Knowledge Agent - FAISS Vector DB]
    C -->|Contract Query| E[Contract Agent - Keyword to SQL Mapping]
    D --> F[Response Composer]
    E --> F
    F --> G[Chat Response]
    subgraph Guardrails
        H[PII Filter]
        I[Prompt Injection Detector]
        J[SQL Validator]
    end
    F --> H
    C --> I
    E --> J
```
### Phase 2 Architecture

```mermaid
flowchart TD
    A[User or Agent UI] --> B[API Gateway]
    B --> C[Router Service - LLM Assisted Intent Classification]
    C -->|Knowledge Request| D[Knowledge Service - RAG with FAISS and LLM]
    C -->|Contract Request| E[Contract Service - LLM for SQL Generation]
    C -->|Feedback| F[Feedback Service - RLHF Loop]
    D --> G[Response Composer]
    E --> G
    F --> H[Feedback Store]
    G --> I[Analytics Dashboard]
    subgraph Observability
        K[Prometheus, Grafana, OpenTelemetry]
    end
    C --> K
    D --> K
    E --> K
    F --> K
```
## Project Structure

```text
multi-intent-ai-chatbot-assistant/
├── phase1_pilot/
│   ├── app/
│   │   ├── main.py
│   │   ├── router.py
│   │   ├── intent_classifier.py
│   │   ├── chains.py
│   │   ├── contract_agent.py
│   │   └── utils.py
│   ├── guardrails/
│   │   ├── pii_filter.py
│   │   ├── sql_validator.py
│   │   └── prompt_injection_guard.py
│   ├── data/
│   │   ├── user_guide_sample.txt
│   │   └── mock_contracts.sql
│   ├── evals/
│   │   └── eval_results_phase1.md
│   ├── Dockerfile
│   └── ci_cd.yaml
│
├── phase2_production/
│   ├── services/
│   │   ├── router_service.py
│   │   ├── knowledge_service.py
│   │   ├── contract_service.py
│   │   ├── feedback_service.py
│   │   └── utils.py
│   ├── helm/
│   │   ├── deployment.yaml
│   │   └── secrets.yaml
│   ├── observability/
│   │   ├── prometheus_config.yml
│   │   └── grafana_dashboard.json
│   ├── evals/
│   │   └── eval_results_phase2.md
│   ├── .env.example
│   ├── Dockerfile
│   └── ci_cd_pipeline.yaml
│
└── phase3_scaling/
    ├── helm/
    │   ├── deployment.yaml
    │   └── values.yaml
    ├── observability/
    │   ├── prometheus_config.yml
    │   ├── grafana_dashboard.json
    │   └── alerts.yaml
    ├── gitops/
    │   └── argo_cd_pipeline.yaml
    ├── docs/
    │   └── phase3_scaling_overview.md
    └── README_phase3.md
```
---

Developed by James W. Niu  
Questions: jameswnarch@gmail.com  
MIT License