FaultMaven - Self-Hosted Deployment

An AI-powered troubleshooting copilot you can run anywhere for free.


Overview

This repository provides a complete Docker Compose deployment for self-hosting FaultMaven, an AI-powered troubleshooting copilot that helps you diagnose and resolve technical issues faster.

📖 For architectural details and contributing: See the main FaultMaven repository.

What you get with self-hosted deployment:

  • 🤖 AI Troubleshooting Agent - LangGraph-powered assistant with milestone-based investigation
  • 📚 3-Tier Knowledge Base - Personal KB + Global KB + Case Working Memory
  • 📊 8 Data Type Support - Logs, traces, profiles, metrics, config, code, text, visual
  • 🗄️ Portable SQLite Database - Zero configuration, single file, easy backups
  • 🔍 Vector Search - ChromaDB for semantic knowledge retrieval
  • ⚙️ Background Processing - Celery + Redis for async operations

Deploy everything in 2 minutes with a single command.


Who Is This For?

✅ Perfect For:

  • 👨‍💻 Developers - Study architecture, contribute code, learn AI troubleshooting
  • 🔬 Tinkerers - Experiment with LLMs, RAG, and agentic workflows
  • 🔐 Privacy-conscious - Keep sensitive data on-premises (air-gapped environments)
  • 🌍 Open-source contributors - Improve the product, add features

❌ Not For:

  • Production team use (single-user architecture)
  • Collaboration workflows (no case/knowledge sharing)
  • Enterprise compliance needs (no SSO/RBAC)

Quick Start

⚡ Four Simple Steps:

# 1. Install: Clone the repository
git clone https://github.com/FaultMaven/faultmaven-deploy.git
cd faultmaven-deploy

# 2. Configure: Add your settings
cp .env.example .env
# Edit .env and add:
#   - LLM API key (any provider: OPENAI_API_KEY, ANTHROPIC_API_KEY, GROQ_API_KEY, etc.)
#   - SERVER_HOST (your server IP, e.g., 192.168.0.200)

# 3. Protect: Resource limits (auto-created by wrapper)
# The ./faultmaven script handles this automatically

# 4. Run: Start everything with one command
./faultmaven start
# Docker automatically pulls pre-built images from Docker Hub
# Waits up to 120 seconds for all services to pass health checks

Expected output:

✅ Docker is running
✅ System has 15.5 GB RAM (8 GB required)
✅ Environment file configured (.env)
✅ Resource limits configured

Starting Docker containers...
⏳ Waiting for services to become healthy (up to 120 seconds)...

✅ FaultMaven services started successfully!

Next steps:
  1. Check status:  ./faultmaven status
  2. View logs:     ./faultmaven logs
  3. Access services:
     - API Gateway: http://192.168.0.200:8090/docs
     - Dashboard:   http://192.168.0.200:3000

If services don't start: Run ./faultmaven logs to see error details, or see Troubleshooting below.

What happens during deployment:

  • Docker pulls pre-built container images from Docker Hub
  • All 10 services use pre-built images (no building required)
  • No local repositories needed - everything from Docker Hub
  • First deployment downloads ~2-3GB of images (one-time)
  • Future updates only download changed layers (faster)

Prerequisites

Required:

  • Docker & Docker Compose (Get Docker)
  • 8GB RAM minimum (16GB recommended)
    • Default resource limits assume 8GB system RAM
    • Allocates ~5GB total: Agent (1.5GB), Knowledge (2GB), ChromaDB (1GB), Redis (512MB)
    • Remaining ~3GB for OS and other applications
    • 16GB+ systems: Edit docker-compose.override.yml to increase limits for better performance
  • LLM access - An API key for one cloud provider, or a local LLM (options below):

🎯 LLM Provider Options

Self-hosted FaultMaven uses one LLM for all operations - chat, analysis, and knowledge base queries. You configure a single provider in .env and it handles everything.

Available providers:

  • Cloud LLMs: OpenAI, Anthropic, Groq, Gemini, Fireworks, OpenRouter
  • Local LLMs: Ollama, LM Studio, LocalAI, vLLM

Cloud LLM (Recommended)

  • ✅ Fastest response (1-2 seconds)
  • ✅ Best reasoning quality
  • ✅ No local hardware needed
  • 💰 ~$0.10-$0.50 per session

Local LLM (Full data sovereignty)

  • ✅ Zero API costs
  • ✅ Air-gapped capable (offline)
  • ✅ Complete data control
  • ⚙️ Requires 8GB+ RAM (16GB+ recommended)
  • ⏱️ Slower (5-15 seconds vs 1-2 seconds)

What runs locally:

  • ✅ 10 Docker containers: 6 microservices + API Gateway + Dashboard + 2 job workers
    • Microservices: auth, session, case, knowledge, evidence, agent
    • API Gateway: Single entry point for all requests
    • Dashboard: Web UI for Global KB management
    • Job Workers: Celery worker + Celery Beat scheduler
  • ✅ ChromaDB vector database
  • ✅ Redis session store
  • ✅ SQLite data storage

Using the CLI Wrapper

The ./faultmaven script simplifies deployment with pre-flight checks and resource management:

# Start with full validation
./faultmaven start

# Check service status and health
./faultmaven status

# View logs (all services)
./faultmaven logs

# View logs (specific service)
./faultmaven logs fm-agent-service

# Stop services (preserves data)
./faultmaven stop

# Reset to factory defaults (DANGER: deletes all data)
./faultmaven clean

# Optional: Run end-to-end verification tests (troubleshooting only)
./faultmaven verify

# Show help
./faultmaven help

The wrapper automatically:

  • ✅ Checks Docker is running
  • ✅ Verifies you have 8GB+ RAM
  • ✅ Validates .env file has API key
  • ✅ Creates resource limits (docker-compose.override.yml)
  • ✅ Tests service health endpoints
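
As a rough illustration of two of these checks, the sketch below tests a .env file for a non-empty API key and reads total RAM on Linux. It is not the wrapper's actual code: the function names `env_has_api_key` and `mem_total_gb` are hypothetical, the key pattern assumes variables named like `OPENAI_API_KEY`, and the RAM check assumes a Linux `/proc/meminfo`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the real ./faultmaven wrapper may work differently.

# Succeeds if the file defines at least one *_API_KEY variable with a value.
env_has_api_key() {
  grep -Eq '^[A-Z0-9_]+_API_KEY=.+' "${1:-.env}" 2>/dev/null
}

# Prints total system RAM in whole GB (rounded down). Linux only.
mem_total_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}
```

For example, `env_has_api_key .env || echo "Add an LLM API key to .env"` reports a missing key. A machine sold with 8GB usually reports slightly less usable memory, so a strict `-ge 8` comparison on `mem_total_gb` would reject it; compare against 7 or work in MB if you adapt this.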

Manual Deployment (Advanced)

If you prefer direct Docker Compose commands:

# Configure environment
cp .env.example .env
# Edit .env and add your LLM API key (see .env.example for all provider options)

# Create resource limits (recommended)
cp docker-compose.override.yml.example docker-compose.override.yml

# Start all services (pulls pre-built images from Docker Hub)
docker-compose up -d

# Check status
docker-compose ps

# Test health endpoints
# Note: Replace <SERVER_HOST> with 'localhost' (if on server) or server IP (if remote)
curl http://<SERVER_HOST>:8001/health  # Auth Service
curl http://<SERVER_HOST>:8002/health  # Session Service
curl http://<SERVER_HOST>:8003/health  # Case Service
curl http://<SERVER_HOST>:8004/health  # Knowledge Service
curl http://<SERVER_HOST>:8005/health  # Evidence Service
curl http://<SERVER_HOST>:8006/health  # Agent Service
curl http://<SERVER_HOST>:8090/health  # API Gateway

# Access web dashboard
# Replace <SERVER_HOST> with your server's IP address (from .env SERVER_HOST)
# Use 'localhost' only if accessing from the server itself
open http://<SERVER_HOST>:3000   # macOS; on Linux use xdg-open, or paste the URL into a browser
# Example: http://192.168.0.200:3000

# ⚠️ SECURITY WARNING: Change default credentials immediately!
# Login: admin / changeme123

Expected health response:

{
  "status": "healthy",
  "service": "fm-case-service",
  "version": "1.0.0",
  "database": "sqlite+aiosqlite"
}
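
The individual curl checks above can be wrapped in a loop. A minimal sketch, assuming every service returns a JSON body containing `"status": "healthy"` as in the sample response; `is_healthy_body` and `check_services` are hypothetical names, and the 5-second timeout is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helpers built on the ports and /health path documented above.

# Reads a response body on stdin; succeeds if it reports status "healthy".
is_healthy_body() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Probes the six microservices and the API Gateway on the given host.
# Returns non-zero if any of them is unreachable or not healthy.
check_services() {
  local host=${1:-localhost} port failed=0
  for port in 8001 8002 8003 8004 8005 8006 8090; do
    if curl -fsS --max-time 5 "http://${host}:${port}/health" | is_healthy_body; then
      printf 'port %s: healthy\n' "$port"
    else
      printf 'port %s: NOT healthy\n' "$port"
      failed=1
    fi
  done
  return "$failed"
}
```

Call it as `check_services 192.168.0.200`, or with no argument to probe localhost.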

✅ FaultMaven is ready!


Using FaultMaven

Browser Extension - REQUIRED for AI Chat

⚠️ IMPORTANT: The browser extension is REQUIRED to interact with the FaultMaven AI agent. The backend server alone does not provide a chat interface.

Installation Options

Option 1: Chrome Web Store (Recommended)

# Coming soon - FaultMaven Copilot will be published to the Chrome Web Store
# Search for "FaultMaven Copilot" in Chrome Web Store

Option 2: Install from GitHub (Available Now)

# 1. Clone the extension source
git clone https://github.com/FaultMaven/faultmaven-copilot.git
cd faultmaven-copilot

# 2. Build the extension
pnpm install
pnpm build

# 3. Load in Chrome
# - Open chrome://extensions/
# - Enable "Developer mode"
# - Click "Load unpacked"
# - Select the faultmaven-copilot/dist directory

Configure Extension

After installation, configure the extension to connect to your FaultMaven server:

# 1. Click the FaultMaven extension icon in Chrome
# 2. Go to Settings
# 3. Set API URL to: http://<SERVER_HOST>:8090
#    Example: http://192.168.0.200:8090
# 4. Login with your dashboard credentials (default: admin/changeme123)

What Each Component Does

| Component | Purpose | Required For |
|---|---|---|
| Browser Extension | AI chat interface, real-time troubleshooting, evidence upload | AI chat (REQUIRED) |
| Dashboard (Port 3000) | Knowledge base management, document upload, user settings | Knowledge base only (optional) |
| Backend Server | API services, AI agent, data processing | Everything (REQUIRED) |

Note: Without the browser extension, you can only interact with FaultMaven via direct API calls (developer option). The dashboard at port 3000 is for knowledge base management only, NOT for chatting with the AI agent.


Architecture

graph TB
    subgraph "User Interfaces"
        UI1["Browser Extension<br/>faultmaven-copilot<br/>• Real-time chat<br/>• Interactive Q&A<br/>• Evidence upload"]
        UI2["Dashboard Web UI<br/>Port 3000<br/>• Login/Authentication<br/>• Global KB management<br/>• Document upload"]
    end

    subgraph "API Layer"
        GW["API Gateway<br/>Port 8090<br/>Main entry point"]
    end

    subgraph "Microservices (Ports 8001-8006)"
        AUTH["Auth Service<br/>:8001<br/>Simple Auth"]
        SESSION["Session Service<br/>:8002<br/>Redis Sessions"]
        CASE["Case Service<br/>:8003<br/>Milestone Tracking"]
        KNOWLEDGE["Knowledge Service<br/>:8004<br/>3-Tier RAG"]
        EVIDENCE["Evidence Service<br/>:8005<br/>File Upload"]
        AGENT["Agent Service<br/>:8006<br/>LangGraph AI"]
    end

    subgraph "Data Layer"
        DB1[("SQLite<br/>/data/")]
        REDIS[("Redis<br/>:6379")]
        CHROMA[("ChromaDB<br/>:8007")]
        FILES[("File Storage<br/>./data/files")]
    end

    subgraph "Background Processing"
        WORKER["Celery Worker<br/>Job Processing"]
        BEAT["Celery Beat<br/>Scheduler"]
    end

    subgraph "External Services"
        LLM["Cloud LLM<br/>OpenAI/Anthropic/Groq"]
    end

    UI1 -->|HTTP API| GW
    UI2 -->|HTTP API| GW

    GW --> AUTH
    GW --> SESSION
    GW --> CASE
    GW --> KNOWLEDGE
    GW --> EVIDENCE
    GW --> AGENT

    AUTH --> DB1
    SESSION --> REDIS
    CASE --> DB1
    KNOWLEDGE --> CHROMA
    EVIDENCE --> FILES
    AGENT --> LLM

    WORKER --> REDIS
    BEAT --> REDIS
    WORKER --> LLM

    style GW fill:#4A90E2,stroke:#2E5C8A,stroke-width:3px,color:#fff
    style AGENT fill:#E27D60,stroke:#C25A3C,stroke-width:2px,color:#fff
    style LLM fill:#85C88A,stroke:#5A9F5E,stroke-width:2px,color:#fff
    style UI1 fill:#9B59B6,stroke:#6C3483,stroke-width:2px,color:#fff
    style UI2 fill:#9B59B6,stroke:#6C3483,stroke-width:2px,color:#fff

Services

| Service | Port | Description |
|---|---|---|
| API Gateway | 8090 | Main entry point for all client requests |
| Auth Service | 8001 | User authentication (JWT, Redis sessions) |
| Session Service | 8002 | Session management with Redis |
| Case Service | 8003 | Case lifecycle & milestone tracking |
| Knowledge Service | 8004 | 3-tier RAG knowledge base (ChromaDB + BGE-M3) |
| Evidence Service | 8005 | File uploads (logs, screenshots, configs) |
| Agent Service | 8006 | AI troubleshooting agent (LangGraph + MilestoneEngine) |
| Dashboard | 3000 | Web UI for Global KB management (React + Vite) |
| Job Worker | - | Background tasks (Celery + Redis) |
| Job Worker Beat | - | Celery task scheduler |
| Redis | 6379 | Session storage & task queue |
| ChromaDB | 8007 | Vector database for semantic search |

Note: Individual service ports (8001-8007) are exposed for health checks and debugging. All API requests should go through the API Gateway on port 8090.


Data Persistence

All data is stored in the ./data/ directory:

./data/
├── faultmaven.db       # SQLite database (all microservices share this file)
└── uploads/            # Evidence files
    └── case_abc123/
        └── error.log

Benefits:

  • Portable - Zip the entire ./data/ folder and move it to another machine
  • Simple Backup - zip -r backup.zip ./data
  • Version Control Friendly - .gitignore excludes /data/
  • Survives Restarts - Data persists across docker-compose down

Backup:

# Backup entire FaultMaven state
zip -r faultmaven-backup-$(date +%Y%m%d).zip ./data

# Restore on another machine
unzip faultmaven-backup-20251120.zip
docker-compose up -d
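
Zipping ./data while the services are still writing to it can capture the SQLite file mid-write. A minimal sketch that stops the containers first, assuming `zip` and `docker-compose` are on the PATH and the command is run from the faultmaven-deploy directory; `backup_name` and `backup_data` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; adapt paths and commands to your setup.

# Prints the archive name for a given date stamp (defaults to today).
backup_name() {
  printf 'faultmaven-backup-%s.zip\n' "${1:-$(date +%Y%m%d)}"
}

# Stops the containers, archives ./data, then starts the containers again.
# Returns the exit status of zip so callers can detect a failed backup.
backup_data() {
  local archive rc
  archive=$(backup_name)
  docker-compose stop || return 1
  zip -r "$archive" ./data
  rc=$?
  docker-compose start
  return "$rc"
}
```

Running `backup_data` produces a file named like the one in the restore example, for instance faultmaven-backup-20251120.zip.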

What's Included

  • Complete AI Agent - Full LangGraph agent with 8 milestones
  • 3-Tier RAG System - Personal KB + Global KB + Case Working Memory
  • All 8 Data Types - Logs, traces, profiles, metrics, config, code, text, visual
  • SQLite Database - Zero configuration, single file, portable
  • ChromaDB Vector Search - Semantic knowledge base retrieval
  • Background Jobs - Celery + Redis for async processing
  • Local File Storage - All evidence files stay on your machine

🚀 Need Production-Ready Infrastructure?

Self-hosted is single-user only. For production use, try FaultMaven Managed SaaS — available for free for individuals and teams.

Get elastic resource management, optimized performance, and enterprise-grade features. Learn More →


API Usage Examples

All API requests should go through the API Gateway (port 8090) - the single entry point for all client requests.

Important: Replace <SERVER_HOST> below with:

  • localhost if running commands ON the FaultMaven server itself
  • Your server IP (e.g., 192.168.0.200) if running FROM a different machine
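
To avoid editing every command by hand, the gateway URL can be derived from the SERVER_HOST value in .env. This is a sketch under assumptions: `server_host_from_env` and `fm_api_base` are hypothetical helpers, and the parser only handles a plain unquoted `SERVER_HOST=value` line.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; port 8090 is the API Gateway port documented above.

# Prints SERVER_HOST from the given env file, or "localhost" if it is not set.
server_host_from_env() {
  local file=${1:-.env} value
  value=$(sed -n 's/^SERVER_HOST=//p' "$file" 2>/dev/null | tail -n 1)
  printf '%s\n' "${value:-localhost}"
}

# Prints the base URL of the API Gateway.
fm_api_base() {
  printf 'http://%s:8090\n' "$(server_host_from_env "$1")"
}
```

With this in place, a request can be written as `curl "$(fm_api_base)/health"` instead of substituting <SERVER_HOST> manually.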

Create a Case

curl -X POST http://<SERVER_HOST>:8090/api/v1/cases \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Production API latency spike",
    "description": "Users reporting slow response times",
    "user_id": "user_001"
  }'

Upload Evidence

curl -X POST http://<SERVER_HOST>:8090/api/v1/evidence \
  -F "file=@/path/to/error.log" \
  -F "case_id=case_abc123" \
  -F "evidence_type=log"

Query AI Agent

curl -X POST http://<SERVER_HOST>:8090/api/v1/agent/query \
  -H "Content-Type: application/json" \
  -d '{
    "case_id": "case_abc123",
    "message": "Analyze the error log and suggest root cause"
  }'

See QUICKSTART.md for complete API reference.


Security Architecture

Service-to-Service Authentication

FaultMaven implements JWT-based service authentication for secure internal communication between microservices.

Key Features:

  • Service Identity: Each microservice authenticates with a signed JWT token
  • User Context Propagation: Original user identity flows through the service chain
  • Asymmetric Cryptography: Auth service signs tokens with private key, services verify with public key
  • Local Verification: Services validate JWTs without calling the auth service (no extra network round trip per request)
  • Granular Permissions: Each service has specific allowed operations (e.g., case:read, knowledge:search)

How It Works:

  1. Services request JWT tokens from auth service on startup
  2. Internal API calls include Authorization: Bearer <service-jwt> header
  3. Target services verify JWT signature locally using public key
  4. Permission checks enforce access control based on service identity
  5. User context (X-User-ID header) flows through for audit/logging
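
To make step 2 concrete: a JWT is three base64url-encoded segments separated by dots (header, payload, signature). The sketch below decodes the payload segment so a token's claims can be inspected while debugging. It does not verify the signature, `jwt_payload` is a hypothetical helper, and FaultMaven's actual claim names are not documented here.

```bash
#!/usr/bin/env bash
# Hypothetical debugging helper. It decodes only; it does NOT verify the signature.

# Prints the decoded JSON payload (second segment) of a JWT.
jwt_payload() {
  local seg
  # Take the second dot-separated segment and map base64url to standard base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url omits.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 --decode
}
```

Given a token captured from an `Authorization: Bearer <service-jwt>` header, `jwt_payload "$token"` prints its claims as JSON.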

Benefits:

  • Zero-trust security model for internal APIs
  • Complete audit trail of which service performed each action
  • Protection against unauthorized service-to-service calls
  • Ready for service mesh integration (mTLS)

Troubleshooting

Services won't start

# Check logs
docker-compose logs fm-case-service
docker-compose logs fm-agent-service

# Restart specific service
docker-compose restart fm-case-service

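# Recreate all services from fresh images
# (--build only has an effect when building from source with docker-compose.dev.yml)
docker-compose pull
docker-compose up -d --force-recreate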

Database errors

# Stop services, remove all data, and restart (WARNING: deletes the database and uploaded evidence)
docker-compose down
rm -rf ./data/
docker-compose up -d

Port conflicts

If ports are already in use, edit docker-compose.yml:

ports:
  - "9001:8000"  # Change external port (e.g., 8001 to 9001)

Port ranges used:

  • 8001-8007: Backend microservices + ChromaDB
  • 8090: API Gateway (main entry point)
  • 3000: Dashboard web UI
  • 6379: Redis
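
Before starting FaultMaven, you can check which of these ports are already taken. A sketch for Linux, assuming the `ss` utility is available; `fm_ports` and `conflicting_ports` are hypothetical helpers, and the filter reads `ss -Htln` output on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the port list mirrors the ranges documented above.

# Prints every host port FaultMaven binds, one per line.
fm_ports() {
  printf '%s\n' 8001 8002 8003 8004 8005 8006 8007 8090 3000 6379
}

# Reads `ss -Htln` output on stdin and prints the FaultMaven ports
# that already have a listener.
conflicting_ports() {
  local listening p
  # Column 4 is "address:port"; keep only the part after the last colon.
  listening=$(awk '{ n = split($4, a, ":"); print a[n] }')
  for p in $(fm_ports); do
    if printf '%s\n' "$listening" | grep -qx "$p"; then
      printf '%s\n' "$p"
    fi
  done
  return 0
}
```

Run it as `ss -Htln | conflicting_ports` while FaultMaven is stopped; any port it prints needs a different external mapping in docker-compose.yml.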

ChromaDB connection issues

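⚠️ Note: ChromaDB does not expose the /health endpoint that the FaultMaven services provide; probe its heartbeat endpoint instead (see the curl command in this section). Services that depend on it use retry logic to handle startup timing.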

# Check if ChromaDB container is running
docker-compose ps chromadb

# View ChromaDB logs for errors
docker-compose logs chromadb

# Test ChromaDB manually
curl http://<SERVER_HOST>:8007/api/v1/heartbeat

# If ChromaDB is slow to start, wait 10-15 seconds then restart dependent services
docker-compose restart fm-knowledge-service
docker-compose restart fm-agent-service

# Full ChromaDB restart
docker-compose restart chromadb

Common ChromaDB issues:

  • Slow startup: ChromaDB can take 10-15 seconds to fully initialize. Wait before accessing it.
  • Race conditions: If knowledge service starts before ChromaDB is ready, it will retry automatically (up to 5 times with exponential backoff).
  • Connection refused: Check that port 8007 isn't in use by another application.
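
The retry behavior described above (up to 5 attempts with exponential backoff) can be reproduced from the shell when probing ChromaDB manually. This is a generic sketch, not the knowledge service's implementation; `retry_with_backoff` is a hypothetical helper and the 1-second starting delay in the usage example is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]

# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the wait between attempts.
retry_with_backoff() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}
```

For example, `retry_with_backoff 5 1 curl -fsS http://localhost:8007/api/v1/heartbeat` waits 1, 2, 4, and 8 seconds between its five attempts before giving up.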

Updating

To update to the latest version:

# Pull latest changes
git pull origin main

# Pull updated images and recreate the containers that changed
docker-compose pull
docker-compose up -d

# Verify services are healthy
docker-compose ps
curl http://<SERVER_HOST>:8003/health  # Replace <SERVER_HOST> with 'localhost' or server IP

Stopping FaultMaven

# Stop all services (data persists in ./data/)
docker-compose down

# Stop and remove data (WARNING: deletes everything)
docker-compose down -v
rm -rf ./data/

Development Setup

⚠️ For Contributors Only

If you want to build services from source instead of using pre-built Docker Hub images:

# Create a workspace directory
mkdir faultmaven-workspace
cd faultmaven-workspace

# Clone deployment repository
git clone https://github.com/FaultMaven/faultmaven-deploy.git

# Clone all service repositories (required for local builds)
repos=(
  "fm-core-lib"
  "fm-auth-service"
  "fm-session-service"
  "fm-case-service"
  "fm-knowledge-service"
  "fm-evidence-service"
  "fm-agent-service"
  "fm-api-gateway"
  "fm-job-worker"
  "faultmaven-dashboard"
)

for repo in "${repos[@]}"; do
  git clone https://github.com/FaultMaven/$repo.git
done

# Now deploy from the deploy repository
cd faultmaven-deploy
cp .env.example .env
# Edit .env with your settings

# Use docker-compose.dev.yml to build from local repositories
docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d

Directory structure after cloning:

faultmaven-workspace/
├── faultmaven-deploy/          # This repo
├── fm-auth-service/             # Auth microservice
├── fm-session-service/          # Session microservice
├── fm-case-service/             # Case microservice
├── fm-knowledge-service/        # Knowledge microservice
├── fm-evidence-service/         # Evidence microservice
├── fm-agent-service/            # Agent microservice
├── fm-api-gateway/              # API Gateway
├── fm-job-worker/               # Background jobs
├── faultmaven-dashboard/        # Web UI
└── fm-core-lib/                 # Shared library

Components

This deployment uses microservices from:


Documentation


License

Apache 2.0 License - See LICENSE for details.

Why Apache 2.0?

  • ✅ Use commercially
  • ✅ Fork, modify, and redistribute, including in commercial products
  • ✅ Patent grant protection
  • ✅ Enterprise-friendly (same license as Kubernetes, Android)

TL;DR: You can use FaultMaven for anything, including building commercial products, as long as you keep the license and attribution notices that Apache 2.0 requires.


Support


Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

Quick start:

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Make changes and test locally
  4. Commit (git commit -m 'Add amazing feature')
  5. Push (git push origin feature/amazing-feature)
  6. Open Pull Request

FaultMaven - Making troubleshooting faster, smarter, and more collaborative.
