Python orchestrator that runs my distributed AI lab. Tracks GPU temps, kills runaway processes, enforces explicit container naming, and makes sub-agents explain themselves before proceeding. My three decades of 'trust but verify' as code.

The-Commander: Distributed AI Orchestration System

Version: v1.4.7
Last Updated: January 21, 2026
Status: ✅ Per-Agent Logging & Launcher System Complete
Test Coverage: 295 tests passing (100%)
Next Release: 1.5.0 (Autonomous Intelligence Era)


🆕 Filesystem Explorer & Storage Architecture (v1.4.7)

The Commander now includes a comprehensive filesystem explorer for browsing and selecting files on any node in the cluster, local or remote.

Features

  • Remote File Browsing: Browse directories on any node from any GUI
  • Drive Enumeration: Automatically detect drives (Windows) or root (Linux)
  • Storage Path Selection: Select storage folders via file explorer instead of hardcoding
  • Binary Discovery: Browse and select engine binaries (.exe, .bat, .sh) from any location
  • Model Selection: Navigate model folders and select .gguf files visually
  • Transparent Proxying: Remote node requests seamlessly proxy through the cluster

Storage Architecture: Local-First, Network-Synced

Commander OS implements a "Local-First" storage pattern:

  • Each node operates on local NVMe/SSD storage for maximum performance
  • Critical events (e.g., a node going offline) sync immediately to the HTPC/ZFS pool
  • Regular events batch-sync every 60 seconds to reduce network overhead
  • The ZFS pool on the HTPC is the authoritative store of record
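The sync policy above can be sketched as a small buffer that flushes immediately on critical events and otherwise on a 60-second timer. This is a minimal illustration; the class and method names are hypothetical, not the actual Commander OS API:

```python
import time

class EventSync:
    """Local-first event buffer (illustrative sketch, not the real API):
    critical events flush to the HTPC/ZFS authority immediately,
    regular events batch every `interval` seconds."""

    def __init__(self, flush, interval=60.0):
        self.flush = flush                    # callable that ships a list of events
        self.interval = interval
        self.buffer = []
        self.last_flush = time.monotonic()

    def record(self, event, critical=False):
        self.buffer.append(event)
        now = time.monotonic()
        # Critical events (e.g. "node offline") bypass the batch window.
        if critical or now - self.last_flush >= self.interval:
            self.flush(list(self.buffer))
            self.buffer.clear()
            self.last_flush = now
```
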

Filesystem API Endpoints

GET /nodes/{node_id}/filesystem/drives
    → Returns list of drives (C:\, D:\, etc.) or root (/)

GET /nodes/{node_id}/filesystem/list?path=...
    → Returns directory contents with file/folder metadata
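A client can call these endpoints with any HTTP library; the one subtlety is URL-encoding the path query parameter so Windows separators and spaces survive the query string. A small sketch (the base URL and helper name are illustrative):

```python
from urllib.parse import quote

def filesystem_list_url(base_url: str, node_id: str, path: str) -> str:
    """Build the directory-listing URL for a node, URL-encoding the
    path so backslashes, colons, and spaces survive the query string."""
    return f"{base_url}/nodes/{node_id}/filesystem/list?path={quote(path, safe='')}"
```

For example, listing `D:\models` on the HTPC from the main node's API yields `.../nodes/gillsystems-htpc/filesystem/list?path=D%3A%5Cmodels`.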

Relay Server Auto-Management

  • Relay server automatically restarts when reigniting nodes with relay or storage roles
  • Storage location is broadcast to all nodes when relay starts
  • Nodes can query relay config via GET /relay/config
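Querying the relay config is a plain GET; a minimal standard-library sketch (the helper name and the shape of the returned JSON are assumptions, the endpoint is from the docs above):

```python
import json
import urllib.request

def fetch_relay_config(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch a node's relay configuration via GET /relay/config.
    `base_url` is e.g. "http://gillsystems-main:8000"; the returned
    dict's keys depend on the running relay."""
    with urllib.request.urlopen(f"{base_url}/relay/config", timeout=timeout) as resp:
        return json.loads(resp.read())
```
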

Configuration Simplification

  • No more hardcoded paths: Binary paths removed from relay.yaml
  • User-selected paths: All file paths now chosen via GUI file explorer
  • Dynamic discovery: Engines, models, and storage paths discovered at runtime

🆕 Commander Chat Interface (v1.4.3)

The Commander Chat provides direct interaction with The Commander Avatar - your AI strategic advisor powered by 100% local inference.

Features

  • Conversation Management: Create multiple chat threads with persistent history
  • Real-time Streaming: See responses as they generate via WebSocket
  • Auto Node Selection: Commander routes to highest-performance node automatically
  • Intent Classification: Understands commands, queries, and conversations
  • Decision Engine: Trust boundaries prevent unauthorized operations
  • Cross-Node Access: Access any node's GUI from any machine in the cluster

Chat Panel Layout

  • Left Sidebar: Conversation list + "New Chat" button
  • Center: Message stream with user/assistant bubbles
  • Bottom Bar: Node stats (commanding node, model, context, TPS)
  • Input: Command line with send button

How Chat Routing Works

  1. You type a message in the chat input
  2. Message goes to Commander Avatar on the hub
  3. Avatar classifies intent (QUERY, COMMAND, CHAT, ESCALATION)
  4. Decision Engine checks trust boundaries
  5. LlamaClient routes inference to highest TPS node (not the selected node in left panel)
  6. Response streams back to GUI via WebSocket

Note: The left panel "Selected Node" is for configuration only. Chat always routes through the highest-performance available node.
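The routing rule in step 5 amounts to a max-by-throughput selection over reachable nodes. A minimal sketch (the `online` and `tps` field names are assumptions; the benchmark numbers used below are from the topology table later in this README):

```python
def pick_inference_node(nodes):
    """Route chat inference to the highest-throughput reachable node.
    Sketch only: field names are illustrative, not the LlamaClient API."""
    online = [n for n in nodes if n.get("online")]
    if not online:
        raise RuntimeError("no inference nodes reachable")
    return max(online, key=lambda n: n["tps"])
```
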

Cross-Node Browser Access

You can access any node's GUI from any browser on the LAN:

  • From Laptop → open http://gillsystems-main:8000 in browser
  • From Main → open http://gillsystems-htpc:8001 in browser
  • GUI automatically uses the same host/port for API calls

🆕 Unified Launcher System (v1.4.1)

Both Windows and Linux now use unified clear-launch launchers by default. These automatically clear stale Commander processes and ports before startup, eliminating "Address already in use" errors.

Primary Launchers:

  • Windows: The_Commander.bat
  • Linux/macOS: ./the_commander.sh

These now include:

  • ✅ Automatic port clearance - kills zombie Commander processes
  • ✅ Network preflight validation - verifies all nodes are reachable
  • ✅ Auto-hosts setup - configures /etc/hosts or Windows hosts file with node mappings
  • ✅ Dynamic versioning - pulls version from centralized commander_os.__version__.py
  • ✅ GUI/HUD sync - ensures frontend and backend launch in correct order

Legacy launchers (without auto-clear) are archived in old_versions/ for reference.
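The auto-clear step boils down to: find the PIDs still listening on the Commander ports, then terminate them. A hedged Unix-only sketch using lsof (the real launchers' implementation may differ):

```python
import os
import signal
import subprocess

def pids_listening_on(port: int) -> list:
    """Return PIDs of processes listening on a TCP port (uses lsof; Unix only)."""
    try:
        out = subprocess.run(
            ["lsof", "-ti", f"tcp:{port}", "-sTCP:LISTEN"],
            capture_output=True, text=True,
        ).stdout
    except FileNotFoundError:
        return []  # lsof not installed
    return [int(tok) for tok in out.split() if tok.isdigit()]

def clear_port(port: int) -> None:
    """SIGTERM any stale process still bound to `port` before relaunch."""
    for pid in pids_listening_on(port):
        os.kill(pid, signal.SIGTERM)
```
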

Quick Start

# Windows
The_Commander.bat

# Linux/macOS
./the_commander.sh

🆕 Network Preflight System (v1.4.1)

Commander OS includes a self-healing network identity layer that automatically resolves and corrects node addresses at startup. This eliminates the "Hub is unreachable" error caused by DHCP IP address changes.

How It Works

  1. Hostname-based configuration: config/relay.yaml uses hostnames (e.g., gillsystems-main) instead of hard-coded IPs
  2. Preflight smoke test: Before starting the hub, the system resolves all hostnames and verifies port reachability
  3. LAN discovery fallback: If hostname resolution fails, the system scans the local subnet to find the node by its /identity endpoint
  4. Runtime caching: Resolved addresses are cached for fast subsequent lookups
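Steps 1-2 above reduce to "resolve the hostname, then try a TCP connect." A minimal standard-library sketch (the function name is illustrative; the real preflight adds LAN discovery and runtime caching on top):

```python
import socket

def resolve_node(hostname: str, port: int, timeout: float = 2.0):
    """Resolve a node's hostname and verify its port answers.
    Returns the resolved IP on success, or None - at which point the
    real system would fall back to LAN discovery via /identity."""
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        return None  # DNS/hosts lookup failed
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip  # port is reachable
    except OSError:
        return None  # resolved, but nothing listening
```
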

Setup: Hosts File Configuration (Optional)

Preflight automatically detects and adds missing hosts entries. Manual setup is optional:

Windows (auto-setup when running as admin):

  • Launcher automatically detects and adds missing entries to hosts file

Linux/macOS (auto-setup with sudo):

sudo ./the_commander.sh

Manual Edit (if preferred):

# Windows: C:\Windows\System32\drivers\etc\hosts
# Linux/macOS: /etc/hosts

# Gillsystems Commander OS Nodes
10.0.0.164    gillsystems-main
10.0.0.42     gillsystems-htpc
10.0.0.93     gillsystems-laptop
10.0.0.139    gillsystems-steam-deck

Preflight CLI Options

Available when running the backend manually via docs/main.py:

# Run with full preflight (default)
python docs/main.py commander-gui-dashboard

# Skip preflight for faster startup (launchers use this)
python docs/main.py commander-gui-dashboard --skip-preflight

# Disable LAN discovery (hostname/DNS only)
python docs/main.py commander-gui-dashboard --no-discovery

# Auto-update relay.yaml with discovered IPs
python docs/main.py commander-gui-dashboard --update-config

Troubleshooting Network Issues

  1. "Hub is unreachable": Ensure nodes are powered on or check hostname resolution:

    ping gillsystems-main
  2. Preflight fails: Check that target nodes are running Commander OS

  3. Discovery finds wrong node: Ensure each node has the /identity endpoint (included in v1.4.0+)

  4. Port already in use: Launchers automatically clear stale processes. Run again if needed.

  5. Best Practice: Set up DHCP reservations on your router to give each machine a stable IP address


Maintenance & Troubleshooting

Clearing Ghost Processes

If you encounter [Errno 98] Address already in use, a previous Commander instance is still running in the background. Use the following tactical clearance scripts to reset your node's ports:

  • Linux / Steam Deck:
    ./kill_all_active_port_stealers_for_your_node.sh
  • Windows:
    kill_all_active_port_stealers_for_your_node.bat

System Requirements

  • OS: Windows 10/11 or Linux (Ubuntu 20.04+)
  • Python: v3.10 ONLY (Strict Requirement)
    • Warning: Python 3.14+ will cause dependency failures
  • Node.js: v20+ (LTS Required for Vite compatibility)
  • GPU: AMD Radeon 7000 Series (Optional, for Local LLM Acceleration)

Automated Setup (Recommended for Linux)

For automated installation of all dependencies (Python 3.10, Node.js 20+, npm, build tools):

cd The-Commander-Agent
./scripts/linux_prereqs.sh

This script will:

  • Install Node.js 20+ (required for Vite frontend)
  • Ensure Python 3.10 is available (via pyenv if needed)
  • Create virtual environment and install Python dependencies
  • Validate all prerequisites before completion

Quick Start (Single Node)

  1. Clone Repository:

    git clone https://github.com/OCNGill/The-Commander-Agent.git
    cd The-Commander-Agent
  2. Install Dependencies (run the platform pre-req first):

    # Linux (recommended)
    ./scripts/linux_prereqs.sh
    
    # Windows (run as Administrator)
    # From Explorer: Right-click `install_prereqs.bat` and choose "Run as administrator"
    # Or in an elevated PowerShell prompt:
    .\install_prereqs.bat

    After the prereq installer finishes, continue with the launcher step below to start the Commander OS.

  3. Launch System:

    # Windows
    The_Commander.bat
    
    # Linux
    ./the_commander.sh
  4. Access Dashboard: Open browser to http://localhost:5173


Multi-Node Deployment (Full Functionality)

For complete model discovery and distributed orchestration across all nodes:

  1. Clone Repository on Each Node:

    # On each physical machine (Main, HTPC, Steam-Deck, Laptop)
    git clone https://github.com/OCNGill/The-Commander-Agent.git
    cd The-Commander-Agent
  2. Install Dependencies on Each Node:

    # Automated (Linux) - Recommended
    ./scripts/linux_prereqs.sh
    
    # Manual Windows
    py -3.10 -m pip install -r requirements.txt
    
    # Manual Linux
    python3.10 -m pip install -r requirements.txt
  3. Verify Node Configuration:

    • Check config/relay.yaml for correct IP addresses and ports
    • Ensure model_root_path points to each node's model directory
  4. Launch on Each Node:

    # Each node automatically detects its identity based on port
    The_Commander.bat  # Windows
    ./the_commander.sh  # Linux
  5. Verify Network Connectivity:

    • Each node's API should be accessible at http://<node-ip>:<port>
    • Test from any node: curl http://10.0.0.164:8000/nodes

Why Multi-Node Deployment?

  • Model Discovery: Each node scans its own filesystem and reports available models
  • Distributed Inference: Chat requests route to highest-ranking available node
  • Load Balancing: System automatically distributes work across active nodes
  • Fault Tolerance: If one node fails, others continue operating

Architectural Topology

Node ID                  Physical Host            Hardware          Bench (t/s)   Model Configuration
Gillsystems-Main         Gillsystems-Main         Radeon 7900XTX    130           Qwen3-Coder-25B (131k ctx, 999 NGL)
Gillsystems-HTPC         Gillsystems-HTPC         Radeon 7600       60            Granite-4.0-h-tiny (114k ctx, 40 NGL)
Gillsystems-Steam-Deck   Gillsystems-Steam-Deck   Custom APU        30            Granite-4.0-h-tiny (21k ctx, 32 NGL)
Gillsystems-Laptop       Gillsystems-Laptop       Integrated        9             Granite-4.0-h-tiny (21k ctx, 999 NGL)

Changelog

Version 1.4.3 (January 16, 2026 - Phase 8.3)

  • ✅ Chat Interface Overhaul: Persistent chat history with multi-session conversation management.
  • ✅ Secure Conversation Storage: SQLite on ZFS for all historical chat context.
  • ✅ Frontend Sidebar: New "Conversations" panel for instant switching between strategic threads.
  • ✅ Context Awareness: Optimized message handling with conversation-specific routing.

Version 1.4.1 (Phase 8.1)

  • Implemented Commander brain logic (Avatar + Cyberbot)
  • Decision Engine with trust boundaries and escalation rules
  • Working chat interface in Strategic Dashboard GUI
  • Local llama.cpp integration for natural language processing
  • Path 4 (Hybrid Local Architecture) selected after LLM consultation

Version 1.3

  • Implemented the new storage framework for Commander OS.
  • Added agent-specific storage modules for Commander and Recruiter agents.
  • Updated documentation to reflect the new storage architecture.

💖 Support / Donate

If you find this project helpful, you can support ongoing work — thank you!

[Donation QR codes: PayPal and Venmo]
