Context
oh-im-broke philosophy: Use what you have. Turn spare laptops into an agent swarm.
Current Limitations
- Default concurrency: 5 agents per model
- Single-machine bottleneck
- Potential OOM when running many parallel agents
- Underutilized spare hardware (old laptops, extra desktops)
Vision
Transform heterogeneous machines (desktop + spare laptops with different specs) into a distributed agent cluster:
- Master node: Main desktop running OpenCode
- Worker nodes: Spare laptops/machines running agent workers
- Simple setup: Docker-based, no complex Kubernetes
- Resource-aware: Tasks distributed based on machine capabilities
- Fault-tolerant: Worker failures don't crash entire system
- Monitored: Real-time visibility into cluster health/utilization
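To make the topology concrete, here is a minimal sketch of what a worker-aware cluster config might look like. Every field name below is hypothetical; nothing like it exists in the codebase yet.

```ts
// Hypothetical cluster config: one master plus heterogeneous workers,
// each capped according to its own RAM/CPU. Purely illustrative.
interface WorkerSpec {
  host: string;      // LAN address of the worker
  maxAgents: number; // concurrency cap sized to the machine's specs
}

interface ClusterConfig {
  master: { host: string; port: number };
  workers: WorkerSpec[];
}

const exampleCluster: ClusterConfig = {
  master: { host: "192.168.1.10", port: 4000 },
  workers: [
    { host: "192.168.1.21", maxAgents: 8 }, // spare desktop
    { host: "192.168.1.22", maxAgents: 3 }, // old laptop, little RAM
  ],
};
```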
Goals
- Massive Parallelism: Run 20+ concurrent agents across the cluster
- Cost Efficiency: Use existing hardware instead of cloud
- Simple Setup: `docker-compose up`-level simplicity
- Resource Monitoring: Track memory/CPU/agent utilization per node
- Graceful Degradation: Continue working if workers go offline
Architecture (Draft)
Components Needed
- Task Queue System (see the interface sketch after this list)
  - Distribute `delegate_task` calls to available workers
  - Priority queuing (high/medium/low)
  - Retry logic for failed tasks
- Worker Manager
  - Register/deregister workers dynamically
  - Health checks (heartbeat)
  - Capability reporting (CPU/RAM/GPU)
- Resource Monitor
  - Per-node metrics (memory, CPU, active agents)
  - Cluster-wide dashboard
  - Alerts for OOM/overload conditions
- Budget Orchestrator Extension
  - Track usage across all nodes
  - Distribute free-tier usage across workers
  - Failover when a node exhausts its resources
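A rough sketch of how the task queue and worker manager surfaces might look; every type and method name here is hypothetical, not existing code.

```ts
// Hypothetical surfaces for the first two components above.
type Priority = "high" | "medium" | "low";

interface Task {
  id: string;
  priority: Priority;
  payload: unknown; // e.g. a serialized delegate_task call
  attempts: number; // incremented by the retry logic
}

interface TaskQueue {
  enqueue(task: Task): void;
  dequeue(): Task | undefined; // highest priority first
  retry(task: Task, maxAttempts: number): void; // re-enqueue or drop
}

interface WorkerInfo {
  id: string;
  cpus: number;
  memoryMb: number;
  activeAgents: number;
  lastHeartbeat: number; // epoch ms, consumed by health checks
}

interface WorkerManager {
  register(worker: WorkerInfo): void;
  deregister(workerId: string): void;
  healthy(now: number, timeoutMs: number): WorkerInfo[]; // heartbeat filter
}
```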
Existing Infrastructure to Leverage
- ✅ Background agent manager (`src/features/background-agent/manager.ts`)
- ✅ Concurrency limits (`src/features/background-agent/concurrency.ts`)
- ✅ WebUI with stats/monitoring endpoints
- ✅ Budget orchestrator with provider tracking
- ✅ Usage tracking system
⚠️ Needs extension: all of the above are currently single-node.
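As a first thought on that extension, a minimal sketch of cluster-wide concurrency accounting. It does not use the real `concurrency.ts` API (not shown here); all names are hypothetical.

```ts
// Hypothetical cluster-wide analogue of the per-process concurrency
// limit: pick the healthy worker with the most free agent slots.
interface NodeSlots {
  workerId: string;
  maxAgents: number;   // per-node cap, analogous to today's limit of 5
  activeAgents: number;
}

function pickWorker(nodes: NodeSlots[]): NodeSlots | undefined {
  const free = nodes.filter((n) => n.activeAgents < n.maxAgents);
  if (free.length === 0) return undefined; // cluster saturated: queue the task
  // Least-loaded placement, matching the "resource-aware" goal above.
  return free.reduce((best, n) =>
    n.maxAgents - n.activeAgents > best.maxAgents - best.activeAgents ? n : best
  );
}
```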
Research Needed
- Lightweight orchestration options (Docker Swarm, K3s, Nomad, etc.)
- Task queue systems (Celery, BullMQ, RQ, etc.)
- Similar projects (LLM agent clusters, distributed AI inference)
- Resource monitoring tools compatible with heterogeneous hardware
- Network protocols for task distribution (gRPC, WebSockets, HTTP/2)
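Whichever transport the research settles on (gRPC, WebSockets, or HTTP/2), the master-worker protocol probably needs only a handful of message shapes. A transport-agnostic sketch, with every name hypothetical:

```ts
// Candidate message shapes for the master-worker protocol,
// independent of the underlying transport.
type MasterToWorker =
  | { kind: "assign"; taskId: string; payload: unknown }
  | { kind: "cancel"; taskId: string };

type WorkerToMaster =
  | { kind: "register"; workerId: string; cpus: number; memoryMb: number }
  | { kind: "heartbeat"; workerId: string; activeAgents: number }
  | { kind: "result"; taskId: string; ok: boolean; output?: unknown }
  | { kind: "error"; taskId: string; message: string };
```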
Proposed Sub-Issues
- Resource monitoring & OOM detection (prerequisite)
- Task queue abstraction layer
- Worker node implementation (Docker image)
- Master-worker communication protocol
- Cluster configuration & discovery
- WebUI cluster dashboard
- Budget orchestrator cluster support
- Fault tolerance & failover logic
- Documentation & setup guide
Success Criteria
- Run 20+ concurrent agents across 3+ machines
- Automatic worker registration/deregistration
- Real-time cluster health monitoring
- Graceful handling of worker failures
- Setup time < 30 minutes for new worker node
- All existing features work unchanged (backward compatible)
Non-Goals (for initial implementation)
- ❌ Perfect load balancing (simple round-robin is OK; see the sketch after this list)
- ❌ Auto-scaling (manual worker addition OK)
- ❌ GPU distribution (future enhancement)
- ❌ Cross-internet workers (LAN only initially)
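For scale, the round-robin baseline accepted above is only a few lines; a sketch assuming a non-empty, hypothetical worker list:

```ts
// Baseline round-robin dispatcher, per the non-goal above. Assumes a
// non-empty worker list; it simply cycles through the workers in order.
function makeRoundRobin<T>(workers: T[]): () => T {
  let i = 0;
  return () => workers[i++ % workers.length];
}

// Usage: const next = makeRoundRobin(["worker-a", "worker-b"]);
// next() → "worker-a", next() → "worker-b", next() → "worker-a", ...
```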
Related Issues
- feat: implement mixed provider pricing strategy for free/paid/sometimes-free tiers (#2): hybrid provider pricing (free-tier distribution across the cluster)
- (To be created: Resource monitoring)
- (To be created: OOM detection)
Timeline: Research → Prototype → Incremental implementation
Priority: Medium (after current budget orchestrator work completes)