Open
Description
Parent: #204 | Phase 4: Hardening
Revised: Focus on container resource limits and multi-agent concurrency within a single container, rather than cloud-agent-next session throughput.
Goal
Validate the system under load and identify container resource limits.
Test Scenarios
- Simulate 30 concurrent polecats across 5 rigs in a single Town Container
- Measure DO→container latency under load (tool call round-trip time)
- Measure container resource usage (CPU, memory, disk) as the number of concurrent Kilo CLI processes (N) increases
- Identify the load at which to scale up to a larger instance type or shard across multiple containers
- Identify DO SQLite size limits and implement archival (closed beads are archived, then purged from the DO)
- Test container crash/restart/restore cycles (ephemeral disk recovery)
- Measure review queue throughput under concurrent PR submissions
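As a starting point for the first two scenarios, here is a minimal sketch of a concurrency harness. It is not the project's harness: `fake_tool_call`, `run_polecat`, and `run_load_test` are hypothetical names, the even split of 6 polecats per rig is an assumption, and the tool call is simulated with a short sleep rather than a real DO→container round trip.

```python
import asyncio
import random
import time

RIGS = 5
POLECATS_PER_RIG = 6  # assumption: 30 polecats split evenly across 5 rigs


async def fake_tool_call(rng: random.Random) -> None:
    """Hypothetical stand-in for one DO -> container tool call."""
    await asyncio.sleep(rng.uniform(0.001, 0.005))


async def run_polecat(rig: int, polecat: int, calls: int) -> list[float]:
    """Issue `calls` sequential tool calls; return each round-trip time in seconds."""
    rng = random.Random(rig * 1000 + polecat)  # seeded so runs are repeatable
    latencies = []
    for _ in range(calls):
        start = time.perf_counter()
        await fake_tool_call(rng)
        latencies.append(time.perf_counter() - start)
    return latencies


async def run_load_test(calls_per_polecat: int = 5) -> dict[tuple[int, int], list[float]]:
    """Run every polecat concurrently; collect latencies keyed by (rig, polecat)."""
    keys = [(r, p) for r in range(RIGS) for p in range(POLECATS_PER_RIG)]
    results = await asyncio.gather(
        *(run_polecat(r, p, calls_per_polecat) for r, p in keys)
    )
    return dict(zip(keys, results))


if __name__ == "__main__":
    samples = asyncio.run(run_load_test())
    total = sum(len(v) for v in samples.values())
    print(f"{len(samples)} polecats, {total} tool calls recorded")
```

Swapping `fake_tool_call` for a real request against the Town Container would turn the same loop into a round-trip latency probe; keying results by rig and polecat keeps per-rig contention visible.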
Metrics to Capture
- Tool call round-trip latency (p50, p95, p99)
- DO alarm scheduling accuracy under load
- SQLite query performance vs row count
- Container CPU/memory per Kilo CLI process
- Git clone/worktree creation time
- Container cold start time (fresh start vs restore from R2 snapshot)
- Memory/CPU usage per DO
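The latency percentiles above can be computed from raw samples in a few lines. The sketch below uses the nearest-rank definition, which is one of several common percentile definitions and is an assumption here, not something this issue specifies; `percentile` and `summarize` are illustrative names.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples_ms):
    """Return the three latency percentiles listed under Metrics to Capture."""
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }


if __name__ == "__main__":
    print(summarize(list(range(1, 101))))  # {'p50': 50, 'p95': 95, 'p99': 99}
```

With few samples the tail percentiles collapse onto the maximum (for 10 samples, p95 and p99 are both the largest value), so each run needs enough tool calls for p99 to be meaningful.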
Acceptance Criteria
- Load test harness for simulating concurrent polecats in a container
- Latency benchmarks documented
- Container resource usage profiled (CPU, memory, disk per agent)
- SQLite archival strategy implemented and tested
- Container crash/restore resilience validated
- Scaling thresholds documented (max agents per container by instance type)
- Performance bottlenecks identified and documented
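For the SQLite archival criterion, a minimal sketch of the "archive closed beads, then purge" step follows. It uses Python's standard `sqlite3` module against an in-memory database purely to illustrate the SQL; the `beads` table, its columns, and the `closed_before` cutoff are all hypothetical, and the real DO storage API and schema may look quite different.

```python
import json
import sqlite3


def archive_closed_beads(conn: sqlite3.Connection, closed_before: int) -> list[str]:
    """Return closed beads older than `closed_before` as JSON lines, then purge them."""
    with conn:  # commits on success, rolls back if anything raises
        rows = conn.execute(
            "SELECT id, status, closed_at, payload FROM beads "
            "WHERE status = 'closed' AND closed_at < ? ORDER BY id",
            (closed_before,),
        ).fetchall()
        conn.executemany("DELETE FROM beads WHERE id = ?", [(r[0],) for r in rows])
    keys = ("id", "status", "closed_at", "payload")
    return [json.dumps(dict(zip(keys, r))) for r in rows]


def demo() -> tuple[list[str], int]:
    """Build a tiny hypothetical beads table, archive old closed rows, count the rest."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE beads ("
        "id INTEGER PRIMARY KEY, status TEXT, closed_at INTEGER, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO beads (id, status, closed_at, payload) VALUES (?, ?, ?, ?)",
        [(1, "closed", 100, "a"), (2, "open", None, "b"), (3, "closed", 900, "c")],
    )
    conn.commit()
    archived = archive_closed_beads(conn, closed_before=500)
    remaining = conn.execute("SELECT COUNT(*) FROM beads").fetchone()[0]
    return archived, remaining


if __name__ == "__main__":
    archived, remaining = demo()
    print(len(archived), "archived;", remaining, "rows remain")
```

In this sketch the rows are returned to the caller after the delete commits. A production version would presumably need to store the export durably before purging, so that a crash between the two steps cannot lose beads.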