Resilience: No tmux server crash recovery — all sessions orphaned if tmux dies #397

@OneStepAt4time

Problem

If the tmux server crashes (OOM, bug, manual kill), all sessions become orphaned. Aegis has no mechanism to detect or recover from this.

Failure Scenario

  1. tmux server crashes
  2. All @n window IDs become invalid
  3. Aegis keeps running and still assumes its sessions exist
  4. All API calls fail with tmux errors
  5. ensureSession() creates a new session, but the old windows are lost
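
The scenario above depends on Aegis not being able to tell "the tmux server is gone" apart from an ordinary command failure. A minimal sketch of that distinction follows, assuming Aegis can capture the exit code and stderr of a probe such as `tmux list-sessions -F '#{session_name}'`. The names `CommandResult`, `TmuxHealth`, and `classifyTmuxResult` are hypothetical and do not come from the Aegis codebase, and the stderr patterns are an assumption: tmux commonly reports "no server running on ..." or "error connecting to ...", but the exact wording may vary between versions.

```typescript
// Hypothetical types for illustration; not taken from the Aegis codebase.
type CommandResult = { exitCode: number; stdout: string; stderr: string };

type TmuxHealth =
  | { status: "ok"; sessions: string[] }
  | { status: "server_down"; detail: string }
  | { status: "error"; detail: string };

// Messages tmux commonly prints when no server is reachable.
// Assumption: the exact wording can differ between tmux versions.
const SERVER_DOWN_PATTERNS: RegExp[] = [/no server running/i, /error connecting to/i];

// Classifies the result of `tmux list-sessions -F '#{session_name}'`.
function classifyTmuxResult(result: CommandResult): TmuxHealth {
  if (result.exitCode === 0) {
    // One session name per non-empty output line.
    const sessions = result.stdout.split("\n").filter((line) => line.length > 0);
    return { status: "ok", sessions };
  }
  const detail = result.stderr.trim();
  if (SERVER_DOWN_PATTERNS.some((pattern) => pattern.test(detail))) {
    // The server itself is unreachable, so every stored @n window ID is stale.
    return { status: "server_down", detail };
  }
  // Some other failure (bad arguments, permissions, ...): state may still be valid.
  return { status: "error", detail };
}
```

A caller could treat `server_down` as the trigger for invalidating stored window IDs, while leaving state untouched for plain `error` results.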

Suggested Fix

  1. Add a tmux health check to the /health endpoint
  2. Implement session reconciliation when tmux errors occur
  3. Store window names in state and attempt to re-attach by name after a tmux restart
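
As a rough illustration of item 3, the sketch below matches stored window names against a fresh listing and refreshes the `@n` IDs. It assumes the listing comes from `tmux list-windows -a -F '#{window_id} #{window_name}'` and that Aegis persists a name alongside each window ID. The `StoredWindow` shape and the function names are hypothetical. Names can only match if windows with those names exist on the restarted server, so missing or ambiguous names are reported as orphaned rather than guessed at.

```typescript
// Hypothetical shape of what Aegis might persist per session.
interface StoredWindow {
  sessionId: string;  // Aegis-side identifier (assumed)
  windowName: string; // stable name chosen when the window was created
  windowId: string;   // tmux "@n" ID, stale after a server restart
}

interface Reconciliation {
  reattached: StoredWindow[]; // matched by name, windowId refreshed
  orphaned: StoredWindow[];   // no unique window with that name
}

// Parses lines of the form "<window_id> <window_name>".
// Window IDs contain no spaces, so the first space is the separator.
function parseWindowListing(stdout: string): Map<string, string[]> {
  const idsByName = new Map<string, string[]>();
  for (const line of stdout.split("\n")) {
    const sep = line.indexOf(" ");
    if (sep < 0) continue; // skip blank or malformed lines
    const id = line.slice(0, sep);
    const name = line.slice(sep + 1);
    const ids = idsByName.get(name) ?? [];
    ids.push(id);
    idsByName.set(name, ids);
  }
  return idsByName;
}

function reconcileWindows(stored: StoredWindow[], listing: string): Reconciliation {
  const idsByName = parseWindowListing(listing);
  const result: Reconciliation = { reattached: [], orphaned: [] };
  for (const win of stored) {
    const ids = idsByName.get(win.windowName) ?? [];
    const onlyId = ids[0];
    if (ids.length === 1 && onlyId !== undefined) {
      // Exactly one window carries this name: adopt its new ID.
      result.reattached.push({ ...win, windowId: onlyId });
    } else {
      // Missing or duplicated name: do not guess.
      result.orphaned.push(win);
    }
  }
  return result;
}
```

Matching on a unique name keeps the sketch conservative: an entry is re-attached only when there is no ambiguity, and everything else is surfaced for the caller to clean up or recreate.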

Source

Resilience audit swarm (2026-03-28)
