[Gastown] PR 21: Edge Case Handling #227

@jrf0110

Description

Parent: #204 | Phase 4: Hardening

Revised: Edge cases updated for container-per-town model (container OOM, ephemeral disk, process-level isolation).

Goal

Handle edge cases and failure modes gracefully.

Edge Cases

  • Split-brain: Two processes run for the same agent (race on restart) → Rig DO enforces a single writer per agent; the container checks DO state before starting an agent process
  • Concurrent writes to the same bead: SQLite serialization inside a single DO handles this; add optimistic locking for operations that span multiple DOs
  • DO eviction during alarm: Alarms are durable and will re-fire, so alarm handlers should be safe to run more than once
  • Container OOM: Kills all agents in the container. DO alarms detect the dead agents, a new container starts, and agents are re-dispatched from DO state
  • Container sleep during active work: Agents must have pushed their work to the remote before the container sleeps, because container disk is ephemeral. The DO re-dispatches on wake, and checkpoint data in the DO enables resumption
  • Gateway outage: Agents rely on the retries built into Kilo CLI; escalate if the outage persists
  • Partial agentDone: The polecat pushed the branch but the gt_done call failed → checkpoint-based recovery
  • Duplicate mail delivery: Idempotency on mail delivery marking
  • Convoy with failed beads: Policy for partial convoy completion
  • Git worktree conflicts: Two agents are accidentally assigned the same branch → Rig DO enforces a unique branch per agent
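
The single-writer and unique-branch rules above could be modelled roughly as follows. This is a minimal in-memory sketch, not the Rig DO implementation: `RigDispatchTable`, `tryDispatch`, `markDead`, and the `generation` counter are hypothetical names introduced for illustration, and a real Durable Object would keep this state in DO storage rather than in `Map`s.

```typescript
// Hypothetical model of the dispatch bookkeeping a Rig DO might keep.
// All names here are assumptions made for illustration.
type DispatchResult =
  | { ok: true; generation: number }
  | { ok: false; reason: "agent_already_running" | "branch_in_use" };

interface AgentRecord {
  branch: string;
  generation: number; // bumped on every successful dispatch
  running: boolean;
}

class RigDispatchTable {
  private agents = new Map<string, AgentRecord>();
  private branchOwner = new Map<string, string>(); // branch -> agentId

  // Accept a dispatch only if the agent has no live process and the
  // branch is not owned by a different agent.
  tryDispatch(agentId: string, branch: string): DispatchResult {
    const existing = this.agents.get(agentId);
    if (existing?.running) {
      return { ok: false, reason: "agent_already_running" };
    }
    const owner = this.branchOwner.get(branch);
    if (owner !== undefined && owner !== agentId) {
      return { ok: false, reason: "branch_in_use" };
    }
    // Release the agent's previous branch if it is moving to a new one.
    if (existing && existing.branch !== branch) {
      this.branchOwner.delete(existing.branch);
    }
    const generation = (existing?.generation ?? 0) + 1;
    this.agents.set(agentId, { branch, generation, running: true });
    this.branchOwner.set(branch, agentId);
    return { ok: true, generation };
  }

  // Called when an alarm (or similar health check) decides the agent's
  // process is gone, for example after a container OOM.
  markDead(agentId: string): void {
    const record = this.agents.get(agentId);
    if (record) record.running = false;
  }
}
```

In this sketch a restart race resolves to one accepted dispatch and one rejection, and the `generation` value gives a restarted process something to compare against before it starts work. Whether the real design uses a generation counter at all is an open assumption.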

Dependencies

  • PR 5 (Rig DO Alarm — witness patrol)
  • PR 10 (Multiple Polecats)

Acceptance Criteria

  • Single-writer enforcement per agent (reject duplicate dispatch)
  • Container OOM recovery flow tested (DO re-dispatches all agents)
  • Optimistic locking for cross-DO operations
  • Checkpoint-based recovery for partial done flows
  • Idempotent mail delivery
  • Convoy partial completion policy implemented
  • All edge cases documented with test coverage
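
Two of the criteria above, optimistic locking and idempotent mail delivery, can be sketched in a few lines. This is an illustration under stated assumptions rather than the project's code: `BeadStore`, `updateIfVersion`, `Mailbox`, and `markDelivered` are hypothetical names, the version counter is one common way to do optimistic locking, and in-memory collections stand in for DO storage.

```typescript
// Hypothetical bead record carrying a version counter for optimistic locking.
interface Bead {
  id: string;
  status: string;
  version: number;
}

class BeadStore {
  private beads = new Map<string, Bead>();

  put(bead: Bead): void {
    this.beads.set(bead.id, { ...bead });
  }

  get(id: string): Bead | undefined {
    const bead = this.beads.get(id);
    return bead ? { ...bead } : undefined;
  }

  // Compare-and-set: the write succeeds only if the caller read the
  // latest version; otherwise the caller must re-read and retry.
  updateIfVersion(id: string, expectedVersion: number, status: string): boolean {
    const current = this.beads.get(id);
    if (!current || current.version !== expectedVersion) return false;
    this.beads.set(id, { id, status, version: expectedVersion + 1 });
    return true;
  }
}

class Mailbox {
  private delivered = new Set<string>();

  // Returns true only the first time a message id is marked delivered,
  // so a retried delivery becomes a no-op.
  markDelivered(messageId: string): boolean {
    if (this.delivered.has(messageId)) return false;
    this.delivered.add(messageId);
    return true;
  }
}
```

A caller that gets `false` from `updateIfVersion` would typically re-read the bead and decide whether its change still applies. A caller that gets `false` from `markDelivered` can treat the delivery as already handled.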
