feat: Handle touchSet violations with accept and correct paths #61

@nigel-dev

Problem

When a plan job completes but has modified files outside its touchSet, the orchestrator marks the job as failed and pauses the plan at an on_error checkpoint. Currently the only suggested action is:

Fix the branch and retry with mc_plan_approve(checkpoint: "on_error", retry: "{jobName}")

This has several problems:

  1. No accept path. If the violations are legitimate (e.g., the agent needed to update a shared types file), there's no way to say "these changes are fine, proceed."
  2. Retry doesn't actually retry. mc_plan_approve(retry: "job") sets the job to ready_to_merge, which bypasses touchSet re-validation entirely. It doesn't relaunch the agent.
  3. No correction path. If the violations are actually wrong, there's no mechanism to relaunch the agent with context about what to fix.
  4. Invalid state transition. completed → failed was missing from VALID_JOB_TRANSITIONS, producing a console warning. (Fixed separately — failed added to completed's transition list.)

Proposed Solution

Three distinct actions for mc_plan_approve when a touchSet violation occurs:

Path 1 — Accept

The user reviews the violations and determines they're valid.

  • mc_plan_approve(checkpoint: "on_error") — accepts the checkpoint job's violations
  • Moves the specific touchSet-failed job from failed → ready_to_merge
  • Uses structured checkpoint context (checkpointContext.jobName + failureKind: "touchset") instead of parsing error strings
  • Only acts on the job that triggered the checkpoint, not all failed jobs
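A minimal sketch of the accept path, assuming a simplified in-memory job list. The `PlanJob` shape and the `acceptTouchSetViolations` name are hypothetical; only `jobName`, `failureKind`, and the status names come from this proposal.

```typescript
// Assumed status set: only the states named in this issue are listed.
type JobStatus = 'running' | 'completed' | 'failed' | 'ready_to_merge' | 'stopped' | 'canceled';

interface CheckpointContext {
  jobName: string;
  failureKind: 'touchset' | 'merge_conflict' | 'test_failure' | 'job_failed';
  touchSetViolations?: string[];
}

// Hypothetical, simplified job record.
interface PlanJob {
  name: string;
  status: JobStatus;
}

// Accept path: promote only the job named in the checkpoint context,
// and only when the recorded failure is a touchSet violation.
function acceptTouchSetViolations(jobs: PlanJob[], ctx?: CheckpointContext): PlanJob[] {
  if (!ctx || ctx.failureKind !== 'touchset') {
    return jobs; // nothing to accept for other failure kinds
  }
  return jobs.map((job): PlanJob =>
    job.name === ctx.jobName && job.status === 'failed'
      ? { ...job, status: 'ready_to_merge' }
      : job,
  );
}
```

Other failed jobs keep their `failed` status, which matches the "only the job that triggered the checkpoint" rule above.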

Path 2 — Correct (Relaunch)

The user wants the agent to fix the violations.

  • mc_plan_approve(checkpoint: "on_error", relaunch: "jobName")
  • New relaunchJobForCorrection method on Orchestrator (separate from launchJob)
  • Reuses the existing worktree and branch (changes are already there)
  • Kills the old tmux session if still alive
  • Constructs a correction prompt containing:
    • The original task prompt
    • The specific violations (which files)
    • The allowed touchSet patterns
    • Instructions to revert violating files without breaking intended work
  • Creates a new tmux session in the existing worktree
  • Sets pane-died hook, updates job entry in place (preserves identity)
  • Job transitions: failed → running
  • On completion, touchSet re-validates normally through the reconciler
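The correction prompt described above could be assembled roughly as follows. This is a sketch only: `buildCorrectionPrompt` and the exact wording of the instructions are assumptions, not the project's actual prompt.

```typescript
// Hypothetical helper: combines the four prompt parts listed in Path 2.
function buildCorrectionPrompt(
  originalPrompt: string,
  violations: string[],
  touchSet: string[],
): string {
  return [
    originalPrompt,
    '',
    'Your previous attempt modified files outside the allowed touchSet.',
    'Violating files:',
    ...violations.map((file) => `  - ${file}`),
    'Allowed touchSet patterns:',
    ...touchSet.map((pattern) => `  - ${pattern}`),
    'Revert the violating files without breaking the intended work inside the touchSet.',
  ].join('\n');
}
```

Keeping the original task prompt first means the agent still has the full task context when it decides which changes to revert.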

Path 3 — Retry (existing, fixed)

The user manually fixed the branch and wants re-validation.

  • mc_plan_approve(checkpoint: "on_error", retry: "jobName") — existing param, corrected behavior
  • Re-runs validateTouchSet before moving to ready_to_merge
  • If still violating, stays failed and reports remaining violations
  • Fixes the current bug where retry skips validation entirely
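The corrected retry behavior can be sketched as below. The signature of the real `validateTouchSet` is not shown in this issue, so the example uses a hypothetical `findTouchSetViolations` with a deliberately small glob matcher that supports only `*` and `**`.

```typescript
// Minimal glob support (assumption): `**` matches across directories, `*` within one segment.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  const pattern = escaped
    .replace(/\*\*/g, '\u0000') // placeholder so the single-star pass skips `**`
    .replace(/\*/g, '[^/]*')
    .replace(/\u0000/g, '.*');
  return new RegExp(`^${pattern}$`);
}

// Hypothetical stand-in for validateTouchSet: returns files matching no allowed pattern.
function findTouchSetViolations(changedFiles: string[], touchSet: string[]): string[] {
  const matchers = touchSet.map(globToRegExp);
  return changedFiles.filter((file) => !matchers.some((re) => re.test(file)));
}

interface RetryResult {
  status: 'ready_to_merge' | 'failed';
  remainingViolations: string[];
}

// Retry path: re-validate first, promote only when the branch is clean.
function retryAfterManualFix(changedFiles: string[], touchSet: string[]): RetryResult {
  const remainingViolations = findTouchSetViolations(changedFiles, touchSet);
  return remainingViolations.length === 0
    ? { status: 'ready_to_merge', remainingViolations }
    : { status: 'failed', remainingViolations };
}
```

A production matcher would more likely reuse whatever glob library the project already depends on.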

State Machine Changes

# Add to VALID_JOB_TRANSITIONS:
completed: ['ready_to_merge', 'failed', 'stopped', 'canceled']     # already done
failed:    ['ready_to_merge', 'running', 'stopped', 'canceled']     # add 'running'
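In TypeScript the two rows above might look like the following sketch. The `JobStatus` union lists only the states named in this issue, and `canTransition` is a hypothetical helper; rows for the other states are omitted because they are not shown here.

```typescript
// Assumed status set: only the states mentioned in this issue.
type JobStatus = 'running' | 'completed' | 'failed' | 'ready_to_merge' | 'stopped' | 'canceled';

// Only the two rows discussed in this proposal; the real table has more entries.
const VALID_JOB_TRANSITIONS: Partial<Record<JobStatus, readonly JobStatus[]>> = {
  completed: ['ready_to_merge', 'failed', 'stopped', 'canceled'],
  failed: ['ready_to_merge', 'running', 'stopped', 'canceled'], // 'running' is new
};

// Hypothetical helper: true when the table lists `to` as reachable from `from`.
function canTransition(from: JobStatus, to: JobStatus): boolean {
  return VALID_JOB_TRANSITIONS[from]?.includes(to) ?? false;
}
```

With `'running'` added to the `failed` row, the relaunch path no longer trips the invalid-transition warning.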

Implementation Details

Structured Checkpoint Context

Store failure metadata instead of relying on error string parsing:

// On PlanSpec or alongside checkpoint
checkpointContext?: {
  jobName: string;
  failureKind: 'touchset' | 'merge_conflict' | 'test_failure' | 'job_failed';
  touchSetViolations?: string[];  // file paths
};

Relaunch Method

New relaunchJobForCorrection(job, violations, touchSet) on Orchestrator that:

  1. Kills old tmux session/pane if alive
  2. Writes correction prompt + launcher script to existing worktree
  3. Creates new tmux session pointing at existing worktree
  4. Sets pane-died hook
  5. Updates existing job entry in place (new tmux target, reset timestamps, increment attempt count)
  6. Sets plan job status to running
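Steps 5 and 6 (the in-place update) could look roughly like this. The `JobEntry` fields and the `applyRelaunch` name are assumptions for illustration; the tmux and prompt-writing steps are left out because their details are not specified in this issue.

```typescript
// Hypothetical job entry; the real field names may differ.
interface JobEntry {
  id: string;
  name: string;
  status: 'running' | 'completed' | 'failed' | 'ready_to_merge' | 'stopped' | 'canceled';
  tmuxTarget: string;
  startedAt: string;
  completedAt?: string;
  attempt: number;
}

// Mutates the existing entry so the job keeps its identity across relaunches.
function applyRelaunch(job: JobEntry, newTmuxTarget: string, now: Date): void {
  job.tmuxTarget = newTmuxTarget;     // point at the new session
  job.startedAt = now.toISOString();  // reset timestamps
  delete job.completedAt;
  job.attempt += 1;                   // track correction attempts
  job.status = 'running';             // failed -> running
}
```

Updating in place, rather than creating a new entry, keeps the job's `id` stable so anything that refers to the job continues to resolve to it.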

Updated Notification Message

❌ Job "{name}" modified files outside its touchSet:
  Violations: src/types/search.ts, src/utils/format.ts
  Allowed: src/db/**

Options:
  • Accept violations:  mc_plan_approve(checkpoint: "on_error")
  • Agent fixes branch: mc_plan_approve(checkpoint: "on_error", relaunch: "{name}")
  • You fix, re-check:  mc_plan_approve(checkpoint: "on_error", retry: "{name}")
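One way to render the message above, as a sketch. `formatTouchSetViolation` is a hypothetical name; the text mirrors the template in this section.

```typescript
// Hypothetical formatter for the touchSet violation notification.
function formatTouchSetViolation(name: string, violations: string[], allowed: string[]): string {
  return [
    `❌ Job "${name}" modified files outside its touchSet:`,
    `  Violations: ${violations.join(', ')}`,
    `  Allowed: ${allowed.join(', ')}`,
    '',
    'Options:',
    '  • Accept violations:  mc_plan_approve(checkpoint: "on_error")',
    `  • Agent fixes branch: mc_plan_approve(checkpoint: "on_error", relaunch: "${name}")`,
    `  • You fix, re-check:  mc_plan_approve(checkpoint: "on_error", retry: "${name}")`,
  ].join('\n');
}
```

Building the message from the structured violation list, instead of an error string, keeps it consistent with the checkpoint context stored for the job.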

Files to Modify

  • src/lib/plan-types.ts — Add failed → running transition, checkpoint context types
  • src/lib/orchestrator.ts — relaunchJobForCorrection method, checkpoint context storage, updated notifications
  • src/tools/plan-approve.ts — Handle accept (no retry/relaunch), relaunch param, fix retry to re-validate
  • src/lib/job-state.ts — Support in-place job updates for relaunch

Edge Cases

  • Correction agent also violates touchSet: Normal re-validation catches it. Unbounded manual retries are acceptable because each one requires explicit user action. Track the attempt count and display it.
  • Multiple jobs fail before plan pauses: Checkpoint context stores the specific job that triggered the pause. Other failures are handled separately.
  • Old tmux session still alive on relaunch: Kill deterministically before creating new session to prevent stale pane-died hooks from misfiring.

Metadata

  • Assignees: No one assigned
  • Labels: P1: high, enhancement
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests