bug: Reconciler race condition silently drops state transitions #15

@nigel-dev

Description

The orchestrator's reconcile loop uses a simple isReconciling boolean flag to prevent concurrent execution. If a state change triggers reconcile() while another reconciliation is in progress, the new trigger is silently dropped — not queued, not retried.

This means rapid state transitions (e.g., two jobs completing within the same reconcile cycle) can leave the later transition unprocessed until another trigger or the next periodic tick runs the reconciler.

Steps to Reproduce

  1. Start a plan with 3+ parallel jobs.
  2. Have two jobs complete within the same 5-second reconcile interval.
  3. The first job's completion triggers reconcile.
  4. The second job's completion calls reconcile() while the first run is still in progress, so the call returns immediately without doing any work.
  5. The second job may not be processed until the next periodic reconcile tick.

Expected Behavior

State transitions should never be silently dropped. Either:

  • Queue them and process after the current reconciliation completes.
  • Set a "dirty" flag so the reconciler runs again immediately after finishing.
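The first option (queueing) could be sketched by chaining each trigger onto the previous one with a promise. This is only an illustration under assumptions: QueuedReconciler, doReconcile, and the counters are hypothetical names, not code from src/lib/orchestrator.ts, and the sleep stands in for real reconcile work.

```typescript
// Hypothetical sketch: serialize every trigger instead of dropping it.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

class QueuedReconciler {
  private tail: Promise<void> = Promise.resolve();
  private active = 0;
  passes = 0;         // how many reconcile passes have run
  maxConcurrent = 0;  // highest number of passes observed running at once

  reconcile(): Promise<void> {
    // Chain this trigger after whatever is already queued.
    const next = this.tail.then(() => this.doReconcile());
    // Swallow errors on the chain itself so one failed pass does not block later ones.
    this.tail = next.catch(() => undefined);
    return next;
  }

  // Stand-in for the real reconcile body (assumption).
  private async doReconcile(): Promise<void> {
    this.active += 1;
    this.maxConcurrent = Math.max(this.maxConcurrent, this.active);
    await sleep(5);
    this.active -= 1;
    this.passes += 1;
  }
}

async function demoQueue(): Promise<{ passes: number; maxConcurrent: number }> {
  const r = new QueuedReconciler();
  await Promise.all([r.reconcile(), r.reconcile(), r.reconcile()]);
  return { passes: r.passes, maxConcurrent: r.maxConcurrent };
}
```

With this approach every trigger gets its own pass, so a burst of N triggers costs N passes. The "dirty" flag option coalesces a burst into a single extra pass, which is usually enough when each pass reads the full current state.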

Actual Behavior

// src/lib/orchestrator.ts:391-395
if (this.isReconciling) return;  // Silent drop
this.isReconciling = true;
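The effect of this guard can be reproduced in isolation. The following is a minimal sketch, not the orchestrator itself: NaiveReconciler, the completed/processed arrays, and the 20 ms sleep are hypothetical stand-ins, and the try/finally is added here only so the flag is always reset.

```typescript
// Hypothetical stand-in for the orchestrator; only the guard mirrors the snippet above.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

class NaiveReconciler {
  private isReconciling = false;
  completed: string[] = []; // jobs that have finished
  processed: string[] = []; // jobs that a reconcile pass has actually seen

  async reconcile(): Promise<void> {
    if (this.isReconciling) return; // silent drop
    this.isReconciling = true;
    try {
      const snapshot = [...this.completed]; // state as of the start of the pass
      await sleep(20);                      // simulated reconcile work
      this.processed = snapshot;
    } finally {
      this.isReconciling = false;
    }
  }
}

async function demoDroppedTrigger(): Promise<string[]> {
  const r = new NaiveReconciler();
  r.completed.push("job-a");
  const first = r.reconcile();  // starts a pass that only knows about job-a
  r.completed.push("job-b");
  const second = r.reconcile(); // hits the guard and returns without doing anything
  await Promise.all([first, second]);
  return r.processed;           // job-b is missing until something else triggers a pass
}
```

In this sketch job-b has completed but is absent from processed once both calls settle, which is the delayed-processing case described in this issue.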

Impact

  • Best case: Processing is delayed by one reconcile interval (5s).
  • Worst case: A dependency chain stalls because a completed job isn't processed, blocking downstream jobs that depend on it.

Proposed Fix

Keep the isReconciling guard, but add a "dirty" flag (reconcilePending) so that a trigger arriving mid-run causes one more pass instead of being dropped:

private reconcilePending = false;

async reconcile() {
  if (this.isReconciling) {
    this.reconcilePending = true;  // Mark for re-run
    return;
  }
  this.isReconciling = true;
  try {
    do {
      this.reconcilePending = false;
      await this._doReconcile();
    } while (this.reconcilePending);
  } finally {
    this.isReconciling = false;
  }
}
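To check the behaviour of this pattern outside the orchestrator, a self-contained version might look like the sketch below. CoalescingReconciler, doReconcile, and the completed/processed/passes fields are hypothetical names used for illustration, and the sleep stands in for the real _doReconcile work.

```typescript
// Hypothetical, self-contained version of the dirty-flag pattern for testing.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

class CoalescingReconciler {
  private isReconciling = false;
  private reconcilePending = false;
  passes = 0;
  completed: string[] = []; // jobs that have finished
  processed: string[] = []; // jobs seen by the most recent pass

  async reconcile(): Promise<void> {
    if (this.isReconciling) {
      this.reconcilePending = true; // remember the trigger instead of dropping it
      return;
    }
    this.isReconciling = true;
    try {
      do {
        this.reconcilePending = false;
        await this.doReconcile();
      } while (this.reconcilePending); // run again if anything arrived mid-pass
    } finally {
      this.isReconciling = false;
    }
  }

  // Stand-in for the real reconcile body (assumption).
  private async doReconcile(): Promise<void> {
    const snapshot = [...this.completed];
    await sleep(5);
    this.processed = snapshot;
    this.passes += 1;
  }
}

async function demoCoalescing(): Promise<{ passes: number; processed: string[] }> {
  const r = new CoalescingReconciler();
  r.completed.push("job-a");
  const first = r.reconcile(); // starts the first pass
  r.completed.push("job-b");
  void r.reconcile();          // marks the run dirty
  r.completed.push("job-c");
  void r.reconcile();          // already dirty, no additional pass needed
  await first;                 // resolves only after the follow-up pass
  return { passes: r.passes, processed: r.processed };
}
```

Two triggers arriving during the first pass are coalesced into one follow-up pass that sees all three jobs. One caveat of this design: a caller whose trigger was coalesced gets a promise that resolves immediately, so only the call that owns the loop can be awaited to know the work is done.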

Files Involved

  • src/lib/orchestrator.ts:391-395: the isReconciling guard
  • src/lib/orchestrator.ts:354: setInterval reconciler setup (5000ms)

Additional Context

Identified in the master audit report (Section 7: Robustness, Section 9: Phase 2). The audit notes: "Reconciler race: concurrent triggers silently dropped."

Metadata

  • Labels: P1: high (important fix or feature, next up after critical); bug (something isn't working)