refactor: eliminate evaluating state race condition in supervisor pulse by marcusquinn · Pull Request #2056 · marcusquinn/aidevops

marcusquinn · 2026-02-21T03:57:51Z

Summary

Phase 0.7 and Phase 1 were racing: both independently queried running tasks, then both tried to transition them, causing tasks to get stuck in evaluating state indefinitely
Phase 1 now transitions directly from running to final state (complete/pr_review/retrying/failed) without the intermediate evaluating state
Removed Phase 0.8 entirely (130 lines) and neutered Phase 0.7 to only handle legacy evaluating cleanup
Net: -192 lines of fragile race-condition code

Root Cause

Phase 0.7 would transition a task from running → pr_review. Then Phase 1 (which had already queried the DB and found the task in running) would call cmd_transition which re-read the DB — but due to a timing issue, it still saw running and transitioned to evaluating, overwriting Phase 0.7's work. The task then got stuck in evaluating with no live worker and no phase to recover it.

Fix

Added running:complete, running:pr_review, running:retrying, running:queued, running:blocked to valid state transitions
Phase 1 re-reads current DB state before acting and skips tasks already transitioned by other phases
No intermediate evaluating state — if pulse is killed mid-evaluation, task stays in running and next pulse re-evaluates

Summary by CodeRabbit

New Features
- AI-assisted task assessment with multi-source evidence and structured verdicts; emits evaluation metadata and an explicit "alive" outcome.
Refactor
- Expanded task-state transitions from running to complete/pr_review/retrying/queued/blocked.
- Consolidated and simplified task evaluation flow and heartbeat/post-evaluation handling, removing legacy fast-paths and intermediate rollback hooks.

Phase 0.7 (stale-state recovery) and Phase 1 (worker evaluation) were racing: both queried running tasks independently, then both tried to transition them, causing tasks to get stuck in evaluating state. Fix: Phase 1 now transitions directly from running to final state (complete/pr_review/retrying/failed) without the intermediate evaluating state. Phase 0.7 no longer handles running/dispatched tasks (Phase 1 does). Phase 0.8 removed entirely (redundant with Phase 1). - Add running:complete, running:pr_review, running:retrying, running:queued, running:blocked to valid state transitions - Phase 1 re-reads DB state before acting, skips tasks already transitioned by other phases - Remove Phase 0.8 (130 lines of stale-running recovery) - Remove SIGTERM evaluating-rollback handler (no longer needed) - Net: -192 lines of fragile race-condition code

gemini-code-assist · 2026-02-21T03:58:05Z

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the supervisor pulse mechanism to eliminate a critical race condition that caused tasks to become perpetually stuck in an 'evaluating' state. By removing an entire phase, streamlining task state transitions, and ensuring robust state checks, the system's reliability and efficiency in handling task lifecycle management are greatly improved. The changes simplify the state machine and reduce the complexity of task recovery logic.

Highlights

Race Condition Elimination: Resolved a race condition where Phase 0.7 and Phase 1 of the supervisor pulse independently queried and attempted to transition tasks, leading to tasks getting stuck in an 'evaluating' state indefinitely.
Streamlined Task Transitions: Phase 1 now directly transitions tasks from 'running' to their final states (e.g., 'complete', 'pr_review', 'retrying', 'failed'), bypassing the intermediate 'evaluating' state entirely.
Phase 0.8 Removal: The entire Phase 0.8, responsible for stale 'running' task recovery, has been removed, resulting in a significant reduction of code.
Phase 0.7 Refocus: Phase 0.7 has been neutered to solely handle cleanup of legacy 'evaluating' tasks, rather than actively managing 'running' or 'dispatched' tasks.
Robustness to Pulse Crashes: If the supervisor pulse is killed mid-evaluation, tasks now remain in the 'running' or 'dispatched' state, allowing the next pulse to re-evaluate them without being stuck in an 'evaluating' state.
Database State Re-reading: Phase 1 now re-reads the current database state before acting on a task, preventing conflicts and ensuring it skips tasks already transitioned by other phases.

Changelog

.agents/scripts/supervisor-helper.sh
- Added new valid state transitions from 'running' to 'complete', 'pr_review', 'retrying', 'queued', and 'blocked'.
.agents/scripts/supervisor/pulse.sh
- Removed the logic for fast-path and standard grace period queries for 'running' and 'dispatched' tasks from Phase 0.7, limiting its scope to 'evaluating' tasks.
- Eliminated the entire Phase 0.8, which handled stale 'running' task recovery.
- Removed the _phase1_cleanup_on_signal function and its associated trap, as the intermediate 'evaluating' state is no longer used for active tasks.
- Modified Phase 1's evaluation logic to directly process tasks from 'running' or 'dispatched' states, bypassing an explicit transition to 'evaluating'.
- Introduced a check in Phase 1 to skip tasks that have already been transitioned to a post-running state by another phase.
- Removed the condition to skip AI evaluation for tasks previously stuck in 'evaluating', as this state is no longer an active part of the primary flow.
- Simplified comments related to pre- and post-evaluation heartbeats.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-02-21T03:58:10Z

Walkthrough

Adds a new AI-assisted task assessment script and sources it from the supervisor helper, expands VALID_TRANSITIONS with five running->* mappings, and simplifies pulse.sh evaluation flow by removing Phase 0.7/0.8 stale-recovery and intermediate phase-tracking in favor of direct re-evaluation and streamlined post-evaluation handling.

Changes

Cohort / File(s)	Summary
Supervisor helper `.agents/scripts/supervisor-helper.sh`	Sources new `assess-task.sh` and expands `VALID_TRANSITIONS` with five new `running:...` mappings: `complete`, `pr_review`, `retrying`, `queued`, `blocked`.
Pulse evaluation `.agents/scripts/supervisor/pulse.sh`	Removed Phase 0.7/0.8 fast-path/stale-running recovery and intermediate phase1 rollback/trap mechanics; Phase 1 re-reads DB state and defers to assessment paths (now prefers `assess_task_with_metadata`); simplified heartbeat/post-eval flow.
AI-assisted assessment `.agents/scripts/supervisor/assess-task.sh`	New script added. Gathers DB, process, log, TODO.md and GitHub evidence; offers `gather_task_evidence`, `assess_task`, `assess_task_with_metadata`; uses deterministic fast-paths and optional AI (model client) to emit single-line verdicts (`complete:...`, `retry:...`, `failed:...`).

Sequence Diagram(s)

sequenceDiagram
  participant Pulse as Supervisor/pulse.sh
  participant Helper as supervisor-helper.sh
  participant Assess as assess-task.sh
  participant DB as Database
  participant Worker as WorkerProcess
  participant GH as GitHub
  participant AI as AI/Model

  Pulse->>Helper: trigger evaluation
  Helper->>Assess: assess_task_with_metadata(task_id)
  Assess->>DB: read task state & metadata
  Assess->>Worker: check PID / logs
  Assess->>GH: query PR/merge status (if PR URL or branch)
  alt deterministic fast-path
    Assess-->>Helper: return verdict (complete/retry/failed)
  else use AI
    Assess->>AI: send single-line prompt with gathered evidence
    AI-->>Assess: single-line verdict
    Assess-->>Helper: return verdict + metadata
  end
  Helper->>Pulse: apply transition based on verdict (uses new running:... transitions)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

t1259: Fix stale evaluating recovery pattern with pre/post-eval heartbeats #1968: Overlaps pulse evaluation/heartbeat and replaces t1259-era logic—directly related to the pulse.sh changes in this PR.
feat: add 3-tier outcome evaluation and re-prompt cycle to supervisor (t128.3) #378: Integrates AI/assessment-style evaluation and state-machine transitions; shares modifications to supervisor evaluation wiring.
feat: add supervisor-helper.sh with SQLite schema and state machine (t128.1) #376: Touches supervisor-helper.sh state machine transitions; directly related to the VALID_TRANSITIONS edits here.

Poem

🛠️ New assessor wakes, the supervisor sings,
Running splinters into five new wings,
Pulse skips old traps, reads the state anew,
AI lends a verdict, concise and true,
Automation dances — tidy, bright, and few. ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly addresses the main change: eliminating a race condition in the supervisor pulse between stale-state recovery and worker evaluation phases that caused tasks to get stuck in evaluating state.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch refactor/intelligent-pulse

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-02-21T03:58:27Z

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 34 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Sat Feb 21 03:58:23 UTC 2026: Code review monitoring started
Sat Feb 21 03:58:23 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 34

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 34
VULNERABILITIES: 0

Generated on: Sat Feb 21 03:58:26 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

gemini-code-assist

Code Review

The refactor successfully simplifies the state machine by removing the intermediate evaluating state for standard task completion, which significantly reduces the risk of tasks getting stuck. However, there is a logic flaw in the race condition check within Phase 1 that could still allow it to interfere with tasks being handled by Phase 0.7. Additionally, the use of 2>/dev/null for blanket error suppression in the new DB query and the cleanup trap violates repository rules and style guidelines, as detailed in the specific comments.

gemini-code-assist · 2026-02-21T03:59:36Z

.agents/scripts/supervisor/pulse.sh

-				# Transition to evaluating
-				cmd_transition "$tid" "evaluating" 2>>"$SUPERVISOR_LOG" || true
+			# Skip if another phase already moved this task to a post-running state
+			if [[ "$current_task_state" != "running" && "$current_task_state" != "dispatched" && "$current_task_state" != "evaluating" ]]; then


To fully eliminate the race condition, Phase 1 should skip tasks that have already been transitioned to the evaluating state by Phase 0.7. The current logic allows Phase 1 to proceed if the state is evaluating, which contradicts the goal of skipping tasks already handled by other phases (as noted in the PR description that Phase 0.7 now handles legacy evaluating tasks).

Suggested change

if [[ "$current_task_state" != "running" && "$current_task_state" != "dispatched" && "$current_task_state" != "evaluating" ]]; then

if [[ "$current_task_state" != "running" && "$current_task_state" != "dispatched" ]]; then

gemini-code-assist · 2026-02-21T03:59:36Z

.agents/scripts/supervisor/pulse.sh

+		# the task stays in running/dispatched and the next pulse re-evaluates it.
 		# shellcheck disable=SC2064  # intentional: expand SUPERVISOR_DIR at definition time
-		trap "_phase1_cleanup_on_signal; release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' 2>/dev/null || true" TERM INT
+		trap "release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' 2>/dev/null || true" TERM INT


Avoid using 2>/dev/null for blanket suppression of errors, even for rm -f. The style guide explicitly restricts its use to log file redirection. rm -f is already silent if the file does not exist.

Suggested change

trap "release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' 2>/dev/null || true" TERM INT

trap "release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' || true" TERM INT

References

2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression. ^(link)

Avoid using 2>/dev/null for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

In shell scripts with set -e, guard kill commands within cleanup traps with || true to prevent premature termination if the target process has already exited, ensuring subsequent cleanup commands are executed.

gemini-code-assist · 2026-02-21T03:59:36Z

.agents/scripts/supervisor/pulse.sh

+			# Worker is done - evaluate outcome directly (no intermediate evaluating state)
+			# Re-read current state fresh from DB to avoid race conditions with Phase 0.7
 			local current_task_state
 			current_task_state=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$tid")';" 2>/dev/null || echo "")


Avoid using 2>/dev/null for blanket suppression of database errors. If the query fails due to a locked database or permission issue, this will silently return an empty string, causing the task to be skipped and making debugging difficult.

Suggested change

current_task_state=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$tid")';" 2>/dev/null || echo "")

current_task_state=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$tid")';" || echo "")

References

2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression. ^(link)

Avoid using 2>/dev/null for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

.agents/scripts/supervisor-helper.sh (1)
53-57: ⚠️ Potential issue | 🟡 Minor

State machine comment is now stale.

The ASCII diagram still shows running -> evaluating -> complete as the primary path. With this PR, the main flow is running -> complete (or pr_review, retrying, blocked, failed). The evaluating state is now only a legacy/crash-recovery path. Consider updating the comment to reflect the new primary flow and note evaluating as legacy.

Similarly, the show_usage block at lines 372–376 describes the same outdated flow.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor-helper.sh around lines 53 - 57, Update the stale
ASCII state-machine comment and the show_usage description to reflect the new
primary flow: change the main path to queued -> dispatched -> running ->
complete (with alternate transitions from running to pr_review, retrying,
blocked, or failed) and mark evaluating as a legacy/crash-recovery-only state;
also update the show_usage block that currently documents the old running ->
evaluating -> complete path to mirror this new primary flow and explicitly note
that evaluating is only used for legacy/crash recovery.
.agents/scripts/supervisor/pulse.sh (1)
2767-2767: ⚠️ Potential issue | 🟡 Minor

Stale reference to Phase 0.8 in log message.

This log message still references "Phase 0.7/0.8/4b" but Phase 0.8 no longer exists. Update to reflect the current phase structure.
Suggested fix
-		log_warn "  Stale state recovery (Phase 0.7/0.8/4b) should clean these up next pulse"
+		log_warn "  Stale state recovery (Phase 0.7/4b) should clean these up next pulse"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/pulse.sh at line 2767, Update the log message in
the log_warn call that currently contains "Stale state recovery (Phase
0.7/0.8/4b) should clean these up next pulse" to remove the obsolete "0.8"
reference (e.g., change to "Stale state recovery (Phase 0.7/4b) should clean
these up next pulse"); locate the call to log_warn with that exact string and
replace the message text accordingly so it reflects the current phase structure.

🧹 Nitpick comments (1)

.agents/scripts/supervisor/pulse.sh (1)
1188-1196: Remove dead code variables _phase1_evaluating_tid and _phase1_pre_eval_state.

These variables are assigned and cleared throughout the loop but never read. The comment on line 1188 states "for logging on pulse kill," but the TERM/INT trap handler (line 1196) only releases the lock and cleans up a temp file—it doesn't reference either variable. Remove the declarations and all corresponding assignment/clear operations to reduce confusion and keep the codebase clean.
♻️ Suggested cleanup
-	# Track which task is being evaluated (for logging on pulse kill)
-	local _phase1_evaluating_tid=""
-	local _phase1_pre_eval_state=""
And remove all corresponding assignments and clears throughout the loop (lines 1228–1229, 1306–1307, 1334–1335, 1422–1423, 1438–1439, 1567–1568).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/pulse.sh around lines 1188 - 1196, Remove the
dead variables _phase1_evaluating_tid and _phase1_pre_eval_state: delete their
initial declarations and remove every assignment/clear referencing them inside
the loop (all places that set or unset _phase1_evaluating_tid and
_phase1_pre_eval_state). Leave the existing trap and cleanup logic (the trap
that calls release_pulse_lock and removes '${SUPERVISOR_DIR}/MODELS.md.tmp')
untouched; do not introduce any references to these variables in the TERM/INT
handler. Ensure no other code paths read those symbols after removal.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In @.agents/scripts/supervisor-helper.sh:
- Around line 53-57: Update the stale ASCII state-machine comment and the
show_usage description to reflect the new primary flow: change the main path to
queued -> dispatched -> running -> complete (with alternate transitions from
running to pr_review, retrying, blocked, or failed) and mark evaluating as a
legacy/crash-recovery-only state; also update the show_usage block that
currently documents the old running -> evaluating -> complete path to mirror
this new primary flow and explicitly note that evaluating is only used for
legacy/crash recovery.

In @.agents/scripts/supervisor/pulse.sh:
- Line 2767: Update the log message in the log_warn call that currently contains
"Stale state recovery (Phase 0.7/0.8/4b) should clean these up next pulse" to
remove the obsolete "0.8" reference (e.g., change to "Stale state recovery
(Phase 0.7/4b) should clean these up next pulse"); locate the call to log_warn
with that exact string and replace the message text accordingly so it reflects
the current phase structure.

---

Nitpick comments:
In @.agents/scripts/supervisor/pulse.sh:
- Around line 1188-1196: Remove the dead variables _phase1_evaluating_tid and
_phase1_pre_eval_state: delete their initial declarations and remove every
assignment/clear referencing them inside the loop (all places that set or unset
_phase1_evaluating_tid and _phase1_pre_eval_state). Leave the existing trap and
cleanup logic (the trap that calls release_pulse_lock and removes
'${SUPERVISOR_DIR}/MODELS.md.tmp') untouched; do not introduce any references to
these variables in the TERM/INT handler. Ensure no other code paths read those
symbols after removal.

assess-task.sh gathers evidence from real sources of truth (worker log, TODO.md, GitHub PR state, process table) instead of trusting the DB cache. Deterministic fast paths handle clear-cut cases (FULL_LOOP_COMPLETE + PR, VERIFY_COMPLETE, PR merged). Ambiguous cases fall back to haiku AI call (~$0.001/call vs $0.50-2.00/worker). Wired into pulse.sh Phase 1 as preferred evaluator, with fallback chain: assess_task -> evaluate_worker_with_metadata -> evaluate_worker. Tested against 5 running tasks — all correctly assessed as complete with their PR URLs.

github-actions · 2026-02-21T04:15:38Z

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 34 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Sat Feb 21 04:15:33 UTC 2026: Code review monitoring started
Sat Feb 21 04:15:34 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 34

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 34
VULNERABILITIES: 0

Generated on: Sat Feb 21 04:15:37 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

Three bugs found during live testing: 1. gh CLI calls in gather_task_evidence consumed stdin from the while-read loop, causing Phase 1 to process only the first task. Fix: redirect stdin to /dev/null for all gh/opencode/claude calls. 2. record_evaluation_metadata leaked stdout into the outcome variable via command substitution. Fix: redirect stdout to /dev/null. 3. set -e (errexit) could abort the Phase 1 loop if any single task evaluation failed. Fix: set +e around evaluation, set -e after. Also adds Phase 1 task count logging for observability. Tested: all 5 running tasks correctly assessed and transitioned in a single pulse cycle.

github-actions · 2026-02-21T04:54:32Z

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 34 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Sat Feb 21 04:54:28 UTC 2026: Code review monitoring started
Sat Feb 21 04:54:28 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 34

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 34
VULNERABILITIES: 0

Generated on: Sat Feb 21 04:54:31 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-02-21T04:55:20Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/supervisor-helper.sh:
- Around line 252-256: The documentation/usage header in
.agents/scripts/supervisor-helper.sh is out of sync with the VALID_TRANSITIONS
array: update the help/usage string that currently mentions the
running→evaluating transition to list the new direct running→complete,
running→pr_review, running→retrying, running→queued, and running→blocked
transitions so the header matches VALID_TRANSITIONS; locate and edit the
usage/help text or header near the VALID_TRANSITIONS declaration and replace the
old "running→evaluating" wording with the full set of
running→complete/pr_review/retrying/queued/blocked transitions.

In @.agents/scripts/supervisor/assess-task.sh:
- Around line 12-13: The script assess-task.sh currently only accepts verdicts
"complete|retry|failed" so exhausted-retry cases get marked failed; update all
validation/checks and rule mappings in assess-task.sh to include a new "blocked"
outcome (add "blocked" to any regex or list of allowed verdicts, to help
text/usage strings, and to the branch handling that maps exhausted retries to
decisions), and change the logic that previously forced exhausted retries to
"failed" to return "blocked" instead; specifically update the verdict validation
and parsing code and the rules handling around the blocks referenced (current
checks around lines 12–13, 240–251, and 277–280) so the parser, validators, and
downstream decision branches accept and propagate "blocked".
- Around line 86-87: The log_tail pipeline mistakenly uses "tail -5" which
limits extracted text to 5 lines; update the pipeline that sets log_tail (the
variable named log_tail in assess-task.sh) to use "tail -50" instead of "tail
-5" so it captures the last 50 text entries as the comment states, keeping the
rest of the pipeline (sed and the fallback tail -50 "$db_log") unchanged.
- Around line 101-108: The grep used to find the todo line can match substrings
(e.g., "t12" inside "t120") and treats dots as regex wildcards; change the
lookup to match the exact task ID literally and anchored as a token: escape
regex metacharacters in $task_id (create e.g. escaped_task_id by replacing . * ?
[ ] \( \) { } + ^ $ | \ with backslashed versions), then use grep -m1 -E with
word/token boundaries such as grep -m1 -E
"(^|[^[:alnum:]_.-])$escaped_task_id($|[^[:alnum:]_.-])" --file "$todo_file" to
set todo_line, and keep the existing todo_state logic that checks "$todo_line"
for "[x]" or "[X]". Ensure you update references to
task_id/todo_line/todo_file/todo_state in that block.

In @.agents/scripts/supervisor/pulse.sh:
- Around line 1231-1237: The loop temporarily disables errexit with "set +e"
around the evaluation block (see _phase1_pre_eval_state, _phase1_evaluating_tid
and the evaluation call) but several branches use "continue" while errexit
remains disabled, leaving the rest of the pulse non-strict; restore strict mode
by re-enabling "set -e" immediately after the evaluation returns (or run the
evaluation in a subshell so effects are local), and ensure any early "continue"
paths restore "set -e" before continuing so each iteration returns to strict
error handling.

coderabbitai · 2026-02-21T05:05:55Z

.agents/scripts/supervisor-helper.sh

+	"running:complete"
+	"running:pr_review"
+	"running:retrying"
+	"running:queued"
+	"running:blocked"


⚠️ Potential issue | 🟡 Minor

Sync state-machine docs with direct running → terminal transitions

VALID_TRANSITIONS now allows direct running→complete/pr_review/retrying/queued/blocked, but the header/usage still describes running→evaluating. Updating the docs will prevent operator confusion.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor-helper.sh around lines 252 - 256, The documentation/usage header in .agents/scripts/supervisor-helper.sh is out of sync with the VALID_TRANSITIONS array: update the help/usage string that currently mentions the running→evaluating transition to list the new direct running→complete, running→pr_review, running→retrying, running→queued, and running→blocked transitions so the header matches VALID_TRANSITIONS; locate and edit the usage/help text or header near the VALID_TRANSITIONS declaration and replace the old "running→evaluating" wording with the full set of running→complete/pr_review/retrying/queued/blocked transitions.

coderabbitai · 2026-02-21T05:05:55Z

.agents/scripts/supervisor/assess-task.sh

+# Returns a structured verdict: complete:<pr_url>, retry:<reason>, failed:<reason>
+#


⚠️ Potential issue | 🟠 Major

Add blocked outcome to align with max‑retry policy

The AI prompt/parser only allow complete|retry|failed, so exhausted‑retry cases will be forced to failed, bypassing the supervisor policy of blocking for human intervention. Add blocked to the allowed verdicts and update the rules accordingly.

🛠️ Suggested fix

-# Returns a structured verdict: complete:<pr_url>, retry:<reason>, failed:<reason> +# Returns a structured verdict: complete:<pr_url>, retry:<reason>, blocked:<reason>, failed:<reason> @@ -You are a task supervisor. Assess this task and return EXACTLY ONE LINE in this format: - complete:<pr_url_or_verified_complete> - retry:<brief_reason> - failed:<brief_reason> +You are a task supervisor. Assess this task and return EXACTLY ONE LINE in this format: + complete:<pr_url_or_verified_complete> + retry:<brief_reason> + blocked:<brief_reason> + failed:<brief_reason> @@ -- If the worker failed and retries exhausted → failed:<reason> +- If the worker failed and retries exhausted → blocked:<reason> @@ -verdict=$(echo "$ai_result" | grep -oE '^(complete|retry|failed):.*' | head -1 || echo "") +verdict=$(echo "$ai_result" | grep -oE '^(complete|retry|failed|blocked):.*' | head -1 || echo "")

Also applies to: 240-251, 277-280

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/assess-task.sh around lines 12 - 13, The script assess-task.sh currently only accepts verdicts "complete|retry|failed" so exhausted-retry cases get marked failed; update all validation/checks and rule mappings in assess-task.sh to include a new "blocked" outcome (add "blocked" to any regex or list of allowed verdicts, to help text/usage strings, and to the branch handling that maps exhausted retries to decisions), and change the logic that previously forced exhausted retries to "failed" to return "blocked" instead; specifically update the verdict validation and parsing code and the rules handling around the blocks referenced (current checks around lines 12–13, 240–251, and 277–280) so the parser, validators, and downstream decision branches accept and propagate "blocked".

coderabbitai · 2026-02-21T05:05:55Z

.agents/scripts/supervisor/assess-task.sh

+			# Get the last 50 lines of actual text content (not JSON metadata)
+			log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -5 | sed -n 's/.*"text":"\([^"]*\)".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)")


⚠️ Potential issue | 🟡 Minor

log_tail only captures 5 lines (comment says 50)

The current pipeline limits the evidence to 5 text lines, which underfeeds the AI. Align it with the stated intent.

🛠️ Suggested fix

- log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -5 | sed -n 's/.*"text":"$[^"]*$".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)") + log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -50 | sed -n 's/.*"text":"$[^"]*$".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Get the last 50 lines of actual text content (not JSON metadata)

log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -5 | sed -n 's/.*"text":"$[^"]*$".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)")

# Get the last 50 lines of actual text content (not JSON metadata)

log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -50 | sed -n 's/.*"text":"$[^"]*$".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/assess-task.sh around lines 86 - 87, The log_tail pipeline mistakenly uses "tail -5" which limits extracted text to 5 lines; update the pipeline that sets log_tail (the variable named log_tail in assess-task.sh) to use "tail -50" instead of "tail -5" so it captures the last 50 text entries as the comment states, keeping the rest of the pipeline (sed and the fallback tail -50 "$db_log") unchanged.

coderabbitai · 2026-02-21T05:05:55Z

.agents/scripts/supervisor/assess-task.sh

+			# Get the first occurrence of this task ID
+			todo_line=$(grep -m1 "$task_id" "$todo_file" 2>/dev/null || echo "")
+			if [[ -n "$todo_line" ]]; then
+				if [[ "$todo_line" == *"[x]"* || "$todo_line" == *"[X]"* ]]; then
+					todo_state="checked"
+				else
+					todo_state="open"
+				fi


⚠️ Potential issue | 🟡 Minor

Anchor TODO matching to the exact task ID

grep -m1 "$task_id" can match t12 inside t120, and dots in subtask IDs are treated as regex wildcards. That can flip TODO state and mislead assessment. Use an anchored, escaped pattern.

🛠️ Suggested fix

- todo_line=$(grep -m1 "$task_id" "$todo_file" 2>/dev/null || echo "") + local task_id_re + task_id_re=$(printf '%s' "$task_id" | sed -E 's/[][\\.^$*+?{}|()]/\\&/g') + todo_line=$(grep -m1 -E "^[[:space:]]*- \\[[ xX-]\\] ${task_id_re}([[:space:]]|$)" "$todo_file" 2>/dev/null || echo "")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Get the first occurrence of this task ID

todo_line=$(grep -m1 "$task_id" "$todo_file" 2>/dev/null || echo "")

if [[ -n "$todo_line" ]]; then

if [[ "$todo_line" == *"[x]"* || "$todo_line" == *"[X]"* ]]; then

todo_state="checked"

else

todo_state="open"

fi

# Get the first occurrence of this task ID

local task_id_re

task_id_re=$(printf '%s' "$task_id" | sed -E 's/[][\\.^$*+?{}|()]/\\&/g')

todo_line=$(grep -m1 -E "^[[:space:]]*- \\[[ xX-]\\] ${task_id_re}([[:space:]]|$)" "$todo_file" 2>/dev/null || echo "")

if [[ -n "$todo_line" ]]; then

if [[ "$todo_line" == *"[x]"* || "$todo_line" == *"[X]"* ]]; then

todo_state="checked"

else

todo_state="open"

fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/assess-task.sh around lines 101 - 108, The grep used to find the todo line can match substrings (e.g., "t12" inside "t120") and treats dots as regex wildcards; change the lookup to match the exact task ID literally and anchored as a token: escape regex metacharacters in $task_id (create e.g. escaped_task_id by replacing . * ? [ ]  { } + ^ $ | \ with backslashed versions), then use grep -m1 -E with word/token boundaries such as grep -m1 -E "(^|[^[:alnum:]_.-])$escaped_task_id($|[^[:alnum:]_.-])" --file "$todo_file" to set todo_line, and keep the existing todo_state logic that checks "$todo_line" for "[x]" or "[X]". Ensure you update references to task_id/todo_line/todo_file/todo_state in that block.

coderabbitai · 2026-02-21T05:05:55Z

.agents/scripts/supervisor/pulse.sh

+			log_info "  $tid: worker finished, evaluating..."
+			_phase1_pre_eval_state="$current_task_state"
+			_phase1_evaluating_tid="$tid"
+			# Temporarily disable errexit for evaluation — a single task failure
+			# must not abort evaluation of remaining tasks
+			set +e
+


⚠️ Potential issue | 🟠 Major

Restore set -e before any early continue

set +e is enabled for evaluation, but several branches continue (e.g., alive, early completion paths). That leaves errexit disabled for subsequent tasks and the rest of the pulse, weakening fail‑fast behavior. Re‑enable immediately after the eval call (or wrap it) so each iteration returns to strict error handling.

🛠️ Suggested fix

- # Temporarily disable errexit for evaluation — a single task failure - # must not abort evaluation of remaining tasks - set +e + # Temporarily disable errexit for evaluation — a single task failure + # must not abort evaluation of remaining tasks + set +e + local eval_rc=0 @@ - if command -v assess_task_with_metadata &>/dev/null; then - outcome=$(assess_task_with_metadata "$tid") + if command -v assess_task_with_metadata &>/dev/null; then + outcome=$(assess_task_with_metadata "$tid"); eval_rc=$? eval_maker="assess_task" elif command -v evaluate_worker_with_metadata &>/dev/null; then - outcome=$(evaluate_worker_with_metadata "$tid" "$skip_ai") + outcome=$(evaluate_worker_with_metadata "$tid" "$skip_ai"); eval_rc=$? eval_maker="evaluate_worker_with_metadata" else - outcome=$(evaluate_worker "$tid" "$skip_ai") + outcome=$(evaluate_worker "$tid" "$skip_ai"); eval_rc=$? fi + set -e + if [[ $eval_rc -ne 0 ]]; then + log_warn " $tid: evaluation failed (rc=$eval_rc) — retrying" + outcome="retry:evaluation_error" + fi

Also applies to: 1322-1329, 1584-1589

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/pulse.sh around lines 1231 - 1237, The loop temporarily disables errexit with "set +e" around the evaluation block (see _phase1_pre_eval_state, _phase1_evaluating_tid and the evaluation call) but several branches use "continue" while errexit remains disabled, leaving the rest of the pulse non-strict; restore strict mode by re-enabling "set -e" immediately after the evaluation returns (or run the evaluation in a subshell so effects are local), and ensure any early "continue" paths restore "set -e" before continuing so each iteration returns to strict error handling.

gemini-code-assist bot reviewed Feb 21, 2026

View reviewed changes

coderabbitai bot reviewed Feb 21, 2026

View reviewed changes

coderabbitai bot approved these changes Feb 21, 2026

View reviewed changes

marcusquinn merged commit b8a9f29 into main Feb 21, 2026
10 of 11 checks passed

coderabbitai bot requested changes Feb 21, 2026

View reviewed changes

	if [[ "$current_task_state" != "running" && "$current_task_state" != "dispatched" && "$current_task_state" != "evaluating" ]]; then
	if [[ "$current_task_state" != "running" && "$current_task_state" != "dispatched" ]]; then

	trap "release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' 2>/dev/null \|\| true" TERM INT
	trap "release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' \|\| true" TERM INT

	current_task_state=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$tid")';" 2>/dev/null \|\| echo "")
	current_task_state=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$tid")';" \|\| echo "")

		# Returns a structured verdict: complete:<pr_url>, retry:<reason>, failed:<reason>
		#

		# Get the last 50 lines of actual text content (not JSON metadata)
		log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null \| tail -5 \| sed -n 's/."text":"\([^"]\)".*/\1/p' 2>/dev/null \| tail -50 \|\| tail -50 "$db_log" 2>/dev/null \|\| echo "(could not read log)")

Conversation

marcusquinn commented Feb 21, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Fix

Summary by CodeRabbit

Uh oh!

gemini-code-assist bot commented Feb 21, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

coderabbitai bot commented Feb 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

github-actions bot commented Feb 21, 2026

🔍 Code Quality Report

📈 Current Quality Metrics

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Feb 21, 2026

🔍 Code Quality Report

📈 Current Quality Metrics

Uh oh!

github-actions bot commented Feb 21, 2026

🔍 Code Quality Report

📈 Current Quality Metrics

Uh oh!

sonarqubecloud bot commented Feb 21, 2026

Quality Gate passed

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

marcusquinn commented Feb 21, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 21, 2026 •

edited

Loading