refactor: eliminate evaluating state race condition in supervisor pulse#2056
refactor: eliminate evaluating state race condition in supervisor pulse#2056marcusquinn merged 3 commits intomainfrom
Conversation
Phase 0.7 (stale-state recovery) and Phase 1 (worker evaluation) were racing: both queried running tasks independently, then both tried to transition them, causing tasks to get stuck in evaluating state. Fix: Phase 1 now transitions directly from running to final state (complete/pr_review/retrying/failed) without the intermediate evaluating state. Phase 0.7 no longer handles running/dispatched tasks (Phase 1 does). Phase 0.8 removed entirely (redundant with Phase 1). - Add running:complete, running:pr_review, running:retrying, running:queued, running:blocked to valid state transitions - Phase 1 re-reads DB state before acting, skips tasks already transitioned by other phases - Remove Phase 0.8 (130 lines of stale-running recovery) - Remove SIGTERM evaluating-rollback handler (no longer needed) - Net: -192 lines of fragile race-condition code
Summary of ChangesHello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the supervisor pulse mechanism to eliminate a critical race condition that caused tasks to become perpetually stuck in an 'evaluating' state. By removing an entire phase, streamlining task state transitions, and ensuring robust state checks, the system's reliability and efficiency in handling task lifecycle management are greatly improved. The changes simplify the state machine and reduce the complexity of task recovery logic. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
WalkthroughAdds a new AI-assisted task assessment script and sources it from the supervisor helper, expands VALID_TRANSITIONS with five running->* mappings, and simplifies pulse.sh evaluation flow by removing Phase 0.7/0.8 stale-recovery and intermediate phase-tracking in favor of direct re-evaluation and streamlined post-evaluation handling. Changes
Sequence Diagram(s)sequenceDiagram
participant Pulse as Supervisor/pulse.sh
participant Helper as supervisor-helper.sh
participant Assess as assess-task.sh
participant DB as Database
participant Worker as WorkerProcess
participant GH as GitHub
participant AI as AI/Model
Pulse->>Helper: trigger evaluation
Helper->>Assess: assess_task_with_metadata(task_id)
Assess->>DB: read task state & metadata
Assess->>Worker: check PID / logs
Assess->>GH: query PR/merge status (if PR URL or branch)
alt deterministic fast-path
Assess-->>Helper: return verdict (complete/retry/failed)
else use AI
Assess->>AI: send single-line prompt with gathered evidence
AI-->>Assess: single-line verdict
Assess-->>Helper: return verdict + metadata
end
Helper->>Pulse: apply transition based on verdict (uses new running:... transitions)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Sat Feb 21 03:58:26 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
There was a problem hiding this comment.
Code Review
The refactor successfully simplifies the state machine by removing the intermediate evaluating state for standard task completion, which significantly reduces the risk of tasks getting stuck. However, there is a logic flaw in the race condition check within Phase 1 that could still allow it to interfere with tasks being handled by Phase 0.7. Additionally, the use of 2>/dev/null for blanket error suppression in the new DB query and the cleanup trap violates repository rules and style guidelines, as detailed in the specific comments.
| # Transition to evaluating | ||
| cmd_transition "$tid" "evaluating" 2>>"$SUPERVISOR_LOG" || true | ||
| # Skip if another phase already moved this task to a post-running state | ||
| if [[ "$current_task_state" != "running" && "$current_task_state" != "dispatched" && "$current_task_state" != "evaluating" ]]; then |
There was a problem hiding this comment.
To fully eliminate the race condition, Phase 1 should skip tasks that have already been transitioned to the evaluating state by Phase 0.7. The current logic allows Phase 1 to proceed if the state is evaluating, which contradicts the goal of skipping tasks already handled by other phases (as noted in the PR description that Phase 0.7 now handles legacy evaluating tasks).
| if [[ "$current_task_state" != "running" && "$current_task_state" != "dispatched" && "$current_task_state" != "evaluating" ]]; then | |
| if [[ "$current_task_state" != "running" && "$current_task_state" != "dispatched" ]]; then |
| # the task stays in running/dispatched and the next pulse re-evaluates it. | ||
| # shellcheck disable=SC2064 # intentional: expand SUPERVISOR_DIR at definition time | ||
| trap "_phase1_cleanup_on_signal; release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' 2>/dev/null || true" TERM INT | ||
| trap "release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' 2>/dev/null || true" TERM INT |
There was a problem hiding this comment.
Avoid using 2>/dev/null for blanket suppression of errors, even for rm -f. The style guide explicitly restricts its use to log file redirection. rm -f is already silent if the file does not exist.
| trap "release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' 2>/dev/null || true" TERM INT | |
| trap "release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' || true" TERM INT |
References
- 2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression. (link)
- Avoid using
2>/dev/nullfor blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging. - In shell scripts with
set -e, guardkillcommands within cleanup traps with|| trueto prevent premature termination if the target process has already exited, ensuring subsequent cleanup commands are executed.
| # Worker is done - evaluate outcome directly (no intermediate evaluating state) | ||
| # Re-read current state fresh from DB to avoid race conditions with Phase 0.7 | ||
| local current_task_state | ||
| current_task_state=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$tid")';" 2>/dev/null || echo "") |
There was a problem hiding this comment.
Avoid using 2>/dev/null for blanket suppression of database errors. If the query fails due to a locked database or permission issue, this will silently return an empty string, causing the task to be skipped and making debugging difficult.
| current_task_state=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$tid")';" 2>/dev/null || echo "") | |
| current_task_state=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$tid")';" || echo "") |
References
- 2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression. (link)
- Avoid using
2>/dev/nullfor blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.agents/scripts/supervisor-helper.sh (1)
53-57:⚠️ Potential issue | 🟡 MinorState machine comment is now stale.
The ASCII diagram still shows
running -> evaluating -> completeas the primary path. With this PR, the main flow isrunning -> complete(orpr_review,retrying,blocked,failed). Theevaluatingstate is now only a legacy/crash-recovery path. Consider updating the comment to reflect the new primary flow and noteevaluatingas legacy.Similarly, the
show_usageblock at lines 372–376 describes the same outdated flow.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor-helper.sh around lines 53 - 57, Update the stale ASCII state-machine comment and the show_usage description to reflect the new primary flow: change the main path to queued -> dispatched -> running -> complete (with alternate transitions from running to pr_review, retrying, blocked, or failed) and mark evaluating as a legacy/crash-recovery-only state; also update the show_usage block that currently documents the old running -> evaluating -> complete path to mirror this new primary flow and explicitly note that evaluating is only used for legacy/crash recovery..agents/scripts/supervisor/pulse.sh (1)
2767-2767:⚠️ Potential issue | 🟡 MinorStale reference to Phase 0.8 in log message.
This log message still references "Phase 0.7/0.8/4b" but Phase 0.8 no longer exists. Update to reflect the current phase structure.
Suggested fix
- log_warn " Stale state recovery (Phase 0.7/0.8/4b) should clean these up next pulse" + log_warn " Stale state recovery (Phase 0.7/4b) should clean these up next pulse"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/pulse.sh at line 2767, Update the log message in the log_warn call that currently contains "Stale state recovery (Phase 0.7/0.8/4b) should clean these up next pulse" to remove the obsolete "0.8" reference (e.g., change to "Stale state recovery (Phase 0.7/4b) should clean these up next pulse"); locate the call to log_warn with that exact string and replace the message text accordingly so it reflects the current phase structure.
🧹 Nitpick comments (1)
.agents/scripts/supervisor/pulse.sh (1)
1188-1196: Remove dead code variables_phase1_evaluating_tidand_phase1_pre_eval_state.These variables are assigned and cleared throughout the loop but never read. The comment on line 1188 states "for logging on pulse kill," but the TERM/INT trap handler (line 1196) only releases the lock and cleans up a temp file—it doesn't reference either variable. Remove the declarations and all corresponding assignment/clear operations to reduce confusion and keep the codebase clean.
♻️ Suggested cleanup
- # Track which task is being evaluated (for logging on pulse kill) - local _phase1_evaluating_tid="" - local _phase1_pre_eval_state=""And remove all corresponding assignments and clears throughout the loop (lines 1228–1229, 1306–1307, 1334–1335, 1422–1423, 1438–1439, 1567–1568).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/pulse.sh around lines 1188 - 1196, Remove the dead variables _phase1_evaluating_tid and _phase1_pre_eval_state: delete their initial declarations and remove every assignment/clear referencing them inside the loop (all places that set or unset _phase1_evaluating_tid and _phase1_pre_eval_state). Leave the existing trap and cleanup logic (the trap that calls release_pulse_lock and removes '${SUPERVISOR_DIR}/MODELS.md.tmp') untouched; do not introduce any references to these variables in the TERM/INT handler. Ensure no other code paths read those symbols after removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In @.agents/scripts/supervisor-helper.sh:
- Around line 53-57: Update the stale ASCII state-machine comment and the
show_usage description to reflect the new primary flow: change the main path to
queued -> dispatched -> running -> complete (with alternate transitions from
running to pr_review, retrying, blocked, or failed) and mark evaluating as a
legacy/crash-recovery-only state; also update the show_usage block that
currently documents the old running -> evaluating -> complete path to mirror
this new primary flow and explicitly note that evaluating is only used for
legacy/crash recovery.
In @.agents/scripts/supervisor/pulse.sh:
- Line 2767: Update the log message in the log_warn call that currently contains
"Stale state recovery (Phase 0.7/0.8/4b) should clean these up next pulse" to
remove the obsolete "0.8" reference (e.g., change to "Stale state recovery
(Phase 0.7/4b) should clean these up next pulse"); locate the call to log_warn
with that exact string and replace the message text accordingly so it reflects
the current phase structure.
---
Nitpick comments:
In @.agents/scripts/supervisor/pulse.sh:
- Around line 1188-1196: Remove the dead variables _phase1_evaluating_tid and
_phase1_pre_eval_state: delete their initial declarations and remove every
assignment/clear referencing them inside the loop (all places that set or unset
_phase1_evaluating_tid and _phase1_pre_eval_state). Leave the existing trap and
cleanup logic (the trap that calls release_pulse_lock and removes
'${SUPERVISOR_DIR}/MODELS.md.tmp') untouched; do not introduce any references to
these variables in the TERM/INT handler. Ensure no other code paths read those
symbols after removal.
assess-task.sh gathers evidence from real sources of truth (worker log, TODO.md, GitHub PR state, process table) instead of trusting the DB cache. Deterministic fast paths handle clear-cut cases (FULL_LOOP_COMPLETE + PR, VERIFY_COMPLETE, PR merged). Ambiguous cases fall back to haiku AI call (~$0.001/call vs $0.50-2.00/worker). Wired into pulse.sh Phase 1 as preferred evaluator, with fallback chain: assess_task -> evaluate_worker_with_metadata -> evaluate_worker. Tested against 5 running tasks — all correctly assessed as complete with their PR URLs.
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Sat Feb 21 04:15:37 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
Three bugs found during live testing: 1. gh CLI calls in gather_task_evidence consumed stdin from the while-read loop, causing Phase 1 to process only the first task. Fix: redirect stdin to /dev/null for all gh/opencode/claude calls. 2. record_evaluation_metadata leaked stdout into the outcome variable via command substitution. Fix: redirect stdout to /dev/null. 3. set -e (errexit) could abort the Phase 1 loop if any single task evaluation failed. Fix: set +e around evaluation, set -e after. Also adds Phase 1 task count logging for observability. Tested: all 5 running tasks correctly assessed and transitioned in a single pulse cycle.
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Sat Feb 21 04:54:31 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/supervisor-helper.sh:
- Around line 252-256: The documentation/usage header in
.agents/scripts/supervisor-helper.sh is out of sync with the VALID_TRANSITIONS
array: update the help/usage string that currently mentions the
running→evaluating transition to list the new direct running→complete,
running→pr_review, running→retrying, running→queued, and running→blocked
transitions so the header matches VALID_TRANSITIONS; locate and edit the
usage/help text or header near the VALID_TRANSITIONS declaration and replace the
old "running→evaluating" wording with the full set of
running→complete/pr_review/retrying/queued/blocked transitions.
In @.agents/scripts/supervisor/assess-task.sh:
- Around line 12-13: The script assess-task.sh currently only accepts verdicts
"complete|retry|failed" so exhausted-retry cases get marked failed; update all
validation/checks and rule mappings in assess-task.sh to include a new "blocked"
outcome (add "blocked" to any regex or list of allowed verdicts, to help
text/usage strings, and to the branch handling that maps exhausted retries to
decisions), and change the logic that previously forced exhausted retries to
"failed" to return "blocked" instead; specifically update the verdict validation
and parsing code and the rules handling around the blocks referenced (current
checks around lines 12–13, 240–251, and 277–280) so the parser, validators, and
downstream decision branches accept and propagate "blocked".
- Around line 86-87: The log_tail pipeline mistakenly uses "tail -5" which
limits extracted text to 5 lines; update the pipeline that sets log_tail (the
variable named log_tail in assess-task.sh) to use "tail -50" instead of "tail
-5" so it captures the last 50 text entries as the comment states, keeping the
rest of the pipeline (sed and the fallback tail -50 "$db_log") unchanged.
- Around line 101-108: The grep used to find the todo line can match substrings
(e.g., "t12" inside "t120") and treats dots as regex wildcards; change the
lookup to match the exact task ID literally and anchored as a token: escape
regex metacharacters in $task_id (create e.g. escaped_task_id by replacing . * ?
[ ] \( \) { } + ^ $ | \ with backslashed versions), then use grep -m1 -E with
word/token boundaries such as grep -m1 -E
"(^|[^[:alnum:]_.-])$escaped_task_id($|[^[:alnum:]_.-])" --file "$todo_file" to
set todo_line, and keep the existing todo_state logic that checks "$todo_line"
for "[x]" or "[X]". Ensure you update references to
task_id/todo_line/todo_file/todo_state in that block.
In @.agents/scripts/supervisor/pulse.sh:
- Around line 1231-1237: The loop temporarily disables errexit with "set +e"
around the evaluation block (see _phase1_pre_eval_state, _phase1_evaluating_tid
and the evaluation call) but several branches use "continue" while errexit
remains disabled, leaving the rest of the pulse non-strict; restore strict mode
by re-enabling "set -e" immediately after the evaluation returns (or run the
evaluation in a subshell so effects are local), and ensure any early "continue"
paths restore "set -e" before continuing so each iteration returns to strict
error handling.
| "running:complete" | ||
| "running:pr_review" | ||
| "running:retrying" | ||
| "running:queued" | ||
| "running:blocked" |
There was a problem hiding this comment.
Sync state-machine docs with direct running → terminal transitions
VALID_TRANSITIONS now allows direct running→complete/pr_review/retrying/queued/blocked, but the header/usage still describes running→evaluating. Updating the docs will prevent operator confusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor-helper.sh around lines 252 - 256, The
documentation/usage header in .agents/scripts/supervisor-helper.sh is out of
sync with the VALID_TRANSITIONS array: update the help/usage string that
currently mentions the running→evaluating transition to list the new direct
running→complete, running→pr_review, running→retrying, running→queued, and
running→blocked transitions so the header matches VALID_TRANSITIONS; locate and
edit the usage/help text or header near the VALID_TRANSITIONS declaration and
replace the old "running→evaluating" wording with the full set of
running→complete/pr_review/retrying/queued/blocked transitions.
| # Returns a structured verdict: complete:<pr_url>, retry:<reason>, failed:<reason> | ||
| # |
There was a problem hiding this comment.
Add blocked outcome to align with max‑retry policy
The AI prompt/parser only allow complete|retry|failed, so exhausted‑retry cases will be forced to failed, bypassing the supervisor policy of blocking for human intervention. Add blocked to the allowed verdicts and update the rules accordingly.
🛠️ Suggested fix
-# Returns a structured verdict: complete:<pr_url>, retry:<reason>, failed:<reason>
+# Returns a structured verdict: complete:<pr_url>, retry:<reason>, blocked:<reason>, failed:<reason>
@@
-You are a task supervisor. Assess this task and return EXACTLY ONE LINE in this format:
- complete:<pr_url_or_verified_complete>
- retry:<brief_reason>
- failed:<brief_reason>
+You are a task supervisor. Assess this task and return EXACTLY ONE LINE in this format:
+ complete:<pr_url_or_verified_complete>
+ retry:<brief_reason>
+ blocked:<brief_reason>
+ failed:<brief_reason>
@@
-- If the worker failed and retries exhausted → failed:<reason>
+- If the worker failed and retries exhausted → blocked:<reason>
@@
-verdict=$(echo "$ai_result" | grep -oE '^(complete|retry|failed):.*' | head -1 || echo "")
+verdict=$(echo "$ai_result" | grep -oE '^(complete|retry|failed|blocked):.*' | head -1 || echo "")Also applies to: 240-251, 277-280
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/assess-task.sh around lines 12 - 13, The script
assess-task.sh currently only accepts verdicts "complete|retry|failed" so
exhausted-retry cases get marked failed; update all validation/checks and rule
mappings in assess-task.sh to include a new "blocked" outcome (add "blocked" to
any regex or list of allowed verdicts, to help text/usage strings, and to the
branch handling that maps exhausted retries to decisions), and change the logic
that previously forced exhausted retries to "failed" to return "blocked"
instead; specifically update the verdict validation and parsing code and the
rules handling around the blocks referenced (current checks around lines 12–13,
240–251, and 277–280) so the parser, validators, and downstream decision
branches accept and propagate "blocked".
| # Get the last 50 lines of actual text content (not JSON metadata) | ||
| log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -5 | sed -n 's/.*"text":"\([^"]*\)".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)") |
There was a problem hiding this comment.
log_tail only captures 5 lines (comment says 50)
The current pipeline limits the evidence to 5 text lines, which underfeeds the AI. Align it with the stated intent.
🛠️ Suggested fix
- log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -5 | sed -n 's/.*"text":"\([^"]*\)".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)")
+ log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -50 | sed -n 's/.*"text":"\([^"]*\)".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Get the last 50 lines of actual text content (not JSON metadata) | |
| log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -5 | sed -n 's/.*"text":"\([^"]*\)".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)") | |
| # Get the last 50 lines of actual text content (not JSON metadata) | |
| log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -50 | sed -n 's/.*"text":"\([^"]*\)".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/assess-task.sh around lines 86 - 87, The log_tail
pipeline mistakenly uses "tail -5" which limits extracted text to 5 lines;
update the pipeline that sets log_tail (the variable named log_tail in
assess-task.sh) to use "tail -50" instead of "tail -5" so it captures the last
50 text entries as the comment states, keeping the rest of the pipeline (sed and
the fallback tail -50 "$db_log") unchanged.
| # Get the first occurrence of this task ID | ||
| todo_line=$(grep -m1 "$task_id" "$todo_file" 2>/dev/null || echo "") | ||
| if [[ -n "$todo_line" ]]; then | ||
| if [[ "$todo_line" == *"[x]"* || "$todo_line" == *"[X]"* ]]; then | ||
| todo_state="checked" | ||
| else | ||
| todo_state="open" | ||
| fi |
There was a problem hiding this comment.
Anchor TODO matching to the exact task ID
grep -m1 "$task_id" can match t12 inside t120, and dots in subtask IDs are treated as regex wildcards. That can flip TODO state and mislead assessment. Use an anchored, escaped pattern.
🛠️ Suggested fix
- todo_line=$(grep -m1 "$task_id" "$todo_file" 2>/dev/null || echo "")
+ local task_id_re
+ task_id_re=$(printf '%s' "$task_id" | sed -E 's/[][\\.^$*+?{}|()]/\\&/g')
+ todo_line=$(grep -m1 -E "^[[:space:]]*- \\[[ xX-]\\] ${task_id_re}([[:space:]]|$)" "$todo_file" 2>/dev/null || echo "")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Get the first occurrence of this task ID | |
| todo_line=$(grep -m1 "$task_id" "$todo_file" 2>/dev/null || echo "") | |
| if [[ -n "$todo_line" ]]; then | |
| if [[ "$todo_line" == *"[x]"* || "$todo_line" == *"[X]"* ]]; then | |
| todo_state="checked" | |
| else | |
| todo_state="open" | |
| fi | |
| # Get the first occurrence of this task ID | |
| local task_id_re | |
| task_id_re=$(printf '%s' "$task_id" | sed -E 's/[][\\.^$*+?{}|()]/\\&/g') | |
| todo_line=$(grep -m1 -E "^[[:space:]]*- \\[[ xX-]\\] ${task_id_re}([[:space:]]|$)" "$todo_file" 2>/dev/null || echo "") | |
| if [[ -n "$todo_line" ]]; then | |
| if [[ "$todo_line" == *"[x]"* || "$todo_line" == *"[X]"* ]]; then | |
| todo_state="checked" | |
| else | |
| todo_state="open" | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/assess-task.sh around lines 101 - 108, The grep
used to find the todo line can match substrings (e.g., "t12" inside "t120") and
treats dots as regex wildcards; change the lookup to match the exact task ID
literally and anchored as a token: escape regex metacharacters in $task_id
(create e.g. escaped_task_id by replacing . * ? [ ] \( \) { } + ^ $ | \ with
backslashed versions), then use grep -m1 -E with word/token boundaries such as
grep -m1 -E "(^|[^[:alnum:]_.-])$escaped_task_id($|[^[:alnum:]_.-])" --file
"$todo_file" to set todo_line, and keep the existing todo_state logic that
checks "$todo_line" for "[x]" or "[X]". Ensure you update references to
task_id/todo_line/todo_file/todo_state in that block.
| log_info " $tid: worker finished, evaluating..." | ||
| _phase1_pre_eval_state="$current_task_state" | ||
| _phase1_evaluating_tid="$tid" | ||
| # Temporarily disable errexit for evaluation — a single task failure | ||
| # must not abort evaluation of remaining tasks | ||
| set +e | ||
|
|
There was a problem hiding this comment.
Restore set -e before any early continue
set +e is enabled for evaluation, but several branches continue (e.g., alive, early completion paths). That leaves errexit disabled for subsequent tasks and the rest of the pulse, weakening fail‑fast behavior. Re‑enable immediately after the eval call (or wrap it) so each iteration returns to strict error handling.
🛠️ Suggested fix
- # Temporarily disable errexit for evaluation — a single task failure
- # must not abort evaluation of remaining tasks
- set +e
+ # Temporarily disable errexit for evaluation — a single task failure
+ # must not abort evaluation of remaining tasks
+ set +e
+ local eval_rc=0
@@
- if command -v assess_task_with_metadata &>/dev/null; then
- outcome=$(assess_task_with_metadata "$tid")
+ if command -v assess_task_with_metadata &>/dev/null; then
+ outcome=$(assess_task_with_metadata "$tid"); eval_rc=$?
eval_maker="assess_task"
elif command -v evaluate_worker_with_metadata &>/dev/null; then
- outcome=$(evaluate_worker_with_metadata "$tid" "$skip_ai")
+ outcome=$(evaluate_worker_with_metadata "$tid" "$skip_ai"); eval_rc=$?
eval_maker="evaluate_worker_with_metadata"
else
- outcome=$(evaluate_worker "$tid" "$skip_ai")
+ outcome=$(evaluate_worker "$tid" "$skip_ai"); eval_rc=$?
fi
+ set -e
+ if [[ $eval_rc -ne 0 ]]; then
+ log_warn " $tid: evaluation failed (rc=$eval_rc) — retrying"
+ outcome="retry:evaluation_error"
+ fiAlso applies to: 1322-1329, 1584-1589
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/pulse.sh around lines 1231 - 1237, The loop
temporarily disables errexit with "set +e" around the evaluation block (see
_phase1_pre_eval_state, _phase1_evaluating_tid and the evaluation call) but
several branches use "continue" while errexit remains disabled, leaving the rest
of the pulse non-strict; restore strict mode by re-enabling "set -e" immediately
after the evaluation returns (or run the evaluation in a subshell so effects are
local), and ensure any early "continue" paths restore "set -e" before continuing
so each iteration returns to strict error handling.



Summary
evaluatingstate indefinitelyrunningto final state (complete/pr_review/retrying/failed) without the intermediateevaluatingstateevaluatingcleanupRoot Cause
Phase 0.7 would transition a task from
running→pr_review. Then Phase 1 (which had already queried the DB and found the task inrunning) would callcmd_transitionwhich re-read the DB — but due to a timing issue, it still sawrunningand transitioned toevaluating, overwriting Phase 0.7's work. The task then got stuck inevaluatingwith no live worker and no phase to recover it.Fix
running:complete,running:pr_review,running:retrying,running:queued,running:blockedto valid state transitionsevaluatingstate — if pulse is killed mid-evaluation, task stays inrunningand next pulse re-evaluatesSummary by CodeRabbit