Skip to content

refactor: eliminate evaluating state race condition in supervisor pulse#2056

Merged
marcusquinn merged 3 commits intomainfrom
refactor/intelligent-pulse
Feb 21, 2026
Merged

refactor: eliminate evaluating state race condition in supervisor pulse#2056
marcusquinn merged 3 commits intomainfrom
refactor/intelligent-pulse

Conversation

@marcusquinn
Copy link
Owner

@marcusquinn marcusquinn commented Feb 21, 2026

Summary

  • Phase 0.7 and Phase 1 were racing: both independently queried running tasks, then both tried to transition them, causing tasks to get stuck in evaluating state indefinitely
  • Phase 1 now transitions directly from running to final state (complete/pr_review/retrying/failed) without the intermediate evaluating state
  • Removed Phase 0.8 entirely (130 lines) and neutered Phase 0.7 to only handle legacy evaluating cleanup
  • Net: -192 lines of fragile race-condition code

Root Cause

Phase 0.7 would transition a task from runningpr_review. Then Phase 1 (which had already queried the DB and found the task in running) would call cmd_transition which re-read the DB — but due to a timing issue, it still saw running and transitioned to evaluating, overwriting Phase 0.7's work. The task then got stuck in evaluating with no live worker and no phase to recover it.

Fix

  1. Added running:complete, running:pr_review, running:retrying, running:queued, running:blocked to valid state transitions
  2. Phase 1 re-reads current DB state before acting and skips tasks already transitioned by other phases
  3. No intermediate evaluating state — if pulse is killed mid-evaluation, task stays in running and next pulse re-evaluates

Summary by CodeRabbit

  • New Features
    • AI-assisted task assessment with multi-source evidence and structured verdicts; emits evaluation metadata and an explicit "alive" outcome.
  • Refactor
    • Expanded task-state transitions from running to complete/pr_review/retrying/queued/blocked.
    • Consolidated and simplified task evaluation flow and heartbeat/post-evaluation handling, removing legacy fast-paths and intermediate rollback hooks.

Phase 0.7 (stale-state recovery) and Phase 1 (worker evaluation) were
racing: both queried running tasks independently, then both tried to
transition them, causing tasks to get stuck in evaluating state.

Fix: Phase 1 now transitions directly from running to final state
(complete/pr_review/retrying/failed) without the intermediate evaluating
state. Phase 0.7 no longer handles running/dispatched tasks (Phase 1
does). Phase 0.8 removed entirely (redundant with Phase 1).

- Add running:complete, running:pr_review, running:retrying,
  running:queued, running:blocked to valid state transitions
- Phase 1 re-reads DB state before acting, skips tasks already
  transitioned by other phases
- Remove Phase 0.8 (130 lines of stale-running recovery)
- Remove SIGTERM evaluating-rollback handler (no longer needed)
- Net: -192 lines of fragile race-condition code
@gemini-code-assist
Copy link

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the supervisor pulse mechanism to eliminate a critical race condition that caused tasks to become perpetually stuck in an 'evaluating' state. By removing an entire phase, streamlining task state transitions, and ensuring robust state checks, the system's reliability and efficiency in handling task lifecycle management are greatly improved. The changes simplify the state machine and reduce the complexity of task recovery logic.

Highlights

  • Race Condition Elimination: Resolved a race condition where Phase 0.7 and Phase 1 of the supervisor pulse independently queried and attempted to transition tasks, leading to tasks getting stuck in an 'evaluating' state indefinitely.
  • Streamlined Task Transitions: Phase 1 now directly transitions tasks from 'running' to their final states (e.g., 'complete', 'pr_review', 'retrying', 'failed'), bypassing the intermediate 'evaluating' state entirely.
  • Phase 0.8 Removal: The entire Phase 0.8, responsible for stale 'running' task recovery, has been removed, resulting in a significant reduction of code.
  • Phase 0.7 Refocus: Phase 0.7 has been neutered to solely handle cleanup of legacy 'evaluating' tasks, rather than actively managing 'running' or 'dispatched' tasks.
  • Robustness to Pulse Crashes: If the supervisor pulse is killed mid-evaluation, tasks now remain in the 'running' or 'dispatched' state, allowing the next pulse to re-evaluate them without being stuck in an 'evaluating' state.
  • Database State Re-reading: Phase 1 now re-reads the current database state before acting on a task, preventing conflicts and ensuring it skips tasks already transitioned by other phases.
Changelog
  • .agents/scripts/supervisor-helper.sh
    • Added new valid state transitions from 'running' to 'complete', 'pr_review', 'retrying', 'queued', and 'blocked'.
  • .agents/scripts/supervisor/pulse.sh
    • Removed the logic for fast-path and standard grace period queries for 'running' and 'dispatched' tasks from Phase 0.7, limiting its scope to 'evaluating' tasks.
    • Eliminated the entire Phase 0.8, which handled stale 'running' task recovery.
    • Removed the _phase1_cleanup_on_signal function and its associated trap, as the intermediate 'evaluating' state is no longer used for active tasks.
    • Modified Phase 1's evaluation logic to directly process tasks from 'running' or 'dispatched' states, bypassing an explicit transition to 'evaluating'.
    • Introduced a check in Phase 1 to skip tasks that have already been transitioned to a post-running state by another phase.
    • Removed the condition to skip AI evaluation for tasks previously stuck in 'evaluating', as this state is no longer an active part of the primary flow.
    • Simplified comments related to pre- and post-evaluation heartbeats.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 21, 2026

Walkthrough

Adds a new AI-assisted task assessment script and sources it from the supervisor helper, expands VALID_TRANSITIONS with five running->* mappings, and simplifies pulse.sh evaluation flow by removing Phase 0.7/0.8 stale-recovery and intermediate phase-tracking in favor of direct re-evaluation and streamlined post-evaluation handling.

Changes

Cohort / File(s) Summary
Supervisor helper
.agents/scripts/supervisor-helper.sh
Sources new assess-task.sh and expands VALID_TRANSITIONS with five new running:... mappings: complete, pr_review, retrying, queued, blocked.
Pulse evaluation
.agents/scripts/supervisor/pulse.sh
Removed Phase 0.7/0.8 fast-path/stale-running recovery and intermediate phase1 rollback/trap mechanics; Phase 1 re-reads DB state and defers to assessment paths (now prefers assess_task_with_metadata); simplified heartbeat/post-eval flow.
AI-assisted assessment
.agents/scripts/supervisor/assess-task.sh
New script added. Gathers DB, process, log, TODO.md and GitHub evidence; offers gather_task_evidence, assess_task, assess_task_with_metadata; uses deterministic fast-paths and optional AI (model client) to emit single-line verdicts (complete:..., retry:..., failed:...).

Sequence Diagram(s)

sequenceDiagram
  participant Pulse as Supervisor/pulse.sh
  participant Helper as supervisor-helper.sh
  participant Assess as assess-task.sh
  participant DB as Database
  participant Worker as WorkerProcess
  participant GH as GitHub
  participant AI as AI/Model

  Pulse->>Helper: trigger evaluation
  Helper->>Assess: assess_task_with_metadata(task_id)
  Assess->>DB: read task state & metadata
  Assess->>Worker: check PID / logs
  Assess->>GH: query PR/merge status (if PR URL or branch)
  alt deterministic fast-path
    Assess-->>Helper: return verdict (complete/retry/failed)
  else use AI
    Assess->>AI: send single-line prompt with gathered evidence
    AI-->>Assess: single-line verdict
    Assess-->>Helper: return verdict + metadata
  end
  Helper->>Pulse: apply transition based on verdict (uses new running:... transitions)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🛠️ New assessor wakes, the supervisor sings,
Running splinters into five new wings,
Pulse skips old traps, reads the state anew,
AI lends a verdict, concise and true,
Automation dances — tidy, bright, and few. ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the main change: eliminating a race condition in the supervisor pulse between stale-state recovery and worker evaluation phases that caused tasks to get stuck in evaluating state.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch refactor/intelligent-pulse

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 34 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Sat Feb 21 03:58:23 UTC 2026: Code review monitoring started
Sat Feb 21 03:58:23 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 34

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 34
  • VULNERABILITIES: 0

Generated on: Sat Feb 21 03:58:26 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The refactor successfully simplifies the state machine by removing the intermediate evaluating state for standard task completion, which significantly reduces the risk of tasks getting stuck. However, there is a logic flaw in the race condition check within Phase 1 that could still allow it to interfere with tasks being handled by Phase 0.7. Additionally, the use of 2>/dev/null for blanket error suppression in the new DB query and the cleanup trap violates repository rules and style guidelines, as detailed in the specific comments.

# Transition to evaluating
cmd_transition "$tid" "evaluating" 2>>"$SUPERVISOR_LOG" || true
# Skip if another phase already moved this task to a post-running state
if [[ "$current_task_state" != "running" && "$current_task_state" != "dispatched" && "$current_task_state" != "evaluating" ]]; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

To fully eliminate the race condition, Phase 1 should skip tasks that have already been transitioned to the evaluating state by Phase 0.7. The current logic allows Phase 1 to proceed if the state is evaluating, which contradicts the goal of skipping tasks already handled by other phases (as noted in the PR description that Phase 0.7 now handles legacy evaluating tasks).

Suggested change
if [[ "$current_task_state" != "running" && "$current_task_state" != "dispatched" && "$current_task_state" != "evaluating" ]]; then
if [[ "$current_task_state" != "running" && "$current_task_state" != "dispatched" ]]; then

# the task stays in running/dispatched and the next pulse re-evaluates it.
# shellcheck disable=SC2064 # intentional: expand SUPERVISOR_DIR at definition time
trap "_phase1_cleanup_on_signal; release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' 2>/dev/null || true" TERM INT
trap "release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' 2>/dev/null || true" TERM INT

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Avoid using 2>/dev/null for blanket suppression of errors, even for rm -f. The style guide explicitly restricts its use to log file redirection. rm -f is already silent if the file does not exist.

Suggested change
trap "release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' 2>/dev/null || true" TERM INT
trap "release_pulse_lock; rm -f '${SUPERVISOR_DIR}/MODELS.md.tmp' || true" TERM INT
References
  1. 2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression. (link)
  2. Avoid using 2>/dev/null for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
  3. In shell scripts with set -e, guard kill commands within cleanup traps with || true to prevent premature termination if the target process has already exited, ensuring subsequent cleanup commands are executed.

# Worker is done - evaluate outcome directly (no intermediate evaluating state)
# Re-read current state fresh from DB to avoid race conditions with Phase 0.7
local current_task_state
current_task_state=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$tid")';" 2>/dev/null || echo "")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Avoid using 2>/dev/null for blanket suppression of database errors. If the query fails due to a locked database or permission issue, this will silently return an empty string, causing the task to be skipped and making debugging difficult.

Suggested change
current_task_state=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$tid")';" 2>/dev/null || echo "")
current_task_state=$(db "$SUPERVISOR_DB" "SELECT status FROM tasks WHERE id = '$(sql_escape "$tid")';" || echo "")
References
  1. 2>/dev/null is acceptable ONLY when redirecting to log files, not blanket suppression. (link)
  2. Avoid using 2>/dev/null for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
.agents/scripts/supervisor-helper.sh (1)

53-57: ⚠️ Potential issue | 🟡 Minor

State machine comment is now stale.

The ASCII diagram still shows running -> evaluating -> complete as the primary path. With this PR, the main flow is running -> complete (or pr_review, retrying, blocked, failed). The evaluating state is now only a legacy/crash-recovery path. Consider updating the comment to reflect the new primary flow and note evaluating as legacy.

Similarly, the show_usage block at lines 372–376 describes the same outdated flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor-helper.sh around lines 53 - 57, Update the stale
ASCII state-machine comment and the show_usage description to reflect the new
primary flow: change the main path to queued -> dispatched -> running ->
complete (with alternate transitions from running to pr_review, retrying,
blocked, or failed) and mark evaluating as a legacy/crash-recovery-only state;
also update the show_usage block that currently documents the old running ->
evaluating -> complete path to mirror this new primary flow and explicitly note
that evaluating is only used for legacy/crash recovery.
.agents/scripts/supervisor/pulse.sh (1)

2767-2767: ⚠️ Potential issue | 🟡 Minor

Stale reference to Phase 0.8 in log message.

This log message still references "Phase 0.7/0.8/4b" but Phase 0.8 no longer exists. Update to reflect the current phase structure.

Suggested fix
-		log_warn "  Stale state recovery (Phase 0.7/0.8/4b) should clean these up next pulse"
+		log_warn "  Stale state recovery (Phase 0.7/4b) should clean these up next pulse"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/pulse.sh at line 2767, Update the log message in
the log_warn call that currently contains "Stale state recovery (Phase
0.7/0.8/4b) should clean these up next pulse" to remove the obsolete "0.8"
reference (e.g., change to "Stale state recovery (Phase 0.7/4b) should clean
these up next pulse"); locate the call to log_warn with that exact string and
replace the message text accordingly so it reflects the current phase structure.
🧹 Nitpick comments (1)
.agents/scripts/supervisor/pulse.sh (1)

1188-1196: Remove dead code variables _phase1_evaluating_tid and _phase1_pre_eval_state.

These variables are assigned and cleared throughout the loop but never read. The comment on line 1188 states "for logging on pulse kill," but the TERM/INT trap handler (line 1196) only releases the lock and cleans up a temp file—it doesn't reference either variable. Remove the declarations and all corresponding assignment/clear operations to reduce confusion and keep the codebase clean.

♻️ Suggested cleanup
-	# Track which task is being evaluated (for logging on pulse kill)
-	local _phase1_evaluating_tid=""
-	local _phase1_pre_eval_state=""

And remove all corresponding assignments and clears throughout the loop (lines 1228–1229, 1306–1307, 1334–1335, 1422–1423, 1438–1439, 1567–1568).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/pulse.sh around lines 1188 - 1196, Remove the
dead variables _phase1_evaluating_tid and _phase1_pre_eval_state: delete their
initial declarations and remove every assignment/clear referencing them inside
the loop (all places that set or unset _phase1_evaluating_tid and
_phase1_pre_eval_state). Leave the existing trap and cleanup logic (the trap
that calls release_pulse_lock and removes '${SUPERVISOR_DIR}/MODELS.md.tmp')
untouched; do not introduce any references to these variables in the TERM/INT
handler. Ensure no other code paths read those symbols after removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In @.agents/scripts/supervisor-helper.sh:
- Around line 53-57: Update the stale ASCII state-machine comment and the
show_usage description to reflect the new primary flow: change the main path to
queued -> dispatched -> running -> complete (with alternate transitions from
running to pr_review, retrying, blocked, or failed) and mark evaluating as a
legacy/crash-recovery-only state; also update the show_usage block that
currently documents the old running -> evaluating -> complete path to mirror
this new primary flow and explicitly note that evaluating is only used for
legacy/crash recovery.

In @.agents/scripts/supervisor/pulse.sh:
- Line 2767: Update the log message in the log_warn call that currently contains
"Stale state recovery (Phase 0.7/0.8/4b) should clean these up next pulse" to
remove the obsolete "0.8" reference (e.g., change to "Stale state recovery
(Phase 0.7/4b) should clean these up next pulse"); locate the call to log_warn
with that exact string and replace the message text accordingly so it reflects
the current phase structure.

---

Nitpick comments:
In @.agents/scripts/supervisor/pulse.sh:
- Around line 1188-1196: Remove the dead variables _phase1_evaluating_tid and
_phase1_pre_eval_state: delete their initial declarations and remove every
assignment/clear referencing them inside the loop (all places that set or unset
_phase1_evaluating_tid and _phase1_pre_eval_state). Leave the existing trap and
cleanup logic (the trap that calls release_pulse_lock and removes
'${SUPERVISOR_DIR}/MODELS.md.tmp') untouched; do not introduce any references to
these variables in the TERM/INT handler. Ensure no other code paths read those
symbols after removal.

assess-task.sh gathers evidence from real sources of truth (worker log,
TODO.md, GitHub PR state, process table) instead of trusting the DB cache.
Deterministic fast paths handle clear-cut cases (FULL_LOOP_COMPLETE + PR,
VERIFY_COMPLETE, PR merged). Ambiguous cases fall back to haiku AI call
(~$0.001/call vs $0.50-2.00/worker).

Wired into pulse.sh Phase 1 as preferred evaluator, with fallback chain:
assess_task -> evaluate_worker_with_metadata -> evaluate_worker.

Tested against 5 running tasks — all correctly assessed as complete with
their PR URLs.
@github-actions
Copy link

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 34 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Sat Feb 21 04:15:33 UTC 2026: Code review monitoring started
Sat Feb 21 04:15:34 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 34

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 34
  • VULNERABILITIES: 0

Generated on: Sat Feb 21 04:15:37 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Three bugs found during live testing:
1. gh CLI calls in gather_task_evidence consumed stdin from the
   while-read loop, causing Phase 1 to process only the first task.
   Fix: redirect stdin to /dev/null for all gh/opencode/claude calls.
2. record_evaluation_metadata leaked stdout into the outcome variable
   via command substitution. Fix: redirect stdout to /dev/null.
3. set -e (errexit) could abort the Phase 1 loop if any single task
   evaluation failed. Fix: set +e around evaluation, set -e after.

Also adds Phase 1 task count logging for observability.

Tested: all 5 running tasks correctly assessed and transitioned in a
single pulse cycle.
@github-actions
Copy link

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 34 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Sat Feb 21 04:54:28 UTC 2026: Code review monitoring started
Sat Feb 21 04:54:28 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 34

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 34
  • VULNERABILITIES: 0

Generated on: Sat Feb 21 04:54:31 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud
Copy link

@marcusquinn marcusquinn merged commit b8a9f29 into main Feb 21, 2026
10 of 11 checks passed
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/supervisor-helper.sh:
- Around line 252-256: The documentation/usage header in
.agents/scripts/supervisor-helper.sh is out of sync with the VALID_TRANSITIONS
array: update the help/usage string that currently mentions the
running→evaluating transition to list the new direct running→complete,
running→pr_review, running→retrying, running→queued, and running→blocked
transitions so the header matches VALID_TRANSITIONS; locate and edit the
usage/help text or header near the VALID_TRANSITIONS declaration and replace the
old "running→evaluating" wording with the full set of
running→complete/pr_review/retrying/queued/blocked transitions.

In @.agents/scripts/supervisor/assess-task.sh:
- Around line 12-13: The script assess-task.sh currently only accepts verdicts
"complete|retry|failed" so exhausted-retry cases get marked failed; update all
validation/checks and rule mappings in assess-task.sh to include a new "blocked"
outcome (add "blocked" to any regex or list of allowed verdicts, to help
text/usage strings, and to the branch handling that maps exhausted retries to
decisions), and change the logic that previously forced exhausted retries to
"failed" to return "blocked" instead; specifically update the verdict validation
and parsing code and the rules handling around the blocks referenced (current
checks around lines 12–13, 240–251, and 277–280) so the parser, validators, and
downstream decision branches accept and propagate "blocked".
- Around line 86-87: The log_tail pipeline mistakenly uses "tail -5" which
limits extracted text to 5 lines; update the pipeline that sets log_tail (the
variable named log_tail in assess-task.sh) to use "tail -50" instead of "tail
-5" so it captures the last 50 text entries as the comment states, keeping the
rest of the pipeline (sed and the fallback tail -50 "$db_log") unchanged.
- Around line 101-108: The grep used to find the todo line can match substrings
(e.g., "t12" inside "t120") and treats dots as regex wildcards; change the
lookup to match the exact task ID literally and anchored as a token: escape
regex metacharacters in $task_id (create e.g. escaped_task_id by replacing . * ?
[ ] \( \) { } + ^ $ | \ with backslashed versions), then use grep -m1 -E with
word/token boundaries such as grep -m1 -E
"(^|[^[:alnum:]_.-])$escaped_task_id($|[^[:alnum:]_.-])" --file "$todo_file" to
set todo_line, and keep the existing todo_state logic that checks "$todo_line"
for "[x]" or "[X]". Ensure you update references to
task_id/todo_line/todo_file/todo_state in that block.

In @.agents/scripts/supervisor/pulse.sh:
- Around line 1231-1237: The loop temporarily disables errexit with "set +e"
around the evaluation block (see _phase1_pre_eval_state, _phase1_evaluating_tid
and the evaluation call) but several branches use "continue" while errexit
remains disabled, leaving the rest of the pulse non-strict; restore strict mode
by re-enabling "set -e" immediately after the evaluation returns (or run the
evaluation in a subshell so effects are local), and ensure any early "continue"
paths restore "set -e" before continuing so each iteration returns to strict
error handling.

Comment on lines +252 to +256
"running:complete"
"running:pr_review"
"running:retrying"
"running:queued"
"running:blocked"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Sync state-machine docs with direct running → terminal transitions

VALID_TRANSITIONS now allows direct running→complete/pr_review/retrying/queued/blocked, but the header/usage still describes running→evaluating. Updating the docs will prevent operator confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor-helper.sh around lines 252 - 256, The
documentation/usage header in .agents/scripts/supervisor-helper.sh is out of
sync with the VALID_TRANSITIONS array: update the help/usage string that
currently mentions the running→evaluating transition to list the new direct
running→complete, running→pr_review, running→retrying, running→queued, and
running→blocked transitions so the header matches VALID_TRANSITIONS; locate and
edit the usage/help text or header near the VALID_TRANSITIONS declaration and
replace the old "running→evaluating" wording with the full set of
running→complete/pr_review/retrying/queued/blocked transitions.

Comment on lines +12 to +13
# Returns a structured verdict: complete:<pr_url>, retry:<reason>, failed:<reason>
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add blocked outcome to align with max‑retry policy

The AI prompt/parser only allow complete|retry|failed, so exhausted‑retry cases will be forced to failed, bypassing the supervisor policy of blocking for human intervention. Add blocked to the allowed verdicts and update the rules accordingly.

🛠️ Suggested fix
-# Returns a structured verdict: complete:<pr_url>, retry:<reason>, failed:<reason>
+# Returns a structured verdict: complete:<pr_url>, retry:<reason>, blocked:<reason>, failed:<reason>
@@
-You are a task supervisor. Assess this task and return EXACTLY ONE LINE in this format:
-  complete:<pr_url_or_verified_complete>
-  retry:<brief_reason>
-  failed:<brief_reason>
+You are a task supervisor. Assess this task and return EXACTLY ONE LINE in this format:
+  complete:<pr_url_or_verified_complete>
+  retry:<brief_reason>
+  blocked:<brief_reason>
+  failed:<brief_reason>
@@
-- If the worker failed and retries exhausted → failed:<reason>
+- If the worker failed and retries exhausted → blocked:<reason>
@@
-verdict=$(echo "$ai_result" | grep -oE '^(complete|retry|failed):.*' | head -1 || echo "")
+verdict=$(echo "$ai_result" | grep -oE '^(complete|retry|failed|blocked):.*' | head -1 || echo "")

Also applies to: 240-251, 277-280

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/assess-task.sh around lines 12 - 13, The script
assess-task.sh currently only accepts verdicts "complete|retry|failed" so
exhausted-retry cases get marked failed; update all validation/checks and rule
mappings in assess-task.sh to include a new "blocked" outcome (add "blocked" to
any regex or list of allowed verdicts, to help text/usage strings, and to the
branch handling that maps exhausted retries to decisions), and change the logic
that previously forced exhausted retries to "failed" to return "blocked"
instead; specifically update the verdict validation and parsing code and the
rules handling around the blocks referenced (current checks around lines 12–13,
240–251, and 277–280) so the parser, validators, and downstream decision
branches accept and propagate "blocked".

Comment on lines +86 to +87
# Get the last 50 lines of actual text content (not JSON metadata)
log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -5 | sed -n 's/.*"text":"\([^"]*\)".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

log_tail only captures 5 lines (comment says 50)

The current pipeline limits the evidence to 5 text lines, which underfeeds the AI. Align it with the stated intent.

🛠️ Suggested fix
-			log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -5 | sed -n 's/.*"text":"\([^"]*\)".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)")
+			log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -50 | sed -n 's/.*"text":"\([^"]*\)".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Get the last 50 lines of actual text content (not JSON metadata)
log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -5 | sed -n 's/.*"text":"\([^"]*\)".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)")
# Get the last 50 lines of actual text content (not JSON metadata)
log_tail=$(grep '"type":"text"' "$db_log" 2>/dev/null | tail -50 | sed -n 's/.*"text":"\([^"]*\)".*/\1/p' 2>/dev/null | tail -50 || tail -50 "$db_log" 2>/dev/null || echo "(could not read log)")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/assess-task.sh around lines 86 - 87, The log_tail
pipeline mistakenly uses "tail -5" which limits extracted text to 5 lines;
update the pipeline that sets log_tail (the variable named log_tail in
assess-task.sh) to use "tail -50" instead of "tail -5" so it captures the last
50 text entries as the comment states, keeping the rest of the pipeline (sed and
the fallback tail -50 "$db_log") unchanged.

Comment on lines +101 to +108
# Get the first occurrence of this task ID
todo_line=$(grep -m1 "$task_id" "$todo_file" 2>/dev/null || echo "")
if [[ -n "$todo_line" ]]; then
if [[ "$todo_line" == *"[x]"* || "$todo_line" == *"[X]"* ]]; then
todo_state="checked"
else
todo_state="open"
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Anchor TODO matching to the exact task ID

grep -m1 "$task_id" can match t12 inside t120, and dots in subtask IDs are treated as regex wildcards. That can flip TODO state and mislead assessment. Use an anchored, escaped pattern.

🛠️ Suggested fix
-			todo_line=$(grep -m1 "$task_id" "$todo_file" 2>/dev/null || echo "")
+			local task_id_re
+			task_id_re=$(printf '%s' "$task_id" | sed -E 's/[][\\.^$*+?{}|()]/\\&/g')
+			todo_line=$(grep -m1 -E "^[[:space:]]*- \\[[ xX-]\\] ${task_id_re}([[:space:]]|$)" "$todo_file" 2>/dev/null || echo "")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Get the first occurrence of this task ID
todo_line=$(grep -m1 "$task_id" "$todo_file" 2>/dev/null || echo "")
if [[ -n "$todo_line" ]]; then
if [[ "$todo_line" == *"[x]"* || "$todo_line" == *"[X]"* ]]; then
todo_state="checked"
else
todo_state="open"
fi
# Get the first occurrence of this task ID
local task_id_re
task_id_re=$(printf '%s' "$task_id" | sed -E 's/[][\\.^$*+?{}|()]/\\&/g')
todo_line=$(grep -m1 -E "^[[:space:]]*- \\[[ xX-]\\] ${task_id_re}([[:space:]]|$)" "$todo_file" 2>/dev/null || echo "")
if [[ -n "$todo_line" ]]; then
if [[ "$todo_line" == *"[x]"* || "$todo_line" == *"[X]"* ]]; then
todo_state="checked"
else
todo_state="open"
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/assess-task.sh around lines 101 - 108, The grep
used to find the todo line can match substrings (e.g., "t12" inside "t120") and
treats dots as regex wildcards; change the lookup to match the exact task ID
literally and anchored as a token: escape regex metacharacters in $task_id
(create e.g. escaped_task_id by replacing . * ? [ ] \( \) { } + ^ $ | \ with
backslashed versions), then use grep -m1 -E with word/token boundaries such as
grep -m1 -E "(^|[^[:alnum:]_.-])$escaped_task_id($|[^[:alnum:]_.-])" --file
"$todo_file" to set todo_line, and keep the existing todo_state logic that
checks "$todo_line" for "[x]" or "[X]". Ensure you update references to
task_id/todo_line/todo_file/todo_state in that block.

Comment on lines +1231 to +1237
log_info " $tid: worker finished, evaluating..."
_phase1_pre_eval_state="$current_task_state"
_phase1_evaluating_tid="$tid"
# Temporarily disable errexit for evaluation — a single task failure
# must not abort evaluation of remaining tasks
set +e

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Restore set -e before any early continue

set +e is enabled for evaluation, but several branches continue (e.g., alive, early completion paths). That leaves errexit disabled for subsequent tasks and the rest of the pulse, weakening fail‑fast behavior. Re‑enable immediately after the eval call (or wrap it) so each iteration returns to strict error handling.

🛠️ Suggested fix
-			# Temporarily disable errexit for evaluation — a single task failure
-			# must not abort evaluation of remaining tasks
-			set +e
+			# Temporarily disable errexit for evaluation — a single task failure
+			# must not abort evaluation of remaining tasks
+			set +e
+			local eval_rc=0
@@
-			if command -v assess_task_with_metadata &>/dev/null; then
-				outcome=$(assess_task_with_metadata "$tid")
+			if command -v assess_task_with_metadata &>/dev/null; then
+				outcome=$(assess_task_with_metadata "$tid"); eval_rc=$?
 				eval_maker="assess_task"
 			elif command -v evaluate_worker_with_metadata &>/dev/null; then
-				outcome=$(evaluate_worker_with_metadata "$tid" "$skip_ai")
+				outcome=$(evaluate_worker_with_metadata "$tid" "$skip_ai"); eval_rc=$?
 				eval_maker="evaluate_worker_with_metadata"
 			else
-				outcome=$(evaluate_worker "$tid" "$skip_ai")
+				outcome=$(evaluate_worker "$tid" "$skip_ai"); eval_rc=$?
 			fi
+			set -e
+			if [[ $eval_rc -ne 0 ]]; then
+				log_warn "  $tid: evaluation failed (rc=$eval_rc) — retrying"
+				outcome="retry:evaluation_error"
+			fi

Also applies to: 1322-1329, 1584-1589

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/pulse.sh around lines 1231 - 1237, The loop
temporarily disables errexit with "set +e" around the evaluation block (see
_phase1_pre_eval_state, _phase1_evaluating_tid and the evaluation call) but
several branches use "continue" while errexit remains disabled, leaving the rest
of the pulse non-strict; restore strict mode by re-enabling "set -e" immediately
after the evaluation returns (or run the evaluation in a subshell so effects are
local), and ensure any early "continue" paths restore "set -e" before continuing
so each iteration returns to strict error handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant