t1258: fix stale evaluating recovery false positives in Phase 0.7#1966
t1258: fix stale evaluating recovery false positives in Phase 0.7#1966marcusquinn merged 1 commit intomainfrom
Conversation
…t1258) Three bugs causing 100% of tasks to route through Phase 0.7 stale recovery instead of completing normally: 1. Check ordering bug: worker_failed_before_eval fired BEFORE the heartbeat check in _diagnose_stale_root_cause(). Tasks actively evaluating (heartbeat fresh) were incorrectly recovered if their log contained WORKER_FAILED from a previous retry's REPROMPT METADATA header. Fix: move heartbeat check first. 2. Wrong log file for ai_eval_timeout: the check searched the worker log for 'evaluate_with_ai' — a supervisor function that never appears in worker logs. This check never matched, masking ai_eval_timeout diagnoses. Fix: search the supervisor log (SUPERVISOR_LOG) instead, scoped to the task ID. 3. Log tail scope for worker_failed_before_eval: full-log grep caused false positives from REPROMPT METADATA headers embedding previous failure content. Fix: only check last 20 lines (actual execution status, not embedded history). 4. Phase 0.7 recovery routing: running tasks with a PR URL were re-queued unnecessarily (e.g., worker_oom_killed after creating PR). Fix: extend the evaluating+PR fast-path to all states — any stale task with a PR routes to pr_review regardless of whether it was in running or evaluating state. Root cause data (last 2 days, 124 events): worker_rate_limited: 38 (30.6%) — rate limit cooldown already handles eval_process_died: 30 (24.2%) — fix #1 prevents false recoveries worker_failed_before_eval: 30 (24.2%) — fix #1+#3 prevent false positives worker_oom_killed: 13 (10.5%) — fix #4 routes to pr_review when PR exists pulse_killed_after_pr_persist: 8 (6.5%) — already handled by fast-path ai_eval_timeout: 3 (2.4%) — fix #2 enables correct diagnosis
Summary of ChangesHello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses several bugs in the stale task recovery mechanism, specifically within Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Thu Feb 19 13:48:20 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
WalkthroughThis PR enhances stale-task detection logic in the supervisor pulse script by reordering guard conditions to prevent false positives during active AI evaluations, introducing AI-eval timeout detection via supervisor log tails, and expanding PR-based fast-path recovery to any stale task with an existing PR. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
There was a problem hiding this comment.
Code Review
This pull request introduces several crucial fixes to the stale task recovery logic, effectively addressing false positives and improving the routing for tasks that already have an associated pull request. The changes are logical and well-documented. My review includes a couple of suggestions to align the code with the repository's style guide regarding error suppression, which will improve code clarity and maintainability.
| # a retry log that includes WORKER_FAILED from the prior attempt). | ||
| # Full-log grep caused worker_failed_before_eval false positives on | ||
| # tasks that were actively evaluating their second or third attempt. | ||
| if tail -20 "$log_file" 2>/dev/null | grep -q 'WORKER_FAILED\|DISPATCH_ERROR\|command not found'; then |
There was a problem hiding this comment.
This use of 2>/dev/null violates the repository style guide, which disallows blanket error suppression. A cleaner approach is to first check if the log file exists before attempting to tail it. This avoids suppressing other potential errors from tail if it fails for reasons other than a missing file.
| if tail -20 "$log_file" 2>/dev/null | grep -q 'WORKER_FAILED\|DISPATCH_ERROR\|command not found'; then | |
| if [[ -f "$log_file" ]] && tail -20 "$log_file" | grep -q 'WORKER_FAILED\|DISPATCH_ERROR\|command not found'; then |
References
- The style guide specifies that
2>/dev/nullis only acceptable when redirecting to log files, not for blanket suppression of command errors. (link)
| # contains evaluate_with_ai (it's a supervisor function). The previous check | ||
| # searched the wrong file and never matched, masking ai_eval_timeout cases. | ||
| if [[ -n "${SUPERVISOR_LOG:-}" && -f "$SUPERVISOR_LOG" ]]; then | ||
| if tail -100 "$SUPERVISOR_LOG" 2>/dev/null | grep -q "evaluate_with_ai.*${task_id}\|AI eval.*${task_id}"; then |
There was a problem hiding this comment.
Per the repository style guide, 2>/dev/null should not be used for general error suppression. Since the existence of $SUPERVISOR_LOG is already verified in the outer if condition, the 2>/dev/null here is unnecessary and can be removed. This makes the code cleaner and adheres to the guide.
| if tail -100 "$SUPERVISOR_LOG" 2>/dev/null | grep -q "evaluate_with_ai.*${task_id}\|AI eval.*${task_id}"; then | |
| if tail -100 "$SUPERVISOR_LOG" | grep -q "evaluate_with_ai.*${task_id}\|AI eval.*${task_id}"; then |
References
- The style guide specifies that
2>/dev/nullis only acceptable when redirecting to log files, not for blanket suppression of command errors. (link)
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
.agents/scripts/supervisor/pulse.sh (1)
250-258: Correct fix: searching SUPERVISOR_LOG forevaluate_with_aiinstead of the worker log.The guard on line 253 (
-n "${SUPERVISOR_LOG:-}" && -f "$SUPERVISOR_LOG") is properly defensive. Thetail -100scoping is reasonable for recent activity.One concern: task IDs containing dots (e.g., subtask
t123.4) will have the dot treated as a regex wildcard in thegreppattern on line 254. This meanst123.4would also matcht12304,t123X4, etc. While false matches are unlikely in practice (these patterns rarely coexist in the same 100 lines), you could tighten this withgrep -Ffor the task_id portion or escape dots.🛡️ Optional: escape dots in task_id for grep
+ local escaped_task_id="${task_id//./\\.}" if [[ -n "${SUPERVISOR_LOG:-}" && -f "$SUPERVISOR_LOG" ]]; then - if tail -100 "$SUPERVISOR_LOG" 2>/dev/null | grep -q "evaluate_with_ai.*${task_id}\|AI eval.*${task_id}"; then + if tail -100 "$SUPERVISOR_LOG" 2>/dev/null | grep -q "evaluate_with_ai.*${escaped_task_id}\|AI eval.*${escaped_task_id}"; then echo "ai_eval_timeout" return 0 fi fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/supervisor/pulse.sh around lines 250 - 258, The grep pattern that searches SUPERVISOR_LOG uses task_id directly, so dots in task IDs (e.g., t123.4) act as regex wildcards; update the check in the block that references SUPERVISOR_LOG and tail -100 so the task_id is treated literally—either escape regex metacharacters in the task_id before embedding it into the pattern or switch to a fixed-string search for the task_id portion (e.g., use grep -F or construct two fixed-string checks) while still matching the surrounding evaluate_with_ai / AI eval context; ensure you change the code that builds the grep pattern (the line containing evaluate_with_ai.*${task_id}\|AI eval.*${task_id}) so it no longer treats dots as wildcards.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/supervisor/pulse.sh:
- Around line 1013-1029: The had_pr_flag computation needs to treat the sentinel
value 'task_obsolete' as not having a PR so the pr_review fast-path stays
consistent with other checks; update the logic that sets had_pr_flag (the same
condition used before the block that references stale_pr_url and calls
cmd_transition "$stale_id" "pr_review") to exclude cases where stale_pr_url ==
'task_obsolete' (i.e., only set had_pr_flag when stale_pr_url is non-empty and
not equal to 'task_obsolete'), and keep the same exclusion when composing the
state_log message and calling cmd_transition so we never route a task with the
sentinel pr_url to pr_review.
---
Nitpick comments:
In @.agents/scripts/supervisor/pulse.sh:
- Around line 250-258: The grep pattern that searches SUPERVISOR_LOG uses
task_id directly, so dots in task IDs (e.g., t123.4) act as regex wildcards;
update the check in the block that references SUPERVISOR_LOG and tail -100 so
the task_id is treated literally—either escape regex metacharacters in the
task_id before embedding it into the pattern or switch to a fixed-string search
for the task_id portion (e.g., use grep -F or construct two fixed-string checks)
while still matching the surrounding evaluate_with_ai / AI eval context; ensure
you change the code that builds the grep pattern (the line containing
evaluate_with_ai.*${task_id}\|AI eval.*${task_id}) so it no longer treats dots
as wildcards.
| # t1145/t1250/t1258: If the stale task has a PR, route to pr_review instead of | ||
| # re-queuing — the work is done, only evaluation (or the worker) died. | ||
| # Applies to both 'evaluating' and 'running' states with a PR URL. | ||
| # Fast-path evaluating tasks (with PR URL) arrive here with a 10s grace instead | ||
| # of 120s, reducing the median recovery latency from ~120s to ~10s. | ||
| # Running tasks with a PR (e.g., worker_oom_killed after creating PR) previously | ||
| # went to queued unnecessarily — they should go directly to pr_review (t1258). | ||
| if [[ "$had_pr_flag" -eq 1 ]]; then | ||
| recovery_to_state="pr_review" | ||
| log_info " Phase 0.7: $stale_id → pr_review (has PR, evaluation process died)" | ||
| cmd_transition "$stale_id" "pr_review" --pr-url "$stale_pr_url" --error "Stale evaluating recovery (Phase 0.7/t1145/t1250): evaluation process died, PR exists (cause: $root_cause)" 2>>"$SUPERVISOR_LOG" || true | ||
| log_info " Phase 0.7: $stale_id → pr_review (has PR, $stale_status process died, cause: $root_cause)" | ||
| cmd_transition "$stale_id" "pr_review" --pr-url "$stale_pr_url" --error "Stale $stale_status recovery (Phase 0.7/t1145/t1250/t1258): process died, PR exists (cause: $root_cause)" 2>>"$SUPERVISOR_LOG" || true | ||
| db "$SUPERVISOR_DB" " | ||
| INSERT INTO state_log (task_id, from_state, to_state, reason) | ||
| VALUES ('$(sql_escape "$stale_id")', '$(sql_escape "$stale_status")', | ||
| 'pr_review', | ||
| 'Phase 0.7 stale-state recovery (t1145/t1202/t1250): evaluating with dead worker but PR exists — routed to pr_review (cause: $(sql_escape "$root_cause"))'); | ||
| 'Phase 0.7 stale-state recovery (t1145/t1202/t1250/t1258): $stale_status with dead worker but PR exists — routed to pr_review (cause: $(sql_escape "$root_cause"))'); | ||
| " 2>/dev/null || true |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the file structure and find the relevant code
cd .agents/scripts/supervisor && wc -l pulse.shRepository: marcusquinn/aidevops
Length of output: 77
🏁 Script executed:
# Read around line 981 to see the had_pr_flag computation
sed -n '975,990p' .agents/scripts/supervisor/pulse.shRepository: marcusquinn/aidevops
Length of output: 1244
🏁 Script executed:
# Find all occurrences of had_pr_flag to understand its usage
rg -n 'had_pr_flag' .agents/scripts/supervisor/pulse.shRepository: marcusquinn/aidevops
Length of output: 497
🏁 Script executed:
# Find all occurrences of task_obsolete to understand sentinel handling
rg -n 'task_obsolete' .agents/scripts/supervisor/pulse.sh -B2 -A2Repository: marcusquinn/aidevops
Length of output: 1971
🏁 Script executed:
# Check if task_obsolete appears in any PR URL checks
rg -n 'stale_pr_url.*task_obsolete|task_obsolete.*stale_pr_url' .agents/scripts/supervisor/pulse.shRepository: marcusquinn/aidevops
Length of output: 46
Add task_obsolete sentinel check to had_pr_flag computation for consistency.
The broadened PR fast-path is correct, but line 981's had_pr_flag computation is inconsistent with how task_obsolete is handled elsewhere in the script.
Other PR URL checks consistently exclude task_obsolete:
- Fast-path evaluating query excludes it (line 867)
- Early PR persist check excludes it (line 1310)
- Task completion query excludes it (line 1966)
Without this exclusion, a stale running/dispatched task with pr_url = 'task_obsolete' would route to pr_review with the sentinel value, creating downstream inconsistency.
🛡️ Fix: exclude task_obsolete from had_pr_flag
local had_pr_flag=0
- [[ -n "$stale_pr_url" && "$stale_pr_url" != "no_pr" && "$stale_pr_url" != "task_only" ]] && had_pr_flag=1
+ [[ -n "$stale_pr_url" && "$stale_pr_url" != "no_pr" && "$stale_pr_url" != "task_only" && "$stale_pr_url" != "task_obsolete" ]] && had_pr_flag=1🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/supervisor/pulse.sh around lines 1013 - 1029, The
had_pr_flag computation needs to treat the sentinel value 'task_obsolete' as not
having a PR so the pr_review fast-path stays consistent with other checks;
update the logic that sets had_pr_flag (the same condition used before the block
that references stale_pr_url and calls cmd_transition "$stale_id" "pr_review")
to exclude cases where stale_pr_url == 'task_obsolete' (i.e., only set
had_pr_flag when stale_pr_url is non-empty and not equal to 'task_obsolete'),
and keep the same exclusion when composing the state_log message and calling
cmd_transition so we never route a task with the sentinel pr_url to pr_review.
Auto-dismissed: bot review does not block autonomous pipeline



Summary
_diagnose_stale_root_cause(): heartbeat check now runs BEFORE log content checks, preventing falseworker_failed_before_evaldiagnoses on actively-evaluating tasksai_eval_timeoutdetection: was searching worker log for supervisor function name (never matched); now searches supervisor log scoped to task IDworker_failed_before_evalfalse positives from REPROMPT METADATA headers: now only checks last 20 lines of log instead of full logpr_reviewinstead of being re-queued (previously only evaluating tasks got this treatment)Root Cause
15/15 recently completed tasks showed 'Stale evaluating recovery (Phase 0.7)' notes. Investigation found 4 bugs in
_diagnose_stale_root_cause()and Phase 0.7 recovery routing that caused tasks to be incorrectly recovered even when evaluation was actively in progress.Data (last 2 days, 124 stale recovery events)
Summary by CodeRabbit