Skip to content

t1258: fix stale evaluating recovery false positives in Phase 0.7#1966

Merged
marcusquinn merged 1 commit intomainfrom
feature/t1258
Feb 19, 2026
Merged

t1258: fix stale evaluating recovery false positives in Phase 0.7#1966
marcusquinn merged 1 commit intomainfrom
feature/t1258

Conversation

@marcusquinn
Copy link
Owner

@marcusquinn marcusquinn commented Feb 19, 2026

Summary

  • Fix check ordering bug in _diagnose_stale_root_cause(): heartbeat check now runs BEFORE log content checks, preventing false worker_failed_before_eval diagnoses on actively-evaluating tasks
  • Fix wrong log file for ai_eval_timeout detection: was searching worker log for supervisor function name (never matched); now searches supervisor log scoped to task ID
  • Fix worker_failed_before_eval false positives from REPROMPT METADATA headers: now only checks last 20 lines of log instead of full log
  • Extend Phase 0.7 PR-routing to all stale states: running tasks with a PR URL now route to pr_review instead of being re-queued (previously only evaluating tasks got this treatment)

Root Cause

15/15 recently completed tasks showed 'Stale evaluating recovery (Phase 0.7)' notes. Investigation found 4 bugs in _diagnose_stale_root_cause() and Phase 0.7 recovery routing that caused tasks to be incorrectly recovered even when evaluation was actively in progress.

Data (last 2 days, 124 stale recovery events)

Root Cause Events Fix Applied
worker_rate_limited 38 (30.6%) rate_limit_until cooldown already handles
eval_process_died 30 (24.2%) Fix #1: heartbeat check first
worker_failed_before_eval 30 (24.2%) Fix #1+#3: heartbeat first + tail-only grep
worker_oom_killed 13 (10.5%) Fix #4: route to pr_review when PR exists
pulse_killed_after_pr_persist 8 (6.5%) Already handled by fast-path
ai_eval_timeout 3 (2.4%) Fix #2: search correct log file

Summary by CodeRabbit

  • Reliability Improvements
    • Enhanced task monitoring to reduce false-positive stale detections when evaluations are actively running
    • Accelerated recovery paths for tasks with open pull requests
    • Improved timeout detection for AI evaluations to prevent task state misclassification

…t1258)

Three bugs causing 100% of tasks to route through Phase 0.7 stale recovery
instead of completing normally:

1. Check ordering bug: worker_failed_before_eval fired BEFORE the heartbeat
   check in _diagnose_stale_root_cause(). Tasks actively evaluating (heartbeat
   fresh) were incorrectly recovered if their log contained WORKER_FAILED from
   a previous retry's REPROMPT METADATA header. Fix: move heartbeat check first.

2. Wrong log file for ai_eval_timeout: the check searched the worker log for
   'evaluate_with_ai' — a supervisor function that never appears in worker logs.
   This check never matched, masking ai_eval_timeout diagnoses. Fix: search the
   supervisor log (SUPERVISOR_LOG) instead, scoped to the task ID.

3. Log tail scope for worker_failed_before_eval: full-log grep caused false
   positives from REPROMPT METADATA headers embedding previous failure content.
   Fix: only check last 20 lines (actual execution status, not embedded history).

4. Phase 0.7 recovery routing: running tasks with a PR URL were re-queued
   unnecessarily (e.g., worker_oom_killed after creating PR). Fix: extend the
   evaluating+PR fast-path to all states — any stale task with a PR routes to
   pr_review regardless of whether it was in running or evaluating state.

Root cause data (last 2 days, 124 events):
  worker_rate_limited:     38 (30.6%) — rate limit cooldown already handles
  eval_process_died:       30 (24.2%) — fix #1 prevents false recoveries
  worker_failed_before_eval: 30 (24.2%) — fix #1+#3 prevent false positives
  worker_oom_killed:       13 (10.5%) — fix #4 routes to pr_review when PR exists
  pulse_killed_after_pr_persist: 8 (6.5%) — already handled by fast-path
  ai_eval_timeout:          3 (2.4%) — fix #2 enables correct diagnosis
@gemini-code-assist
Copy link

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses several bugs in the stale task recovery mechanism, specifically within _diagnose_stale_root_cause() and Phase 0.7 recovery routing. The changes aim to significantly reduce false positives in stale task diagnoses, ensuring that actively running or evaluating tasks are not incorrectly flagged as failed. Additionally, it refines the routing logic for tasks with existing Pull Requests, streamlining their recovery process.

Highlights

  • Improved Stale Task Diagnosis Logic: Reordered checks in _diagnose_stale_root_cause() to prioritize heartbeat checks, preventing false worker_failed_before_eval diagnoses for actively evaluating tasks.
  • Corrected AI Evaluation Timeout Detection: Fixed ai_eval_timeout detection to search the supervisor log for relevant task IDs, resolving an issue where the worker log was incorrectly searched.
  • Reduced False Positives for Worker Failures: Limited the worker_failed_before_eval log check to the last 20 lines, avoiding false positives caused by REPROMPT METADATA headers containing previous failure content.
  • Expanded PR Routing for Stale Tasks: Extended Phase 0.7 recovery routing to include all stale tasks with an associated PR URL, ensuring they are routed to pr_review instead of being re-queued, improving recovery efficiency.
Changelog
  • .agents/scripts/supervisor/pulse.sh
    • Reordered diagnostic checks to prioritize active evaluation heartbeat over log error patterns.
    • Adjusted worker_failed_before_eval detection to only scan the last 20 lines of the log.
    • Corrected the log file and search pattern for ai_eval_timeout detection to target the supervisor log.
    • Modified PR routing logic to apply to all stale tasks with a PR, not just those in an 'evaluating' state.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions
Copy link

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 29 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Thu Feb 19 13:48:17 UTC 2026: Code review monitoring started
Thu Feb 19 13:48:18 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 29

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 29
  • VULNERABILITIES: 0

Generated on: Thu Feb 19 13:48:20 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 19, 2026

Walkthrough

This PR enhances stale-task detection logic in the supervisor pulse script by reordering guard conditions to prevent false positives during active AI evaluations, introducing AI-eval timeout detection via supervisor log tails, and expanding PR-based fast-path recovery to any stale task with an existing PR.

Changes

Cohort / File(s) Summary
Supervisor Stale-Detection Logic
.agents/scripts/supervisor/pulse.sh
Reordered guards in _diagnose_stale_root_cause to bypass heartbeat/eval-timeout checks when evaluation is in progress; added AI-eval timeout detection using SUPERVISOR_LOG tails (100 and complementary checks); expanded fast-path recovery to route any stale task with PR flag to pr_review (previously only evaluating tasks); updated state transitions and log messages with t1258 context for PR recovery rationale.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🤖 Guard the gates when eval's in flight,
Spot timeouts burning through the night,
PRs leap fast down recovery's road,
No false alarms—just clear the load! ⚡

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning The PR changes are narrowly scoped to shell script bug fixes for stale recovery detection and do not address the linked issues which focus on CodeRabbit AI integration setup and comprehensive codebase audits. Review the alignment between PR objectives and linked issues. This PR fixes specific stale recovery bugs but doesn't implement CodeRabbit integration or codebase auditing as required by issues #1, #2, and #4.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: fixing false positives in stale evaluating recovery logic during Phase 0.7, with clear reference to the issue number t1258.
Out of Scope Changes check ✅ Passed All changes in pulse.sh are directly related to fixing the stale recovery detection bugs described in the PR objectives, with no extraneous modifications.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/t1258

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sonarqubecloud
Copy link

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several crucial fixes to the stale task recovery logic, effectively addressing false positives and improving the routing for tasks that already have an associated pull request. The changes are logical and well-documented. My review includes a couple of suggestions to align the code with the repository's style guide regarding error suppression, which will improve code clarity and maintainability.

# a retry log that includes WORKER_FAILED from the prior attempt).
# Full-log grep caused worker_failed_before_eval false positives on
# tasks that were actively evaluating their second or third attempt.
if tail -20 "$log_file" 2>/dev/null | grep -q 'WORKER_FAILED\|DISPATCH_ERROR\|command not found'; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This use of 2>/dev/null violates the repository style guide, which disallows blanket error suppression. A cleaner approach is to first check if the log file exists before attempting to tail it. This avoids suppressing other potential errors from tail if it fails for reasons other than a missing file.

Suggested change
if tail -20 "$log_file" 2>/dev/null | grep -q 'WORKER_FAILED\|DISPATCH_ERROR\|command not found'; then
if [[ -f "$log_file" ]] && tail -20 "$log_file" | grep -q 'WORKER_FAILED\|DISPATCH_ERROR\|command not found'; then
References
  1. The style guide specifies that 2>/dev/null is only acceptable when redirecting to log files, not for blanket suppression of command errors. (link)

# contains evaluate_with_ai (it's a supervisor function). The previous check
# searched the wrong file and never matched, masking ai_eval_timeout cases.
if [[ -n "${SUPERVISOR_LOG:-}" && -f "$SUPERVISOR_LOG" ]]; then
if tail -100 "$SUPERVISOR_LOG" 2>/dev/null | grep -q "evaluate_with_ai.*${task_id}\|AI eval.*${task_id}"; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Per the repository style guide, 2>/dev/null should not be used for general error suppression. Since the existence of $SUPERVISOR_LOG is already verified in the outer if condition, the 2>/dev/null here is unnecessary and can be removed. This makes the code cleaner and adheres to the guide.

Suggested change
if tail -100 "$SUPERVISOR_LOG" 2>/dev/null | grep -q "evaluate_with_ai.*${task_id}\|AI eval.*${task_id}"; then
if tail -100 "$SUPERVISOR_LOG" | grep -q "evaluate_with_ai.*${task_id}\|AI eval.*${task_id}"; then
References
  1. The style guide specifies that 2>/dev/null is only acceptable when redirecting to log files, not for blanket suppression of command errors. (link)

coderabbitai[bot]
coderabbitai bot previously requested changes Feb 19, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
.agents/scripts/supervisor/pulse.sh (1)

250-258: Correct fix: searching SUPERVISOR_LOG for evaluate_with_ai instead of the worker log.

The guard on line 253 (-n "${SUPERVISOR_LOG:-}" && -f "$SUPERVISOR_LOG") is properly defensive. The tail -100 scoping is reasonable for recent activity.

One concern: task IDs containing dots (e.g., subtask t123.4) will have the dot treated as a regex wildcard in the grep pattern on line 254. This means t123.4 would also match t12304, t123X4, etc. While false matches are unlikely in practice (these patterns rarely coexist in the same 100 lines), you could tighten this with grep -F for the task_id portion or escape dots.

🛡️ Optional: escape dots in task_id for grep
+		local escaped_task_id="${task_id//./\\.}"
 		if [[ -n "${SUPERVISOR_LOG:-}" && -f "$SUPERVISOR_LOG" ]]; then
-			if tail -100 "$SUPERVISOR_LOG" 2>/dev/null | grep -q "evaluate_with_ai.*${task_id}\|AI eval.*${task_id}"; then
+			if tail -100 "$SUPERVISOR_LOG" 2>/dev/null | grep -q "evaluate_with_ai.*${escaped_task_id}\|AI eval.*${escaped_task_id}"; then
 				echo "ai_eval_timeout"
 				return 0
 			fi
 		fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/pulse.sh around lines 250 - 258, The grep pattern
that searches SUPERVISOR_LOG uses task_id directly, so dots in task IDs (e.g.,
t123.4) act as regex wildcards; update the check in the block that references
SUPERVISOR_LOG and tail -100 so the task_id is treated literally—either escape
regex metacharacters in the task_id before embedding it into the pattern or
switch to a fixed-string search for the task_id portion (e.g., use grep -F or
construct two fixed-string checks) while still matching the surrounding
evaluate_with_ai / AI eval context; ensure you change the code that builds the
grep pattern (the line containing evaluate_with_ai.*${task_id}\|AI
eval.*${task_id}) so it no longer treats dots as wildcards.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/supervisor/pulse.sh:
- Around line 1013-1029: The had_pr_flag computation needs to treat the sentinel
value 'task_obsolete' as not having a PR so the pr_review fast-path stays
consistent with other checks; update the logic that sets had_pr_flag (the same
condition used before the block that references stale_pr_url and calls
cmd_transition "$stale_id" "pr_review") to exclude cases where stale_pr_url ==
'task_obsolete' (i.e., only set had_pr_flag when stale_pr_url is non-empty and
not equal to 'task_obsolete'), and keep the same exclusion when composing the
state_log message and calling cmd_transition so we never route a task with the
sentinel pr_url to pr_review.

---

Nitpick comments:
In @.agents/scripts/supervisor/pulse.sh:
- Around line 250-258: The grep pattern that searches SUPERVISOR_LOG uses
task_id directly, so dots in task IDs (e.g., t123.4) act as regex wildcards;
update the check in the block that references SUPERVISOR_LOG and tail -100 so
the task_id is treated literally—either escape regex metacharacters in the
task_id before embedding it into the pattern or switch to a fixed-string search
for the task_id portion (e.g., use grep -F or construct two fixed-string checks)
while still matching the surrounding evaluate_with_ai / AI eval context; ensure
you change the code that builds the grep pattern (the line containing
evaluate_with_ai.*${task_id}\|AI eval.*${task_id}) so it no longer treats dots
as wildcards.

Comment on lines +1013 to 1029
# t1145/t1250/t1258: If the stale task has a PR, route to pr_review instead of
# re-queuing — the work is done, only evaluation (or the worker) died.
# Applies to both 'evaluating' and 'running' states with a PR URL.
# Fast-path evaluating tasks (with PR URL) arrive here with a 10s grace instead
# of 120s, reducing the median recovery latency from ~120s to ~10s.
# Running tasks with a PR (e.g., worker_oom_killed after creating PR) previously
# went to queued unnecessarily — they should go directly to pr_review (t1258).
if [[ "$had_pr_flag" -eq 1 ]]; then
recovery_to_state="pr_review"
log_info " Phase 0.7: $stale_id → pr_review (has PR, evaluation process died)"
cmd_transition "$stale_id" "pr_review" --pr-url "$stale_pr_url" --error "Stale evaluating recovery (Phase 0.7/t1145/t1250): evaluation process died, PR exists (cause: $root_cause)" 2>>"$SUPERVISOR_LOG" || true
log_info " Phase 0.7: $stale_id → pr_review (has PR, $stale_status process died, cause: $root_cause)"
cmd_transition "$stale_id" "pr_review" --pr-url "$stale_pr_url" --error "Stale $stale_status recovery (Phase 0.7/t1145/t1250/t1258): process died, PR exists (cause: $root_cause)" 2>>"$SUPERVISOR_LOG" || true
db "$SUPERVISOR_DB" "
INSERT INTO state_log (task_id, from_state, to_state, reason)
VALUES ('$(sql_escape "$stale_id")', '$(sql_escape "$stale_status")',
'pr_review',
'Phase 0.7 stale-state recovery (t1145/t1202/t1250): evaluating with dead worker but PR exists — routed to pr_review (cause: $(sql_escape "$root_cause"))');
'Phase 0.7 stale-state recovery (t1145/t1202/t1250/t1258): $stale_status with dead worker but PR exists — routed to pr_review (cause: $(sql_escape "$root_cause"))');
" 2>/dev/null || true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file structure and find the relevant code
cd .agents/scripts/supervisor && wc -l pulse.sh

Repository: marcusquinn/aidevops

Length of output: 77


🏁 Script executed:

# Read around line 981 to see the had_pr_flag computation
sed -n '975,990p' .agents/scripts/supervisor/pulse.sh

Repository: marcusquinn/aidevops

Length of output: 1244


🏁 Script executed:

# Find all occurrences of had_pr_flag to understand its usage
rg -n 'had_pr_flag' .agents/scripts/supervisor/pulse.sh

Repository: marcusquinn/aidevops

Length of output: 497


🏁 Script executed:

# Find all occurrences of task_obsolete to understand sentinel handling
rg -n 'task_obsolete' .agents/scripts/supervisor/pulse.sh -B2 -A2

Repository: marcusquinn/aidevops

Length of output: 1971


🏁 Script executed:

# Check if task_obsolete appears in any PR URL checks
rg -n 'stale_pr_url.*task_obsolete|task_obsolete.*stale_pr_url' .agents/scripts/supervisor/pulse.sh

Repository: marcusquinn/aidevops

Length of output: 46


Add task_obsolete sentinel check to had_pr_flag computation for consistency.

The broadened PR fast-path is correct, but line 981's had_pr_flag computation is inconsistent with how task_obsolete is handled elsewhere in the script.

Other PR URL checks consistently exclude task_obsolete:

  • Fast-path evaluating query excludes it (line 867)
  • Early PR persist check excludes it (line 1310)
  • Task completion query excludes it (line 1966)

Without this exclusion, a stale running/dispatched task with pr_url = 'task_obsolete' would route to pr_review with the sentinel value, creating downstream inconsistency.

🛡️ Fix: exclude task_obsolete from had_pr_flag
 			local had_pr_flag=0
-			[[ -n "$stale_pr_url" && "$stale_pr_url" != "no_pr" && "$stale_pr_url" != "task_only" ]] && had_pr_flag=1
+			[[ -n "$stale_pr_url" && "$stale_pr_url" != "no_pr" && "$stale_pr_url" != "task_only" && "$stale_pr_url" != "task_obsolete" ]] && had_pr_flag=1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/supervisor/pulse.sh around lines 1013 - 1029, The
had_pr_flag computation needs to treat the sentinel value 'task_obsolete' as not
having a PR so the pr_review fast-path stays consistent with other checks;
update the logic that sets had_pr_flag (the same condition used before the block
that references stale_pr_url and calls cmd_transition "$stale_id" "pr_review")
to exclude cases where stale_pr_url == 'task_obsolete' (i.e., only set
had_pr_flag when stale_pr_url is non-empty and not equal to 'task_obsolete'),
and keep the same exclusion when composing the state_log message and calling
cmd_transition so we never route a task with the sentinel pr_url to pr_review.

@marcusquinn marcusquinn dismissed coderabbitai[bot]’s stale review February 19, 2026 14:06

Auto-dismissed: bot review does not block autonomous pipeline

@marcusquinn marcusquinn merged commit 4ac6e06 into main Feb 19, 2026
23 checks passed
@marcusquinn marcusquinn deleted the feature/t1258 branch February 19, 2026 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant