fix: resolve three supervisor pulse-blocking bugs by marcusquinn · Pull Request #2185 · marcusquinn/aidevops

marcusquinn · 2026-02-23T01:34:42Z

Summary

Fixes three bugs that caused the supervisor pulse to die before reaching Phase 3+ (PR lifecycle, health issue update), creating a loop where every 2-minute pulse would crash at the same point.

Bugs Fixed

1. Dedup guard invalid state transition (dispatch.sh)

The dedup guard attempted a queued -> blocked transition, which is not in VALID_TRANSITIONS. The transition failed silently, but its downstream effects (TODO.md commit, mail send) caused the pulse to exit under set -e. The guard now uses queued -> cancelled, which is a valid transition.
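The exit-under-set-e mechanism can be reproduced in isolation. The sketch below is not the supervisor code: `run_pulse` and the phase labels are hypothetical stand-ins, and the failing command is simply `false`. It only shows that one unguarded non-zero exit status stops a `set -e` script before later phases run, while a `|| true` guard lets it continue.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a pulse whose Phase 2 contains a failing command.
# A child bash process is used so that set -e is evaluated independently
# of however the caller invokes this function.
run_pulse() {
	local guard="$1" # "guarded" or "unguarded"
	bash -c '
		set -e
		echo "phase 2"
		if [ "$1" = "guarded" ]; then
			false || true # failure is swallowed, the script continues
		else
			false # set -e aborts the script here
		fi
		echo "phase 3"
	' pulse "$guard"
}

run_pulse guarded
run_pulse unguarded || echo "pulse died before phase 3"
```

The second call prints only "phase 2" before the fallback message, which matches the symptom described above: every pulse stopping at the same point.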

2. `--reason` invalid option for `cmd_transition` (todo-sync.sh)

Phase 0.6 (queue-dispatchability reconciliation) and Phase 7b (DB-TODO reconciliation) passed --reason to cmd_transition, but the only valid option is --error. The unknown-option handler returned 1, which sent each call down the || { continue } error path. Fixed 4 occurrences.
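To illustrate the failure mode, here is a minimal sketch of an option parser that accepts only --error and returns 1 for anything else. `demo_transition` is a hypothetical stand-in written for this example; the real cmd_transition is not shown in this PR and may parse its arguments differently.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for cmd_transition option parsing (not the real function).
demo_transition() {
	local task_id="$1"
	local new_state="$2"
	shift 2
	local error_msg=""
	while [[ $# -gt 0 ]]; do
		case "$1" in
		--error)
			# Require a value so that "shift 2" cannot fail and loop forever.
			[[ $# -ge 2 ]] || return 1
			error_msg="$2"
			shift 2
			;;
		*)
			echo "unknown option: $1" >&2
			return 1
			;;
		esac
	done
	echo "${task_id} -> ${new_state} (error: ${error_msg})"
	return 0
}

# Old call shape: the unknown flag makes the call fail, so the fallback runs.
demo_transition t001 cancelled --reason "not dispatchable" || echo "transition failed"
# Corrected call shape.
demo_transition t001 cancelled --error "not dispatchable"
```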

3. `queued:blocked` added to VALID_TRANSITIONS (supervisor-helper.sh)

Defense-in-depth: even though the dedup guard now uses cancelled, queued -> blocked is semantically valid (e.g., a queued task discovering unresolved dependencies). Added to prevent future code from hitting the same invalid transition error.
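For readers unfamiliar with the state machine, a transition table of this kind can be checked with a simple array lookup. The sketch below assumes VALID_TRANSITIONS is a bash array of "from:to" strings, which is consistent with the queued:blocked entry named above. The other entries and the `is_valid_transition` helper are hypothetical and are not taken from supervisor-helper.sh.

```shell
#!/usr/bin/env bash
# Reduced, partly hypothetical transition table. Only "queued:blocked" and
# "queued:cancelled" are confirmed by this PR; the real array has more entries.
VALID_TRANSITIONS=(
	"queued:dispatched" # hypothetical entry for illustration
	"queued:cancelled"
	"queued:blocked" # the entry this PR adds
)

# Hypothetical helper: return 0 if from->to is listed, 1 otherwise.
is_valid_transition() {
	local from="$1"
	local to="$2"
	local entry
	for entry in "${VALID_TRANSITIONS[@]}"; do
		if [[ "$entry" == "${from}:${to}" ]]; then
			return 0
		fi
	done
	return 1
}

if is_valid_transition queued blocked; then
	echo "queued -> blocked is allowed"
fi
if ! is_valid_transition blocked dispatched; then
	echo "blocked -> dispatched is rejected"
fi
```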

Impact

- Supervisor pulse now completes all phases (0-14) instead of dying at Phase 2
- Health issue ([Supervisor:marcusquinn] 0 queued, 0 working, 1 in review at 14:46 UTC #1502) updates on every pulse again
- PR lifecycle (Phase 3) processes complete/pr_review tasks
- Queued tasks dispatch without getting stuck in dedup guard loops

Verification

- ShellCheck: zero warnings on all 3 changed files
- rg 'cmd_transition.*--reason' returns zero matches
- queued:blocked now in VALID_TRANSITIONS array
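The second check could also be scripted as a regression guard. This is a sketch using plain grep instead of rg so it runs without extra tooling. `check_no_reason_flag` is a hypothetical name and is not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical regression check: succeed only if no cmd_transition call
# in the given files still passes --reason.
check_no_reason_flag() {
	local rc=0
	grep -En 'cmd_transition.*--reason' "$@" || rc=$?
	case "$rc" in
	0)
		echo "found cmd_transition calls using --reason" >&2
		return 1
		;;
	1) return 0 ;; # grep exit status 1 means no matching lines
	*)
		echo "grep failed with status $rc" >&2
		return 2
		;;
	esac
}
```

It would be called with the changed file, for example `check_no_reason_flag .agents/scripts/supervisor/todo-sync.sh`.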

Summary by CodeRabbit

Release Notes

Bug Fixes
- Fixed task failure handling: tasks are now properly cancelled after reaching maximum consecutive failures instead of being blocked
- Fixed task queue state transition logic to prevent invalid state combinations and improve system stability
- Refined error reporting mechanism to accurately track and communicate failure context across task transitions

- Dedup guard: use cancelled instead of blocked for queued tasks (queued->blocked was invalid)
- Phase 0.6/7b: change --reason to --error (--reason is not a valid cmd_transition option)
- Add queued:blocked to VALID_TRANSITIONS as defense-in-depth for future code paths

gemini-code-assist · 2026-02-23T01:34:55Z

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical bugs within the supervisor pulse mechanism that were causing it to prematurely terminate. By rectifying invalid state transitions, correcting command-line options, and enhancing state validation, the changes ensure the supervisor pulse now reliably completes all its operational phases, leading to consistent health issue updates, proper PR lifecycle processing, and uninterrupted task dispatching.

Highlights

- Dedup guard state transition fix: Corrected an invalid state transition from 'queued' to 'blocked' in the dispatch dedup guard by changing it to 'queued' to 'cancelled', preventing silent failures and ensuring proper task handling.
- cmd_transition option correction: Replaced the invalid '--reason' option with the correct '--error' option in 'cmd_transition' calls within reconciliation phases, resolving errors caused by unknown option handling.
- Added 'queued:blocked' to valid transitions: Included 'queued:blocked' in the 'VALID_TRANSITIONS' array as a defense-in-depth measure, acknowledging its semantic validity for future use cases despite the immediate fix using 'cancelled'.

Changelog

.agents/scripts/supervisor-helper.sh
- Added "queued:blocked" to the VALID_TRANSITIONS array.
.agents/scripts/supervisor/dispatch.sh
- Updated the dedup guard logic to transition tasks from 'queued' to 'cancelled' instead of 'blocked' when consecutive failures occur.
- Modified log messages, task notifications, and failure pattern storage to reflect the 'cancelled' state.
.agents/scripts/supervisor/todo-sync.sh
- Replaced the '--reason' option with '--error' in four instances of 'cmd_transition' calls within 'cmd_reconcile_db_todo' and 'cmd_reconcile_queue_dispatchability' functions.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-02-23T01:34:56Z

Walkthrough

Three supervisor scripts updated to refine state machine transitions and error reporting. Added queued:blocked transition, changed dispatch to cancel instead of block on max consecutive failures, and replaced --reason with --error flags for error reporting across multiple transition calls.

Changes

Cohort / File(s)	Summary
State Transitions `.agents/scripts/supervisor-helper.sh`	Added `"queued:blocked"` to `VALID_TRANSITIONS` array, expanding permitted state transitions.
Dispatch Deduplication Logic `.agents/scripts/supervisor/dispatch.sh`	Modified max-failure handling to transition tasks to `CANCELLED` instead of `BLOCKED`, with comment clarifying that `queued->blocked` is invalid and manual re-queuing remains possible.
Error Reporting Flags `.agents/scripts/supervisor/todo-sync.sh`	Replaced `--reason` CLI flag with `--error` flag across Phase 7b, Phase 0.6 queue reconciliation, and phantom/dispatch transition checks for consistent error categorization.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

t1206: Add dispatch deduplication guard for repeated task failures #1835: Modifies dispatch deduplication and task-state transition logic; this PR alters that flow to use cancellation instead of blocking.
feat: add supervisor-helper.sh with SQLite schema and state machine (t128.1) #376: Directly related through changes to VALID_TRANSITIONS in supervisor-helper.sh, including the queued:blocked transition.
feat: add 3-tier outcome evaluation and re-prompt cycle to supervisor (t128.3) #378: Addresses supervisor state transitions with emphasis on blocked vs cancelled outcomes and retry/timeout handling.

Poem

🔄 States flow with grace, from queued to blocked at last,
But cancelled now when failures strike too fast,
Error flags rename, --reason becomes --error true,
The supervisor dances—old transitions made anew! 🎭

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: resolve three supervisor pulse-blocking bugs' directly and accurately summarizes the main change—fixing three critical bugs in supervisor scripts that were causing pulse failures.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

github-actions · 2026-02-23T01:35:12Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 40 code smells

[INFO] Recent monitoring activity:
Mon Feb 23 01:35:09 UTC 2026: Code review monitoring started
Mon Feb 23 01:35:09 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 40

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 40
VULNERABILITIES: 0

Generated on: Mon Feb 23 01:35:12 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-02-23T01:36:05Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/supervisor/dispatch.sh:
- Around line 1531-1540: The TODO sync call uses update_todo_on_blocked after
transitioning the task to "cancelled", causing TODO.md to show BLOCKED instead
of CANCELLED; replace the call to update_todo_on_blocked with
update_todo_on_cancelled (referencing cmd_transition, update_todo_on_cancelled,
send_task_notification, store_failure_pattern, and the local block_reason
variable) so the TODO state matches the DB, and update or remove the stale
comment about "queued->blocked is not a valid transition" to reflect the current
VALID_TRANSITIONS or explain why cancellation is still chosen here.

coderabbitai · 2026-02-23T01:38:37Z

.agents/scripts/supervisor/dispatch.sh

+	# Rule 2: Cancel after max_consecutive identical failures
+	# Note: queued->blocked is not a valid transition; use cancelled instead.
+	# The task can be manually re-queued after investigation.
 	if [[ "$consecutive_count" -ge "$max_consecutive" ]]; then
 		local block_reason="Dispatch dedup guard: $consecutive_count consecutive identical failures (error: ${last_error:-unknown}) — manual intervention required (t1206)"
-		log_warn "  $task_id: BLOCKED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
-		cmd_transition "$task_id" "blocked" --error "$block_reason" 2>/dev/null || true
+		log_warn "  $task_id: CANCELLED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
+		cmd_transition "$task_id" "cancelled" --error "$block_reason" 2>/dev/null || true
 		update_todo_on_blocked "$task_id" "$block_reason" 2>/dev/null || true
-		send_task_notification "$task_id" "blocked" "$block_reason" 2>/dev/null || true
-		store_failure_pattern "$task_id" "blocked" "$block_reason" "dispatch-dedup-guard" 2>/dev/null || true
+		send_task_notification "$task_id" "cancelled" "$block_reason" 2>/dev/null || true
+		store_failure_pattern "$task_id" "cancelled" "$block_reason" "dispatch-dedup-guard" 2>/dev/null || true


⚠️ Potential issue | 🟠 Major

Use cancellation-specific TODO updates to avoid state drift.

The task now transitions to cancelled, but the TODO sync still uses update_todo_on_blocked, which will annotate BLOCKED in TODO.md and diverge from the DB state. Please switch to update_todo_on_cancelled. Also, the note about queued->blocked being invalid is now stale with the new VALID_TRANSITIONS entry—either update it or clarify why cancellation is still preferred here.

🛠️ Suggested fix

-		update_todo_on_blocked "$task_id" "$block_reason" 2>/dev/null || true
+		update_todo_on_cancelled "$task_id" "$block_reason" 2>/dev/null || true
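One way to rule out this kind of drift is to choose the TODO helper from the same state value that was written to the DB. The sketch below is only an illustration: the two helpers are stubbed out with echo, and `sync_todo_for_state` is a hypothetical dispatcher that does not exist in dispatch.sh.

```shell
#!/usr/bin/env bash
# Stubs standing in for the TODO.md helpers named in the review; the real
# implementations are not shown in this PR.
update_todo_on_blocked() { echo "TODO: $1 BLOCKED ($2)"; }
update_todo_on_cancelled() { echo "TODO: $1 CANCELLED ($2)"; }

# Hypothetical dispatcher: pick the TODO helper from the state that was
# actually written to the DB, so the two records cannot disagree.
sync_todo_for_state() {
	local task_id="$1"
	local new_state="$2"
	local reason="$3"
	case "$new_state" in
	blocked) update_todo_on_blocked "$task_id" "$reason" ;;
	cancelled) update_todo_on_cancelled "$task_id" "$reason" ;;
	*)
		echo "no TODO helper for state: $new_state" >&2
		return 1
		;;
	esac
}

sync_todo_for_state t1206 cancelled "3 consecutive identical failures"
```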

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 	# Rule 2: Cancel after max_consecutive identical failures
 	# Note: queued->blocked is not a valid transition; use cancelled instead.
 	# The task can be manually re-queued after investigation.
 	if [[ "$consecutive_count" -ge "$max_consecutive" ]]; then
 		local block_reason="Dispatch dedup guard: $consecutive_count consecutive identical failures (error: ${last_error:-unknown}) — manual intervention required (t1206)"
 		log_warn "  $task_id: CANCELLED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
 		cmd_transition "$task_id" "cancelled" --error "$block_reason" 2>/dev/null || true
-		update_todo_on_blocked "$task_id" "$block_reason" 2>/dev/null || true
+		update_todo_on_cancelled "$task_id" "$block_reason" 2>/dev/null || true
 		send_task_notification "$task_id" "cancelled" "$block_reason" 2>/dev/null || true
 		store_failure_pattern "$task_id" "cancelled" "$block_reason" "dispatch-dedup-guard" 2>/dev/null || true

gemini-code-assist

Code Review

The pull request addresses critical pulse-blocking bugs by correcting invalid state transitions and fixing incorrect CLI options. However, there are some inconsistencies in the implementation of the dedup guard in dispatch.sh and several violations of the repository's shell scripting guidelines regarding variable declarations and error suppression. Specifically, the PR makes queued:blocked a valid transition in the state machine but then avoids using it in the dedup guard logic, opting for cancelled instead. Additionally, blanket error suppression with 2>/dev/null is used in several places, which violates Rule 1 of the repository guidelines, and variable declarations do not follow Rule 9.
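The review does not quote Rule 9. Assuming it refers to the common shell practice of declaring a local variable separately from assigning it (the shape used in the suggested change further down), the sketch below shows why that practice exists: when `local` and a command substitution share one line, the exit status of the substituted command is replaced by that of `local`. `get_value` is a hypothetical failing command. In this PR the assigned value is a plain string, so the split there would be stylistic rather than a behavior change.

```shell
#!/usr/bin/env bash
# Hypothetical command that fails without producing output.
get_value() { return 1; }

combined() {
	local value="$(get_value)" # status of get_value is replaced by the status of 'local'
	echo "combined: $?"
}

separate() {
	local value
	value="$(get_value)" # status of get_value is preserved
	echo "separate: $?"
}

combined
separate
```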

gemini-code-assist · 2026-02-23T01:40:03Z

.agents/scripts/supervisor/dispatch.sh

+	# Rule 2: Cancel after max_consecutive identical failures
+	# Note: queued->blocked is not a valid transition; use cancelled instead.
+	# The task can be manually re-queued after investigation.
 	if [[ "$consecutive_count" -ge "$max_consecutive" ]]; then
 		local block_reason="Dispatch dedup guard: $consecutive_count consecutive identical failures (error: ${last_error:-unknown}) — manual intervention required (t1206)"
-		log_warn "  $task_id: BLOCKED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
-		cmd_transition "$task_id" "blocked" --error "$block_reason" 2>/dev/null || true
+		log_warn "  $task_id: CANCELLED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
+		cmd_transition "$task_id" "cancelled" --error "$block_reason" 2>/dev/null || true
 		update_todo_on_blocked "$task_id" "$block_reason" 2>/dev/null || true
-		send_task_notification "$task_id" "blocked" "$block_reason" 2>/dev/null || true
-		store_failure_pattern "$task_id" "blocked" "$block_reason" "dispatch-dedup-guard" 2>/dev/null || true
+		send_task_notification "$task_id" "cancelled" "$block_reason" 2>/dev/null || true
+		store_failure_pattern "$task_id" "cancelled" "$block_reason" "dispatch-dedup-guard" 2>/dev/null || true
 		return 1


There is a logic inconsistency here. The PR adds queued:blocked to VALID_TRANSITIONS in supervisor-helper.sh (line 248), but this code switches to cancelled on the grounds that queued:blocked is invalid. Since the transition is now valid, blocked should be used, as it correctly reflects the 'manual intervention required' status mentioned in the logs. Furthermore, using 2>/dev/null || true for blanket error suppression violates the repository guidelines (Rule 1), which state that 2>/dev/null should be avoided so that errors remain visible for debugging. Errors should instead be redirected to ${SUPERVISOR_LOG}.

Suggested change

-	# Rule 2: Cancel after max_consecutive identical failures
-	# Note: queued->blocked is not a valid transition; use cancelled instead.
+	# Rule 2: Block after max_consecutive identical failures
+	# Note: queued->blocked is now a valid transition in VALID_TRANSITIONS.
 	# The task can be manually re-queued after investigation.
 	if [[ "$consecutive_count" -ge "$max_consecutive" ]]; then
-		local block_reason="Dispatch dedup guard: $consecutive_count consecutive identical failures (error: ${last_error:-unknown}) — manual intervention required (t1206)"
-		log_warn "  $task_id: CANCELLED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
-		cmd_transition "$task_id" "cancelled" --error "$block_reason" 2>/dev/null || true
-		update_todo_on_blocked "$task_id" "$block_reason" 2>/dev/null || true
-		send_task_notification "$task_id" "cancelled" "$block_reason" 2>/dev/null || true
-		store_failure_pattern "$task_id" "cancelled" "$block_reason" "dispatch-dedup-guard" 2>/dev/null || true
+		local block_reason
+		block_reason="Dispatch dedup guard: $consecutive_count consecutive identical failures (error: ${last_error:-unknown}) — manual intervention required (t1206)"
+		log_warn "  $task_id: BLOCKED by dedup guard — $consecutive_count consecutive identical failures with error '${last_error:-unknown}'"
+		cmd_transition "$task_id" "blocked" --error "$block_reason" 2>>"${SUPERVISOR_LOG:-/dev/null}" || true
+		update_todo_on_blocked "$task_id" "$block_reason" 2>>"${SUPERVISOR_LOG:-/dev/null}" || true
+		send_task_notification "$task_id" "blocked" "$block_reason" 2>>"${SUPERVISOR_LOG:-/dev/null}" || true
+		store_failure_pattern "$task_id" "blocked" "$block_reason" "dispatch-dedup-guard" 2>>"${SUPERVISOR_LOG:-/dev/null}" || true
 		return 1
+	fi

References

Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
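The difference between the two redirection styles is easy to see with a command that fails noisily. In this sketch `noisy_failure` is a hypothetical helper and SUPERVISOR_LOG points at a temporary file created for the demo. In the supervisor it would presumably be set elsewhere, which is why the suggested change falls back to /dev/null when it is unset.

```shell
#!/usr/bin/env bash
# Hypothetical helper that fails and writes a diagnostic to stderr.
noisy_failure() {
	echo "auth token expired" >&2
	return 1
}

# Temporary log file for the demo only.
SUPERVISOR_LOG="$(mktemp)"

# Blanket suppression: the call cannot abort the caller, but the diagnostic is lost.
noisy_failure 2>/dev/null || true

# Log redirection: the call still cannot abort the caller, and the message is kept.
noisy_failure 2>>"${SUPERVISOR_LOG:-/dev/null}" || true

cat "$SUPERVISOR_LOG"
```

Only the second call leaves a trace, so the log ends up with a single "auth token expired" line.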

marcusquinn merged commit d81009d into main Feb 23, 2026
10 of 11 checks passed

marcusquinn deleted the bugfix/supervisor-pulse-blockers branch February 23, 2026 01:37

coderabbitai bot requested changes Feb 23, 2026

gemini-code-assist bot reviewed Feb 23, 2026
