Skip to content

t1331: Supervisor circuit breaker — pause on consecutive failures#2271

Merged
marcusquinn merged 3 commits intomainfrom
feature/t1331
Feb 25, 2026
Merged

t1331: Supervisor circuit breaker — pause on consecutive failures#2271
marcusquinn merged 3 commits intomainfrom
feature/t1331

Conversation

@marcusquinn
Copy link
Owner

@marcusquinn marcusquinn commented Feb 25, 2026

WIP - incremental commits

Ref #2264

Summary by CodeRabbit

  • New Features
    • Implemented circuit-breaker system that halts task dispatch when consecutive failures exceed a threshold. Auto-recovers after a configurable cooldown period. Creates and updates GitHub issues to track circuit-breaker state and events.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 25, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5a5bc3 and a150e85.

📒 Files selected for processing (4)
  • .agents/scripts/supervisor-helper.sh
  • .agents/scripts/supervisor/circuit-breaker.sh
  • .agents/scripts/supervisor/pulse.sh
  • todo/tasks/t1331-brief.md

Walkthrough

Implements a comprehensive circuit-breaker pattern for supervisor task dispatch. The system tracks consecutive task failures, trips dispatch prevention at a configurable threshold, persists state atomically to JSON, integrates with GitHub issue creation/updates, and supports both manual and automatic reset with cooldown periods.

Changes

Cohort / File(s) Summary
Circuit-Breaker Core Implementation
.agents/scripts/supervisor/circuit-breaker.sh
New 511-line module implementing full circuit-breaker state machine with persistent JSON state, atomic read/write operations, GitHub issue integration, and CLI subcommands (status, reset, check, trip). Includes cb_read_state, cb_write_state, cb_record_failure, cb_record_success, cb_check, cb_reset, cb_status, and GitHub issue management functions.
Supervisor CLI Integration
.agents/scripts/supervisor-helper.sh
Wires circuit-breaker into supervisor CLI by sourcing the new circuit-breaker module and exposing circuit-breaker as a top-level command with dedicated cmd_circuit_breaker handler and updated help text.
Pulse Workflow Integration
.agents/scripts/supervisor/pulse.sh
Integrates circuit-breaker hooks into task execution flow: records failures on retry/blocked paths via cb_record_failure, records successes via cb_record_success, and gates Phase 2 dispatch with a cb_check guard to skip dispatch when breaker is open.
Acceptance Criteria Verification
todo/tasks/t1331-brief.md
Updates acceptance criteria with verification blocks to scan supervisor scripts directory for circuit-breaker pattern implementation and consecutive_fail tracking evidence.

Sequence Diagrams

sequenceDiagram
    participant Pulse as Pulse Coordinator
    participant Circuit as Circuit Breaker
    participant State as State File
    participant GitHub as GitHub API
    participant Dispatch as Phase 2 Dispatch

    rect rgba(100, 150, 200, 0.5)
    Note over Pulse,Dispatch: Task Failure Path
    Pulse->>Circuit: cb_record_failure()
    Circuit->>State: Read current state
    State-->>Circuit: state JSON
    Circuit->>Circuit: Increment consecutive_failures
    Circuit->>Circuit: Check if threshold met
    Circuit->>State: Write updated state (tripped=true)
    Circuit->>GitHub: _cb_create_or_update_issue()
    GitHub-->>Circuit: Issue created/updated
    Circuit-->>Pulse: Breaker tripped
    end

    rect rgba(150, 200, 100, 0.5)
    Note over Pulse,Dispatch: Dispatch Gate Check
    Pulse->>Circuit: cb_check()
    Circuit->>State: Read state
    State-->>Circuit: state JSON
    Circuit->>Circuit: If tripped & cooldown elapsed: auto-reset
    Circuit-->>Pulse: Dispatch allowed or blocked
    alt Breaker Open
        Pulse->>Dispatch: Skip dispatch (log message)
    else Breaker Closed
        Pulse->>Dispatch: Proceed with batch/non-batch logic
    end
    end

    rect rgba(200, 150, 100, 0.5)
    Note over Pulse,GitHub: Task Success Path
    Pulse->>Circuit: cb_record_success()
    Circuit->>State: Reset consecutive_failures to 0
    Circuit->>State: Clear tripped flag
    Circuit->>GitHub: _cb_close_issue()
    GitHub-->>Circuit: Issue closed
    Circuit-->>Pulse: Breaker reset
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Possibly related PRs

Poem

🔄 When tasks stumble three times in a row,
The circuit breaks—dispatch must slow,
A GitHub issue marks the pain,
Till cooldown chills and success flows again,
A guardian against cascading woe. 🛡️

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/t1331

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust circuit breaker mechanism for the supervisor, designed to enhance system resilience and operational efficiency. By automatically detecting and responding to repeated task failures, it prevents resource exhaustion and provides clear, automated notifications through GitHub issues, allowing for timely intervention and resolution of underlying problems.

Highlights

  • Supervisor Circuit Breaker Implementation: Introduced a new Bash script, circuit-breaker.sh, to implement a supervisor-level circuit breaker pattern. This system tracks consecutive task failures to prevent infinite retry loops.
  • Automated Dispatch Pausing: The circuit breaker automatically pauses task dispatch when a configurable threshold of consecutive failures (default: 3) is reached, stopping the supervisor from wasting resources on persistently failing tasks.
  • GitHub Issue Integration: Upon tripping, the circuit breaker creates or updates a GitHub issue with a circuit-breaker label, providing immediate visibility and actionable information to maintainers. It also closes the issue upon reset.
  • Reset Mechanisms: The circuit breaker can be reset manually via a supervisor-helper.sh command or automatically after a configurable cooldown period (default: 30 minutes). Any successful task completion also resets the failure counter.
  • State Management: The circuit breaker's state (consecutive failures, tripped status, timestamps) is persisted in a JSON file, ensuring continuity across supervisor runs.
Changelog
  • .agents/scripts/supervisor/circuit-breaker.sh
    • Added a new Bash script to implement the supervisor circuit breaker logic, including failure tracking, state persistence, dispatch control, and GitHub issue management.
  • todo/tasks/t1331-brief.md
    • Added a new task brief documenting the requirements, approach, and acceptance criteria for the supervisor circuit breaker feature.
Activity
  • The pull request is marked as 'WIP - incremental commits', indicating ongoing development.
  • It references issue t1331: Supervisor circuit breaker #2264, suggesting it addresses a known problem or feature request.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions
Copy link

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 58 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Wed Feb 25 03:02:49 UTC 2026: Code review monitoring started
Wed Feb 25 03:02:49 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 58

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 58
  • VULNERABILITIES: 0

Generated on: Wed Feb 25 03:02:52 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a robust circuit breaker mechanism for the supervisor, which is a valuable addition for preventing resource wastage on consecutive task failures. The implementation correctly handles failure tracking, auto-reset with a configurable cooldown, manual reset, and integrates with GitHub issues for notifications. The overall logic is sound and addresses the problem effectively. However, there are several instances where command error output is suppressed using 2>/dev/null. This practice can hide critical debugging information related to jq parsing errors, date command failures, or gh CLI issues (e.g., authentication, network problems). The comments provided suggest modifications to remove 2>/dev/null or replace it with || true or || echo "0" to ensure that actual syntax or system errors remain visible for debugging, aligning with repository guidelines.


_cb_state_file() {
local dir="${SUPERVISOR_DIR:-${HOME}/.aidevops/.agent-workspace/supervisor}"
mkdir -p "$dir" 2>/dev/null || true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing all errors from mkdir -p with 2>/dev/null can hide important issues, such as permission problems if the directory already exists but is not writable. It's better to allow stderr to be visible for debugging.

Suggested change
mkdir -p "$dir" 2>/dev/null || true
mkdir -p "$dir" || true
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

state=$(cb_read_state)

local current_count
current_count=$(printf '%s' "$state" | jq -r '.consecutive_failures // 0' 2>/dev/null || echo "0")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing jq errors with 2>/dev/null can hide syntax errors in the jq filter or issues with the jq executable itself. The // operator handles missing keys, so 2>/dev/null is not needed for that purpose. It's better to let jq errors be visible for debugging.

Suggested change
current_count=$(printf '%s' "$state" | jq -r '.consecutive_failures // 0' 2>/dev/null || echo "0")
current_count=$(printf '%s' "$state" | jq -r '.consecutive_failures // 0' || echo "0")
References
  1. In shell scripts with 'set -e' enabled, use '|| true' to prevent the script from exiting when a command like 'jq' fails on an optional lookup. Do not suppress stderr with '2>/dev/null' so that actual syntax or system errors remain visible for debugging.

now=$(date -u +%Y-%m-%dT%H:%M:%SZ)

local tripped
tripped=$(printf '%s' "$state" | jq -r '.tripped // false' 2>/dev/null || echo "false")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing jq errors with 2>/dev/null can hide syntax errors in the jq filter or issues with the jq executable itself. The // operator handles missing keys, so 2>/dev/null is not needed for that purpose. It's better to let jq errors be visible for debugging.

Suggested change
tripped=$(printf '%s' "$state" | jq -r '.tripped // false' 2>/dev/null || echo "false")
tripped=$(printf '%s' "$state" | jq -r '.tripped // false' || echo "false")
References
  1. In shell scripts with 'set -e' enabled, use '|| true' to prevent the script from exiting when a command like 'jq' fails on an optional lookup. Do not suppress stderr with '2>/dev/null' so that actual syntax or system errors remain visible for debugging.

--arg task "$task_id" \
--arg reason "$failure_reason" \
'.consecutive_failures = $count | .last_failure_at = $now | .last_failure_task = $task | .last_failure_reason = $reason' \
2>/dev/null)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing jq errors with 2>/dev/null can hide syntax errors in the jq filter or issues with the jq executable itself. It's better to let jq errors be visible for debugging.

Suggested change
2>/dev/null)
new_state=$(printf '%s' "$state" | jq \
--argjson count "$new_count" \
--arg now "$now" \
--arg task "$task_id" \
--arg reason "$failure_reason" \
'.consecutive_failures = $count | .last_failure_at = $now | .last_failure_task = $task | .last_failure_reason = $reason')
References
  1. In shell scripts with 'set -e' enabled, use '|| true' to prevent the script from exiting when a command like 'jq' fails on an optional lookup. Do not suppress stderr with '2>/dev/null' so that actual syntax or system errors remain visible for debugging.

new_state=$(printf '%s' "$new_state" | jq \
--arg now "$now" \
'.tripped = true | .tripped_at = $now' \
2>/dev/null)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing jq errors with 2>/dev/null can hide syntax errors in the jq filter or issues with the jq executable itself. It's better to let jq errors be visible for debugging.

Suggested change
2>/dev/null)
new_state=$(printf '%s' "$new_state" | jq \
--arg now "$now" \
'.tripped = true | .tripped_at = $now')
References
  1. In shell scripts with 'set -e' enabled, use '|| true' to prevent the script from exiting when a command like 'jq' fails on an optional lookup. Do not suppress stderr with '2>/dev/null' so that actual syntax or system errors remain visible for debugging.

Comment on lines 405 to 409
2>/dev/null || true

local issue_url
issue_url=$(gh issue create \
--title "Supervisor circuit breaker tripped — ${failure_count} consecutive failures" \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing gh CLI errors with 2>/dev/null can hide issues with GitHub API access, authentication, or network problems. While || true makes it non-blocking, it's better to see the errors for debugging.

Suggested change
2>/dev/null || true
local issue_url
issue_url=$(gh issue create \
--title "Supervisor circuit breaker tripped — ${failure_count} consecutive failures" \
gh label create "circuit-breaker" \
--description "Supervisor circuit breaker tripped — dispatch paused" \
--color "D93F0B" \
--force || true
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

Comment on lines 412 to 416
2>/dev/null) || {
log_warn "circuit-breaker: failed to create GitHub issue"
return 1
}
log_info "circuit-breaker: created GitHub issue: $issue_url"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing gh CLI errors with 2>/dev/null can hide issues with GitHub API access, authentication, or network problems. While the || { ... } block catches the failure, 2>/dev/null prevents seeing the actual error message from gh.

Suggested change
2>/dev/null) || {
log_warn "circuit-breaker: failed to create GitHub issue"
return 1
}
log_info "circuit-breaker: created GitHub issue: $issue_url"
issue_url=$(gh issue create \
--title "Supervisor circuit breaker tripped — ${failure_count} consecutive failures" \
--body "$body" \
--label "circuit-breaker") || {
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

Comment on lines 437 to 441
--state open \
--json number \
--jq '.[0].number // empty' \
2>/dev/null) || existing_issue=""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing gh CLI errors with 2>/dev/null can hide issues with GitHub API access, authentication, or network problems. It's better to let these errors be visible for debugging.

Suggested change
--state open \
--json number \
--jq '.[0].number // empty' \
2>/dev/null) || existing_issue=""
existing_issue=$(gh issue list \
--label "circuit-breaker" \
--state open \
--json number \
--jq '.[0].number // empty') || existing_issue=""
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

fi

gh issue close "$existing_issue" \
--comment "Circuit breaker reset: ${reason}" \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing gh CLI errors with 2>/dev/null can hide issues with GitHub API access, authentication, or network problems. While the || { ... } block catches the failure, 2>/dev/null prevents seeing the actual error message from gh.

Suggested change
--comment "Circuit breaker reset: ${reason}" \
gh issue close "$existing_issue" \
--comment "Circuit breaker reset: ${reason}" || {
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.

Comment on lines 493 to 498
state=$(jq -n \
--argjson count "$CIRCUIT_BREAKER_THRESHOLD" \
--arg now "$now" \
--arg task "$task_id" \
--arg reason "$reason" \
'{consecutive_failures: $count, tripped: true, tripped_at: $now, last_failure_at: $now, last_failure_task: $task, last_failure_reason: $reason}')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing jq errors with 2>/dev/null can hide syntax errors in the jq filter or issues with the jq executable itself. It's better to let jq errors be visible for debugging.

Suggested change
state=$(jq -n \
--argjson count "$CIRCUIT_BREAKER_THRESHOLD" \
--arg now "$now" \
--arg task "$task_id" \
--arg reason "$reason" \
'{consecutive_failures: $count, tripped: true, tripped_at: $now, last_failure_at: $now, last_failure_task: $task, last_failure_reason: $reason}')
state=$(jq -n \
--argjson count "$CIRCUIT_BREAKER_THRESHOLD" \
--arg now "$now" \
--arg task "$task_id" \
--arg reason "$reason" \
'{consecutive_failures: $count, tripped: true, tripped_at: $now, last_failure_at: $now, last_failure_task: $task, last_failure_reason: $reason}')
References
  1. In shell scripts with 'set -e' enabled, use '|| true' to prevent the script from exiting when a command like 'jq' fails on an optional lookup. Do not suppress stderr with '2>/dev/null' so that actual syntax or system errors remain visible for debugging.

New file: .agents/scripts/supervisor/circuit-breaker.sh
- Tracks consecutive task failures globally via state file
- Trips after N failures (default: 3, configurable via SUPERVISOR_CIRCUIT_BREAKER_THRESHOLD)
- Pauses dispatch when tripped, creates/updates GitHub issue with circuit-breaker label
- Manual reset via supervisor-helper.sh circuit-breaker reset
- Auto-reset after configurable cooldown (SUPERVISOR_CIRCUIT_BREAKER_COOLDOWN_SECS)
- Counter resets on any task success
- Remove unused 'cooldown_secs' variable declaration in cb_status()
- Remove unused 'repo_path' variable in _cb_create_or_update_issue()
- Add blank lines before fenced code blocks in t1331-brief.md (MD031)
… into pulse

Remove blanket 2>/dev/null suppression from jq, gh, date, and mkdir
commands in circuit-breaker.sh per Codacy review. Errors now flow to
stderr for debugging while || fallbacks prevent script abort.

Integrate cb_record_success/cb_record_failure/cb_check into pulse.sh:
- Phase 1: record success on task completion, failure on retry/blocked
- Phase 2: gate dispatch on cb_check (skip if circuit breaker tripped)

Wire circuit-breaker subcommand in supervisor-helper.sh main router.

Ref #2264
@augmentcode
Copy link

augmentcode bot commented Feb 25, 2026

🤖 Augment PR Summary

Summary: This PR introduces a supervisor-wide circuit breaker to pause dispatch after consecutive task failures.

Changes:

  • Added .agents/scripts/supervisor/circuit-breaker.sh to persist breaker state and provide record_failure, record_success, check, status, and reset operations.
  • Wired the module into supervisor-helper.sh (new circuit-breaker subcommand and module sourcing).
  • Integrated breaker tracking into pulse.sh: reset counter on task completion, record failures on retry/blocked outcomes, and skip Phase 2 dispatch when the breaker is open.
  • Implemented optional GitHub issue creation/update (label circuit-breaker) on trip and issue close on reset via gh.

Technical Notes: Threshold and cooldown are configurable via env vars; dispatch can auto-resume after cooldown or via manual reset.

🤖 Was this summary useful? React with 👍 or 👎

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 5 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

state_file=$(_cb_state_file)

if [[ -f "$state_file" ]]; then
cat "$state_file"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because supervisor-helper.sh runs with set -euo pipefail, a read failure here (e.g., state file exists but isn’t readable) can abort the entire supervisor run even though the circuit breaker is intended to be non-blocking. Consider making state reads resilient (treat read errors as “no state” and continue).

Severity: high

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.


# Atomic write via temp file + mv
local tmp_file="${state_file}.tmp.$$"
printf '%s\n' "$state_json" >"$tmp_file"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With set -euo pipefail enabled in the caller, a write failure (permissions/disk full) on this redirection can exit the supervisor before the circuit-breaker functions can return 0. Consider explicitly handling write/move failures so the breaker can fail-open without killing dispatch.

Severity: high

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.


# Update state with new failure count
local new_state
new_state=$(printf '%s' "$state" | jq \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This jq call is unguarded, so if it fails (missing jq, corrupted JSON, etc.) it can terminate the supervisor under set -euo pipefail before the subsequent -z "$new_state" check runs. Consider ensuring JSON update failures are non-fatal in all cb_* paths.

Severity: high

Other Locations
  • .agents/scripts/supervisor/circuit-breaker.sh:161
  • .agents/scripts/supervisor/circuit-breaker.sh:246

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

local tripped count tripped_at last_failure last_reset
tripped=$(printf '%s' "$state" | jq -r '.tripped // false' || echo "false")
count=$(printf '%s' "$state" | jq -r '.consecutive_failures // 0' || echo "0")
tripped_at=$(printf '%s' "$state" | jq -r '.tripped_at // "never"' || echo "never")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state defaults use empty-string timestamps, but jq’s // won’t replace empty strings, so tripped_at/last_reset_at may print blank (and the cooldown math may try to parse an empty date). Consider normalizing empty strings to a sentinel like never before display/math.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

retry)
log_warn " $tid: RETRY ($outcome_detail)"
# t1331: Record failure for circuit breaker (consecutive failure tracking)
cb_record_failure "$tid" "$outcome_detail" 2>>"$SUPERVISOR_LOG" || true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the breaker increments on retry/blocked, but the failed) outcome path later in this same case doesn’t record a failure, so repeated hard failures may never trip the breaker. Consider whether failed (or at least non-environment failed) should also feed cb_record_failure to match “consecutive task failures” semantics.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

@github-actions
Copy link

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 71 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Wed Feb 25 14:27:18 UTC 2026: Code review monitoring started
Wed Feb 25 14:27:18 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 71

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 71
  • VULNERABILITIES: 0

Generated on: Wed Feb 25 14:27:21 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud
Copy link

@marcusquinn marcusquinn merged commit 9bd2029 into main Feb 25, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant