t1248: Fix success rate metric — exclude cancelled tasks from failure count by marcusquinn · Pull Request #1983 · marcusquinn/aidevops

marcusquinn · 2026-02-19T23:09:39Z

Investigation Findings

Root cause analysis of the 7-day success rate drop from 94% overall to 89%.

Hung Workers (Feb 12) — ALREADY FIXED

Tasks t302, t303, t311.2, t311.3 all timed out at ~1800s (30min default).

Root cause: These tasks had ~1h or ~3h estimates but the hung timeout was a fixed 1800s default that didn't read the estimate field. The workers were legitimately busy (large refactors of 14,644-line supervisor-helper.sh) but got killed as false-positive hangs.

Common characteristics: All were large shell script refactoring tasks (#refactor), dispatched at opus tier, on the aidevops repo. None were actually hung — they were doing real work.

Fixes already merged:

t1199 (PR #1826, "t1199: Tune worker hung timeout based on task ~estimate field"): estimate-based timeout of 2x the estimate, with a 4h cap and a 30m default
t1222 (PR #1869, "t1222: Add two-phase worker hang detection with graceful termination"): graceful SIGTERM at 50% of the timeout before the hard kill
t314 (PR #1215, "feat: auto-escalate model on worker failure + extend timeouts (t314)"): model escalation on failure plus extended default timeouts
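
As a rough illustration of the t1199 rule (2x estimate, 4h cap, 30m default), the timeout calculation might look like the sketch below. This is not the code from PR #1826: the function name `estimate_to_timeout` and the accepted estimate syntax (whole numbers with an `h` or `m` suffix, optionally prefixed with `~`) are assumptions made for the example. The summary above also does not say whether 30m acts as a floor, so this sketch applies it only when no usable estimate is present.

```shell
# Hypothetical helper: derive a hung-worker timeout in seconds from a task estimate.
# Rule from the t1199 summary: 2x estimate, capped at 4h, 30m default.
estimate_to_timeout() (
	default_s=1800 # 30m default when the estimate is missing or unparseable
	cap_s=14400    # 4h cap

	estimate=${1-}
	estimate=${estimate#\~} # drop a leading "~" if present
	number=${estimate%?}    # everything except the last character

	# Require a plain positive integer without leading zeros.
	case "$number" in
	"" | *[!0-9]* | 0*)
		echo "$default_s"
		return 0
		;;
	esac

	unit=${estimate#"$number"} # the last character
	case "$unit" in
	h) est_s=$((number * 3600)) ;;
	m) est_s=$((number * 60)) ;;
	*)
		echo "$default_s"
		return 0
		;;
	esac

	timeout_s=$((est_s * 2))
	if [ "$timeout_s" -gt "$cap_s" ]; then
		timeout_s=$cap_s
	fi
	echo "$timeout_s"
)
```

Under these assumptions a `~1h` task gets 7200s and a `~3h` task is capped at 14400s, so neither would be killed at the old fixed 1800s limit.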

worker_never_started:no_sentinel (Feb 13) — FIX PENDING

12 failures at 14:40-14:56 UTC for tasks t1010, t1030, t1032.1, t1032.2.

Root cause: Concurrent dispatches used fixed-filename wrapper scripts (e.g., t1010-wrapper.sh). A second dispatch overwrote the script before the first wrapper process read it. The first wrapper executed the new script, writing WORKER_STARTED to a different log file, leaving the original log with only metadata (no sentinel → no_sentinel failure).

Model availability was healthy during the failure window (opencode cache_check: healthy, 32 models).

Fix: t1190 (PR #1981, open) — timestamped filenames prevent overwrite race, WRAPPER_STARTED sentinel added for sub-classification.
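
The overwrite race goes away once each dispatch writes its wrapper to a path no other dispatch can reuse. The sketch below shows one way to get that property. It is not the code from PR #1981: the function name `make_wrapper_path` and the filename layout are assumptions, and it adds a random `mktemp` suffix on top of the timestamp, because a timestamp with one-second resolution alone could still collide for two dispatches in the same second.

```shell
# Hypothetical helper: create a unique, empty wrapper script file for one dispatch
# and print its path. Two dispatches of the same task never share a file.
make_wrapper_path() (
	task_id="$1"
	dir="${2:-${TMPDIR:-/tmp}}"
	ts="$(date -u +%Y%m%dT%H%M%SZ)"

	# mktemp creates the file atomically and fails rather than reuse an existing name.
	mktemp "${dir}/${task_id}-wrapper-${ts}.XXXXXX"
)
```

A dispatcher would write the wrapper body to the returned path and launch exactly that path, so a later dispatch of the same task cannot replace the script the first wrapper process is about to read.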

Metric Accuracy Issue — FIXED IN THIS PR

Root cause of the apparent 89% rate: The build_health_context() function in ai-context.sh included cancelled tasks in the failure count. Cancelled tasks are administrative cleanup (orphaned DB entries, superseded tasks, cross-repo misregistration) — not worker failures.

Actual numbers:

7-day: 473 completed, 2 actually failed, 55 cancelled
True failure rate: 2/475 = 0.4% (not 11%)
Pattern tracker overall: 94% (977/1037) — counts retry attempts, not final status

Fix: Split failed and cancelled into separate metric rows. The success rate denominator is now completed + failed, where failed counts only status='failed'; cancelled tasks are excluded from the rate.
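
The effect on the 7-day numbers can be checked with a few lines of arithmetic. The sketch below is only a worked example using the figures above; the function name `success_rate_pct` and the one-decimal output format are assumptions, not taken from ai-context.sh.

```shell
# Hypothetical helper: success rate as a percentage with one decimal place.
# Cancelled tasks are deliberately not an input, since they are excluded from the rate.
success_rate_pct() (
	completed="$1"
	failed="$2"
	total=$((completed + failed))
	if [ "$total" -eq 0 ]; then
		echo "n/a" # no finished tasks in the window
		return 0
	fi
	LC_ALL=C awk -v c="$completed" -v t="$total" 'BEGIN { printf "%.1f\n", 100 * c / t }'
)

# 7-day figures from this PR: 473 completed, 2 failed, 55 cancelled.
success_rate_pct 473 2            # new calculation, prints 99.6
success_rate_pct 473 $((2 + 55))  # old calculation with cancelled counted as failed, prints 89.2
```

The second call reproduces the 89% figure that triggered the investigation, which is consistent with cancelled tasks being the only cause of the apparent drop.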

Cancellation Breakdown (Feb 18-19)

13 tasks: orphaned DB entries not in TODO.md
8 tasks: superseded by feature/supervisor-self-healing branch
12 tasks: cross-repo misregistration cleanup (t1237)
5 tasks: pre-dispatch already-completed detection
1 task: duplicate of another task

None of these are worker failures.

Ref #1944

Cancelled tasks are administrative cleanup actions (orphaned tasks, superseded work, cross-repo misregistration cleanup), not worker failures. Including them in the failure count inflated the 7-day failure rate from <1% to 11%, causing false alarms and masking the true worker reliability signal.

Root cause analysis (t1248):
- Feb 12: 5 hung workers (t302, t303, t311.2 x2, t311.3). All hit the 1800s default timeout because their ~1h estimates weren't parsed. Fixed by t1199 (estimate-based timeout) and t1222 (graceful SIGTERM at 50% timeout).
- Feb 13: 12 no_sentinel failures (t1010, t1030, t1032.1 x4, t1032.2 x2) at 14:40-14:56 UTC, caused by a race condition where concurrent dispatches overwrote fixed-filename wrapper scripts. Fixed by t1190 (timestamped filenames, WRAPPER_STARTED sentinel); PR #1981 is pending merge.
- Feb 18-19: 67 cancellations (orphaned DB tasks, superseded tasks, cross-repo misregistration cleanup via t1237). These are intentional, not failures.

Fix: split 'failed' and 'cancelled' into separate metrics. The success rate now counts only status='failed' tasks as failures in the denominator, giving an accurate signal. Cancelled tasks get their own row for visibility without distorting the rate.

True 7-day worker failure rate: 2/475 = 0.4% (not 11%)
Pattern tracker overall rate: 94% (977/1037), unchanged; it counts retries

coderabbitai · 2026-02-19T23:09:47Z

Warning

Rate limit exceeded

@marcusquinn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 47 seconds before requesting another review.

gemini-code-assist · 2026-02-19T23:09:50Z

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the task health metrics by correcting how task failures and cancellations are reported. The primary goal is to provide a more precise success rate by excluding administratively cancelled tasks from the failure count, which previously misrepresented the system's operational health. This change ensures that the reported metrics accurately reflect genuine worker performance issues versus intentional administrative cleanups.

Highlights

Metric Accuracy Fix: The success rate metric now accurately distinguishes between actual task failures and administrative cancellations. Previously, cancelled tasks were incorrectly included in the failure count, leading to an inflated failure rate.
Separate Cancellation Metric: A new metric has been introduced to track the number of cancelled tasks separately, providing clearer insight into administrative actions without skewing the failure rate.
Updated Success Rate Calculation: The calculation for the 7-day success rate has been adjusted to exclude cancelled tasks from the denominator, ensuring it reflects only completed and genuinely failed tasks.

Changelog

.agents/scripts/supervisor/ai-context.sh
- Modified the failed_7d query to count only tasks with a 'failed' status, excluding 'cancelled' tasks.
- Introduced a new cancelled_7d variable and corresponding database query to count tasks with a 'cancelled' status.
- Updated the total_7d calculation to correctly sum completed_7d and the newly defined failed_7d.
- Added a new output line to display the 'Cancelled (7d)' metric in the health context report.
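
The changelog above amounts to counting each status separately and summing only completed and failed for the rate. The real script runs database queries, which are not shown on this page, so the sketch below stays self-contained by reading one status per line from stdin instead. The function name `summarize_statuses`, the status value `completed`, and every row label other than "Cancelled (7d)" are assumptions for illustration.

```shell
# Hypothetical helper: count task statuses read from stdin (one per line)
# and print separate rows, keeping cancelled out of the success-rate total.
summarize_statuses() (
	completed=0
	failed=0
	cancelled=0
	while IFS= read -r status; do
		case "$status" in
		completed) completed=$((completed + 1)) ;;
		failed) failed=$((failed + 1)) ;;
		cancelled) cancelled=$((cancelled + 1)) ;;
		esac
	done
	echo "Completed (7d): $completed"
	echo "Failed (7d): $failed"
	echo "Cancelled (7d): $cancelled"
	# Mirrors total_7d = completed_7d + failed_7d from the changelog.
	echo "Total for success rate (7d): $((completed + failed))"
)
```

For example, `printf '%s\n' completed cancelled failed | summarize_statuses` reports one task in each row and a success-rate total of 2.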

Activity

No human activity has been recorded on this pull request yet.

github-actions · 2026-02-19T23:10:12Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 30 code smells

[INFO] Recent monitoring activity:
Thu Feb 19 23:10:09 UTC 2026: Code review monitoring started
Thu Feb 19 23:10:09 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 30

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 30
VULNERABILITIES: 0

Generated on: Thu Feb 19 23:10:11 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-02-19T23:10:59Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

gemini-code-assist

Code Review

The pull request effectively addresses the issue of inflated failure rates by correctly distinguishing between 'failed' and 'cancelled' tasks. The changes accurately reflect the true success rate by excluding administrative cancellations from the failure count and introducing a separate metric for cancelled tasks. The code is clear and follows the established shell scripting patterns for database queries and variable handling.

gemini-code-assist · 2026-02-19T23:11:39Z

.agents/scripts/supervisor/ai-context.sh

+	# Tasks failed in last 7 days (t1248: exclude cancelled — cancelled tasks are
+	# administrative cleanup, not worker failures; including them inflates the failure
+	# rate and causes false alarms. Cancelled tasks have their own metric row.)


The comment correctly explains the rationale, but it is verbose. Condensing it improves readability and aligns with the rule for providing a single, clear justification.

Suggested change

-	# Tasks failed in last 7 days (t1248: exclude cancelled — cancelled tasks are
-	# administrative cleanup, not worker failures; including them inflates the failure
-	# rate and causes false alarms. Cancelled tasks have their own metric row.)
+	# Tasks failed in last 7 days (t1248: exclude cancelled tasks from failure count;
+	# cancelled tasks are administrative cleanup, not worker failures.)

References

Ensure comments provide a single, clear justification for a design choice, avoiding the presentation of multiple, potentially conflicting rationales.

gemini-code-assist · 2026-02-19T23:11:39Z

.agents/scripts/supervisor/ai-context.sh

+	# Tasks cancelled in last 7 days (separate from failures — cancellations are
+	# intentional administrative actions: orphaned tasks, superseded work, cleanup)


The comment for cancelled_7d is verbose. Shortening it while retaining clarity aligns with the rule for providing a single, clear justification.

Suggested change

-	# Tasks cancelled in last 7 days (separate from failures — cancellations are
-	# intentional administrative actions: orphaned tasks, superseded work, cleanup)
+	# Tasks cancelled in last 7 days (administrative actions, not worker failures)

References

Ensure comments provide a single, clear justification for a design choice, avoiding the presentation of multiple, potentially conflicting rationales.

gemini-code-assist bot reviewed Feb 19, 2026

marcusquinn mentioned this pull request Feb 19, 2026

[Supervisor:marcusquinn] 0 queued, 0 working, 1 in review at 14:46 UTC #1502

Closed

marcusquinn merged commit 8b36325 into main Feb 19, 2026
19 checks passed

marcusquinn deleted the feature/t1248 branch February 19, 2026 23:33

marcusquinn added a commit that referenced this pull request Feb 19, 2026

chore: mark t1248 complete in TODO.md (#1983)

b7f89b3

marcusquinn mentioned this pull request Feb 19, 2026

t1248: Investigate 7-day success rate drop from 94% overall to 89% #1944

Closed

marcusquinn merged 1 commit into main from feature/t1248
