t1248: Fix success rate metric — exclude cancelled tasks from failure count#1983
**marcusquinn** merged 1 commit into `main`
Conversation
Cancelled tasks are administrative cleanup actions (orphaned tasks, superseded work, cross-repo misregistration cleanup) — not worker failures. Including them in the failure count inflated the 7-day failure rate from <1% to 11%, causing false alarms and masking the true worker reliability signal.

Root cause analysis (t1248):

- **Feb 12:** 5 hung workers (t302, t303, t311.2 ×2, t311.3) — all hit the 1800s default timeout because their ~1h estimates weren't parsed. Fixed by t1199 (estimate-based timeout) and t1222 (graceful SIGTERM at 50% of timeout).
- **Feb 13:** 12 `no_sentinel` failures (t1010, t1030, t1032.1 ×4, t1032.2 ×2) at 14:40–14:56 UTC — caused by a race condition where concurrent dispatches overwrote fixed-filename wrapper scripts. Fixed by t1190 (timestamped filenames, `WRAPPER_STARTED` sentinel) — PR #1981 pending merge.
- **Feb 18–19:** 67 cancellations (orphaned DB tasks, superseded tasks, cross-repo misregistration cleanup via t1237) — these are intentional, not failures.

Fix: split `failed` and `cancelled` into separate metrics. Success rate now only counts `status='failed'` in the denominator, giving an accurate signal. Cancelled tasks get their own row for visibility without distorting the rate.

- True 7-day worker failure rate: 2/475 = 0.4% (not 11%)
- Pattern tracker overall rate: 94% (977/1037) — unchanged, counts retries
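The metric split described above can be sketched as two separate queries. This is a minimal illustration using an assumed schema in a throwaway SQLite database; the real task store's table and column names may differ.

```shell
#!/bin/sh
# Sketch of the failed/cancelled metric split. The schema here is an
# assumption for illustration, not the actual task store.
DB=$(mktemp)
sqlite3 "$DB" "
  CREATE TABLE tasks(id TEXT, status TEXT, finished_at TEXT);
  INSERT INTO tasks VALUES
    ('t1', 'done',      datetime('now')),
    ('t2', 'failed',    datetime('now')),
    ('t3', 'cancelled', datetime('now')),
    ('t4', 'cancelled', datetime('now'));
"
# Before: status IN ('failed','cancelled') would count 3 'failures' here.
# After: failed and cancelled are reported as separate metric rows.
failed_7d=$(sqlite3 "$DB" "SELECT COUNT(*) FROM tasks
  WHERE status = 'failed' AND finished_at >= datetime('now', '-7 days');")
cancelled_7d=$(sqlite3 "$DB" "SELECT COUNT(*) FROM tasks
  WHERE status = 'cancelled' AND finished_at >= datetime('now', '-7 days');")
echo "failed_7d=$failed_7d cancelled_7d=$cancelled_7d"
rm -f "$DB"
```

Only `failed_7d` feeds the success-rate denominator; `cancelled_7d` becomes its own visibility row.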
Summary of Changes (Gemini Code Assist)

This pull request refines the task health metrics by correcting how task failures and cancellations are reported. The primary goal is to provide a more precise success rate by excluding administratively cancelled tasks from the failure count, which previously misrepresented the system's operational health. This change ensures that the reported metrics accurately reflect genuine worker performance issues versus intentional administrative cleanups.
🔍 Code Quality Report

Generated on: Thu Feb 19 23:10:11 UTC 2026 by AI DevOps Framework Code Review Monitoring
Code Review
The pull request effectively addresses the issue of inflated failure rates by correctly distinguishing between 'failed' and 'cancelled' tasks. The changes accurately reflect the true success rate by excluding administrative cancellations from the failure count and introducing a separate metric for cancelled tasks. The code is clear and follows the established shell scripting patterns for database queries and variable handling.
```shell
# Tasks failed in last 7 days (t1248: exclude cancelled — cancelled tasks are
# administrative cleanup, not worker failures; including them inflates the failure
# rate and causes false alarms. Cancelled tasks have their own metric row.)
```
The comment correctly explains the rationale, but it is verbose. Condensing it improves readability and aligns with the rule for providing a single, clear justification.
```suggestion
# Tasks failed in last 7 days (t1248: exclude cancelled tasks from failure count;
# cancelled tasks are administrative cleanup, not worker failures.)
```
References
- Ensure comments provide a single, clear justification for a design choice, avoiding the presentation of multiple, potentially conflicting rationales.
```shell
# Tasks cancelled in last 7 days (separate from failures — cancellations are
# intentional administrative actions: orphaned tasks, superseded work, cleanup)
```
The comment for cancelled_7d is verbose. Shortening it while retaining clarity aligns with the rule for providing a single, clear justification.
```suggestion
# Tasks cancelled in last 7 days (administrative actions, not worker failures)
```
References
- Ensure comments provide a single, clear justification for a design choice, avoiding the presentation of multiple, potentially conflicting rationales.



Investigation Findings
Root cause analysis of the 7-day success rate drop from 94% overall to 89%.
Hung Workers (Feb 12) — ALREADY FIXED
Tasks t302, t303, t311.2, t311.3 all timed out at ~1800s (30min default).
Root cause: These tasks had `~1h` or `~3h` estimates, but the hung-worker timeout was a fixed 1800s default that didn't read the estimate field. The workers were legitimately busy (large refactors of the 14,644-line supervisor-helper.sh) but were killed as false-positive hangs.

Common characteristics: All were large shell script refactoring tasks (`#refactor`), dispatched at opus tier, on the aidevops repo. None were actually hung — they were doing real work.

Fixes already merged: t1199 (estimate-based timeout) and t1222 (graceful SIGTERM at 50% of timeout).
worker_never_started:no_sentinel (Feb 13) — FIX PENDING
12 failures at 14:40-14:56 UTC for tasks t1010, t1030, t1032.1, t1032.2.
Root cause: Concurrent dispatches used fixed-filename wrapper scripts (e.g., `t1010-wrapper.sh`). A second dispatch overwrote the script before the first wrapper process read it. The first wrapper executed the new script, writing `WORKER_STARTED` to a different log file, leaving the original log with only metadata (no sentinel → `no_sentinel` failure).

Model availability was healthy during the failure window (opencode cache_check: healthy, 32 models).
Fix: t1190 (PR #1981, open) — timestamped filenames prevent overwrite race, WRAPPER_STARTED sentinel added for sub-classification.
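The direction of the t1190 fix can be illustrated with a small sketch. The paths, naming scheme, and wrapper body here are assumptions for illustration, not the actual implementation.

```shell
#!/bin/sh
# Sketch of t1190-style unique wrapper filenames. A fixed name like
# "t1010-wrapper.sh" let a second dispatch overwrite the first; adding a
# timestamp and PID makes each dispatch's script unique. Paths illustrative.
task_id="t1010"
wrapper="$(mktemp -d)/${task_id}-wrapper-$(date +%s)-$$.sh"

cat > "$wrapper" <<'EOF'
#!/bin/sh
echo "WRAPPER_STARTED"   # early sentinel: proves the wrapper itself ran
# ... real dispatch logic would start the worker here ...
echo "WORKER_STARTED"
EOF
chmod +x "$wrapper"
echo "created $wrapper"
```

The early `WRAPPER_STARTED` sentinel also enables the sub-classification mentioned above: a log with neither sentinel means the wrapper never ran, while `WRAPPER_STARTED` without `WORKER_STARTED` points at the dispatch step.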
Metric Accuracy Issue — FIXED IN THIS PR
Root cause of the apparent 89% rate: The `build_health_context()` function in `ai-context.sh` included `cancelled` tasks in the failure count. Cancelled tasks are administrative cleanup (orphaned DB entries, superseded tasks, cross-repo misregistration) — not worker failures.
Fix: Split
failedandcancelledinto separate metric rows. Success rate denominator now only includesstatus='failed'.Cancellation Breakdown (Feb 18-19)
67 cancellations: orphaned DB tasks, superseded tasks, and cross-repo misregistration cleanup via t1237. None of these are worker failures.
Ref #1944