[safe-output-health] Safe Output Health Report - 2026-04-07 #25096
Closed
This discussion has been marked as outdated by Safe Output Health Monitor. A newer discussion is available at Discussion #25308.
Executive Summary
Today's 24-hour audit covered 45 agentic workflow runs across 40 distinct workflows. Overall safe output health is good — 25 safe output job executions with only 1 failure (96% success rate). The single failure was a transient GitHub API rate limit during a concurrent burst window, consistent with a pattern first observed on 2026-04-02. Three additional runs had safe outputs skipped due to upstream agent failures (bad credentials and invalid Gemini API key), both recurring known issues.
Safe Output Job Statistics
create_discussion, create_issue, add_comment, noop, create_pull_request, missing_data, assign_to_agent, create_pull_request_review_comment, update_issue, push_to_pull_request_branch, dispatch_workflow, submit_pull_request_review, set_issue_type, create_code_scanning_alert, update_pull_request, add_reviewer, missing_tool, report_incomplete, (post_slack_message, send_slack_message)
Total safe output items submitted: 75 across 36 runs with non-empty agent output.
Error Clusters
Cluster 1: API Rate Limit on Concurrent Burst (1 occurrence)
Affected handler: create_issue
Cluster 2: Safe Outputs Skipped Due to Agent Failures (3 runs, no safe outputs lost)
These are agent-level failures, not safe output failures. Safe output jobs were never scheduled because agent jobs failed to produce output. Included for completeness.
##[error]Bad credentials (checking out githubnext/gh-aw-side-repo)
##[error]Bad credentials (checking out githubnext/gh-aw-side-repo)
API key not valid (INVALID_ARGUMENT), exit code 144
The bad credentials issue for the side-repo checkout is a recurring pattern: it has appeared on multiple consecutive days. The Gemini API key invalidity is also recurring.
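As an illustration only, the error strings quoted in the clusters above could be grouped with a small pattern table. This is a sketch, not the monitor's actual logic: the cluster labels, `CLUSTER_PATTERNS`, `classifyLogLine`, and `countClusters` are all hypothetical names, and the patterns are built only from the messages that appear in this report.

```javascript
// Hypothetical cluster labels and patterns, derived from the error strings
// quoted in this report. Not the health monitor's real implementation.
const CLUSTER_PATTERNS = [
  { cluster: "agent-failure: bad credentials", pattern: /##\[error\]Bad credentials/ },
  { cluster: "agent-failure: invalid API key", pattern: /API key not valid|API_KEY_INVALID/ },
  { cluster: "safe-output: rate limit", pattern: /rate limit/i },
];

/** Returns the label of the first matching cluster, or "unclassified". */
function classifyLogLine(line) {
  const match = CLUSTER_PATTERNS.find(({ pattern }) => pattern.test(line));
  return match ? match.cluster : "unclassified";
}

/** Counts how many lines fall into each cluster. */
function countClusters(lines) {
  const counts = {};
  for (const line of lines) {
    const cluster = classifyLogLine(line);
    counts[cluster] = (counts[cluster] || 0) + 1;
  }
  return counts;
}
```

Keeping the agent-level patterns separate from the rate-limit pattern mirrors the distinction the report draws between agent failures and genuine safe output failures.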
Root Cause Analysis
API Rate Limit Issue
The GitHub App installation rate limit is triggered when multiple concurrent safe output jobs attempt GitHub API calls within the same time window. Today's burst window was 12:06–12:08 UTC. The create_issue handler does not implement retry-with-backoff for rate limit responses (HTTP 429 / 403 with rate-limit body), so the first affected call fails immediately.
Cross-Repo Credential Issue (Agent-Level, Not Safe Output)
The githubnext/gh-aw-side-repo checkout step in Smoke Create/Update Cross-Repo PR workflows uses a PAT that appears expired or revoked. This prevents the agent from running entirely, so no safe output items are ever produced. Detection and safe_outputs jobs are consequently skipped (conditional: needs.detection.result == 'success' is never met).
Gemini API Key (Agent-Level, Not Safe Output)
The Gemini API key configured for Smoke Gemini is invalid (API_KEY_INVALID from generativelanguage.googleapis.com). The Gemini CLI exits with code 144 after the first API call attempt.
Recurring Missing Data: Auto-Triage Issues (Non-Error)
Two Auto-Triage Issues runs (§24082713086, §24082753334) emitted missing_data for issue #25092. The reason: the DIFC integrity filter blocks reading the issue content (the issue's integrity level is below the workflow's required threshold). This is expected behavior: the safe output handler correctly recorded the missing data signal. Not a failure.
Recommendations
Critical Issues (Immediate Action Required)
Rotate PAT for githubnext/gh-aw-side-repo
Affected step: githubnext/gh-aw-side-repo checkout in the Smoke Create/Update Cross-Repo PR workflows
Renew Gemini API Key
The GEMINI_API_KEY secret is invalid (API_KEY_INVALID)
Bug Fixes Required
safe_output_handler_manager.cjs fails immediately on HTTP 429/403-rate-limit responses with no retry
Affected handlers: create_issue, create_discussion, add_comment
Configuration Changes
Work Item Plans
Work Item 1: Add Retry-With-Backoff to Safe Output API Calls
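The root cause analysis above notes that the create_issue handler fails on the first HTTP 429 or rate-limit 403 response. A minimal sketch of the kind of retry helper this work item calls for is shown here. The function names (`isRateLimitError`, `backoffDelayMs`, `withRateLimitRetry`), the retry count, and the delay values are all assumptions for illustration. Only the status/message check is taken from the report, and none of this is the actual code in `safe_output_handler_manager.cjs`.

```javascript
// Hypothetical helper names and defaults; only the status/message check
// below is taken from the report.

/** True for HTTP 429, or for HTTP 403 whose message mentions a rate limit. */
function isRateLimitError(error) {
  return (
    error.status === 429 ||
    (error.status === 403 && /rate limit/i.test(error.message))
  );
}

/** Exponential backoff: baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Calls fn(). On a rate-limit error, sleeps and retries, up to maxRetries
 * retries. Any other error, or the error from the final attempt, is rethrown.
 */
async function withRateLimitRetry(fn, { maxRetries = 3, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (!isRateLimitError(error) || attempt >= maxRetries) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

A handler could then wrap its call, for example `await withRateLimitRetry(() => octokit.rest.issues.create({ owner, repo, title, body }))`. Because the concurrent jobs in a burst window would otherwise all retry at the same moments, adding random jitter to the delay, or honoring a `retry-after` response header when one is present, may be worth considering.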
The create_issue, create_discussion, and add_comment handlers retry on HTTP 429 and rate-limit HTTP 403 responses
Wrap @octokit/rest API calls in a retry helper that checks error.status === 429 || (error.status === 403 && /rate limit/i.test(error.message)) and sleeps before retrying
Work Item 2: Investigate and Rotate Cross-Repo Smoke Test Credentials
The Checkout githubnext/gh-aw-side-repo step fails with "Bad credentials". This blocks smoke test coverage for cross-repository PR operations.
repo scope; update the GITHUBNEXT_TOKEN (or equivalent) secret
githubnext GitHub account or the secret-owner
Historical Context
7-Day Trend
Trends:
push_to_pull_request_branch disallowed files issue resolved after 2026-04-02 (Smoke Claude now uses the allowed filename)
Metrics and KPIs
create_discussion, add_comment, noop, push_to_pull_request_branch: all 100%
create_issue: 1 failure due to rate limit (90% today)
Next Steps
githubnext/gh-aw-side-repo PAT in repository secrets
update-discussion warning #25092: DIFC integrity filter is expected to clear once the issue passes integrity checks