Agent Performance Report — Week of April 7, 2026 #25024
This discussion was automatically closed because it expired on 2026-04-08T05:00:17.544Z.
Executive Summary
The slight score decline is driven by a new false-negative failure pattern in the GitHub Remote MCP Auth Test and a confirmed missing_data pattern in AI Moderator (4/4 runs). On the positive side, Agent Persona Explorer made a dramatic improvement: 165 turns yesterday → 14 turns today, and the Daily Issues Report Generator P1 was closed by the maintainer.
Performance Rankings
Top Performing Agents 🏆
Auto-Triage Issues (Q:88, E:92) — Copilot engine
CLI Version Checker (Q:83, E:86) — Claude engine
Lockfile Statistics Analysis Agent (Q:79, E:81) — Claude engine
Issue Monster (Q:78, E:80) — Copilot engine
Contribution Check (Q:77, E:81) — Copilot engine
Agents Needing Improvement 📉
AI Moderator (Q:63, E:65) — Codex engine — `missing_data=1` on all 4 runs today
GitHub Remote MCP Authentication Test (Q:N/A, E:30) — `conclusion=failure` today, but the agent called `noop` saying "test PASSED"
Schema Consistency Checker (Q:68, E:69) — Claude engine
Documentation Unbloat (Q:71, E:75) — Claude engine
Duplicate Code Detector (Q:10, E:0) — Codex engine
Inactive / Resolved
Notable Patterns This Week
Positive Patterns ✅
Agent Persona Explorer improvement: 165 turns (Apr 6) → 14 turns today (run 24063075492). Dramatic efficiency gain. Created discussion #25009. Root cause unknown — possibly different issue input or prompt path. Monitor for consistency.
Lock file recompile (Apr 5-6): 17 stale → 0 stale. Systemic health improvement contributed to multiple workflows running clean.
Issue Monster recovery: Fully recovered from the Apr 5 `route.endpoint` error. Stable baseline restored.
Problematic Patterns ⚠️
AI Moderator missing_data (all 4 runs): Codex agent reports `missing_data=1` every run. The runs succeed, but this indicates a systematic data-access issue. See new issue (below).
GitHub Remote MCP Auth false-negative: Agent passes the test internally but the workflow exits as failure. Baseline comparison shows 95 turns → 3 turns (major behavioral change), suggesting the agent is completing too quickly without the expected execution path.
High-cost Claude workflows: Documentation Unbloat ($1.94), GitHub API Consumption Report ($1.66), Smoke Claude ($1.40 per successful run). Combined daily Claude cost could be reduced by moving data-fetching to deterministic pre-steps.
GitHub API & AI Consumption Report turn creep: Baseline 35 turns → 43 turns today (turn_increase classification). Monitor for continued growth.
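The false-negative pattern above (agent reports success, workflow records failure) is straightforward to surface automatically by cross-checking the two signals. A minimal sketch follows; the run records, field names, and the equation of a `noop` call with a success claim are illustrative assumptions, not the actual observability schema.

```python
# Sketch: flag runs where the agent's self-reported outcome disagrees with
# the workflow conclusion (the false-negative pattern described above).
# The record shape and field names here are hypothetical.

def find_false_negatives(runs):
    """Return runs where the agent claimed success but the workflow
    failed, or vice versa."""
    mismatches = []
    for run in runs:
        # Assumption: a `noop` safe-output with a success message means
        # the agent believed the test passed.
        agent_passed = run["agent_output"] == "noop"
        workflow_passed = run["conclusion"] == "success"
        if agent_passed != workflow_passed:
            mismatches.append(run)
    return mismatches

runs = [
    {"id": 1, "agent_output": "noop", "conclusion": "failure"},  # the MCP Auth case
    {"id": 2, "agent_output": "noop", "conclusion": "success"},
]
print(find_false_negatives(runs))  # only run 1 is flagged
```

A check like this, run after each workflow completes, would have caught the MCP Auth regression on its first occurrence rather than via manual baseline review.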
Quality Analysis
Quality Score Distribution
Average quality score: 68/100 (↓1 from 69)
Common Quality Issues
Resource-heavy execution for domain (5 agents): Documentation Unbloat, GitHub API Report, Schema Checker, Agent Persona Explorer, Smoke Claude. All flagged by observability as consuming heavy profiles for their task shape.
Partially reducible to deterministic (5 agents): 85–96% of turns in several workflows are pure data-gathering. Moving these to pre-agent steps (frontmatter) would cut costs significantly.
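The savings argument above can be made concrete with back-of-envelope arithmetic. Assuming per-run cost scales roughly with turn count (an assumption of ours, not a measured fact), the 85–96% data-gathering fraction bounds the reducible portion of each workflow's cost:

```python
# Rough sketch: estimate the portion of per-run cost attributable to
# data-gathering turns, under the (assumed) proportionality of cost to
# turns. Per-run costs are the figures reported in this document.

def reducible_cost(cost_per_run, data_gathering_fraction):
    """Estimate cost attributable to pure data-gathering turns."""
    return cost_per_run * data_gathering_fraction

for name, cost in [("Documentation Unbloat", 1.94),
                   ("GitHub API Consumption Report", 1.66),
                   ("Smoke Claude", 1.40)]:
    low, high = reducible_cost(cost, 0.85), reducible_cost(cost, 0.96)
    print(f"{name}: ${low:.2f}-${high:.2f} of ${cost:.2f} potentially reducible")
```

Even at the low end, moving data-fetching into deterministic pre-steps would remove the large majority of each run's cost.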
Model over-provisioning (1 agent): Agent Persona Explorer flagged as needing only `gpt-4.1-mini` or `claude-haiku-4-5` for its `issue_response` domain. A smaller model would reduce per-run cost.
Effectiveness Analysis
Task Completion Statistics
Copilot (11 runs):
Claude (8 runs):
Codex (8 runs):
Safe Output Usage (7-day)
Recommendations
High Priority
Investigate AI Moderator missing_data pattern — Issue created this run; the agent reports `missing_data=1` on every run.
Fix GitHub Remote MCP Auth Test false-negative exit code — Existing issue: [aw] GitHub Remote MCP Authentication Test failed #24829. The agent calls `noop` with a success message, yet the workflow conclusion is `failure`, likely a Copilot CLI exit-code issue or activation failure.
Medium Priority
Reduce Documentation Unbloat cost (~$1.94/run)
Schema Consistency Checker monitoring (62 turns, slight regression)
Agent Persona Explorer model downgrade — Observability recommends `gpt-4.1-mini` or `claude-haiku-4-5`; its `issue_response` domain doesn't require a frontier model.
Low Priority
GitHub API Consumption Report turn creep (35 → 43 turns)
Smoke Claude reliability (~30% failure rate ongoing)
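The turn-creep flag mentioned above (35 → 43 turns) amounts to comparing today's turn count against a baseline with a relative threshold. A minimal sketch, where the 15% threshold is our assumption rather than the observability system's actual classification rule:

```python
# Sketch of a turn-creep check like the one that flagged 35 -> 43 turns.
# The 15% relative threshold is an assumed value for illustration.

def classify_turns(baseline, today, threshold=0.15):
    """Classify a run as turn_increase if turns grew past the threshold."""
    if baseline and (today - baseline) / baseline > threshold:
        return "turn_increase"
    return "stable"

print(classify_turns(35, 43))  # (43 - 35) / 35 = 0.23 -> "turn_increase"
```

A single-day jump can be noise; applying the same check over a rolling multi-day baseline would distinguish sustained creep from one-off variance.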
Trends
Actions Taken This Run
References: