Agent Performance Report — Week of April 7, 2026 #25024
This discussion was automatically closed because it expired on 2026-04-08T05:00:17.544Z.
Executive Summary
The slight score decline is driven by a new false-negative failure pattern in the GitHub Remote MCP Auth Test and a confirmed missing_data pattern in AI Moderator (4/4 runs). On the positive side, Agent Persona Explorer made a dramatic improvement: 165 turns yesterday → 14 turns today, and the Daily Issues Report Generator P1 was closed by the maintainer.
Performance Rankings
Top Performing Agents 🏆
Auto-Triage Issues (Q:88, E:92) — Copilot engine
CLI Version Checker (Q:83, E:86) — Claude engine
Lockfile Statistics Analysis Agent (Q:79, E:81) — Claude engine
Issue Monster (Q:78, E:80) — Copilot engine
Contribution Check (Q:77, E:81) — Copilot engine
Agents Needing Improvement 📉
AI Moderator (Q:63, E:65) — Codex engine — `missing_data=1` on all 4 runs today
GitHub Remote MCP Authentication Test (Q:N/A, E:30) — `conclusion=failure` today, but the agent called `noop` saying "test PASSED"
Schema Consistency Checker (Q:68, E:69) — Claude engine
Documentation Unbloat (Q:71, E:75) — Claude engine
Duplicate Code Detector (Q:10, E:0) — Codex engine
Inactive / Resolved
Notable Patterns This Week
Positive Patterns ✅
Agent Persona Explorer improvement: 165 turns (Apr 6) → 14 turns today (run 24063075492). Dramatic efficiency gain. Created discussion #25009. Root cause unknown — possibly different issue input or prompt path. Monitor for consistency.
Lock file recompile (Apr 5-6): 17 stale → 0 stale. Systemic health improvement contributed to multiple workflows running clean.
Issue Monster recovery: Fully recovered from the Apr 5 `route.endpoint` error. Stable baseline restored.
Problematic Patterns ⚠️
AI Moderator missing_data (all 4 runs): Codex agent reports `missing_data=1` every run. The runs succeed, but this indicates a systematic data-access issue. See new issue (below).
GitHub Remote MCP Auth false-negative: Agent passes the test internally but the workflow exits as failure. Baseline comparison shows 95 turns → 3 turns (major behavioral change), suggesting the agent is completing too quickly without the expected execution path.
High-cost Claude workflows: Documentation Unbloat ($1.94), GitHub API Consumption Report ($1.66), Smoke Claude ($1.40 per successful run). Combined daily Claude cost could be reduced by moving data-fetching to deterministic pre-steps.
GitHub API & AI Consumption Report turn creep: Baseline 35 turns → 43 turns today (turn_increase classification). Monitor for continued growth.
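The false-negative pattern above (agent reports success, workflow records failure) is straightforward to surface automatically by cross-checking the two signals. A minimal sketch follows; the run records, field names, and the equation of a `noop` call with a success claim are illustrative assumptions, not the actual observability schema.

```python
# Sketch: flag runs where the agent's self-reported outcome disagrees with
# the workflow conclusion (the false-negative pattern described above).
# The record shape and field names here are hypothetical.

def find_false_negatives(runs):
    """Return runs where the agent claimed success but the workflow
    failed, or vice versa."""
    mismatches = []
    for run in runs:
        # Assumption: a `noop` safe-output with a success message means
        # the agent believed the test passed.
        agent_passed = run["agent_output"] == "noop"
        workflow_passed = run["conclusion"] == "success"
        if agent_passed != workflow_passed:
            mismatches.append(run)
    return mismatches

runs = [
    {"id": 1, "agent_output": "noop", "conclusion": "failure"},  # the MCP Auth case
    {"id": 2, "agent_output": "noop", "conclusion": "success"},
]
print(find_false_negatives(runs))  # only run 1 is flagged
```

A check like this, run after each workflow completes, would have caught the MCP Auth regression on its first occurrence rather than via manual baseline review.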
Quality Analysis
Quality Score Distribution
Average quality score: 68/100 (↓1 from 69)
Common Quality Issues
Resource-heavy execution for domain (5 agents): Documentation Unbloat, GitHub API Report, Schema Checker, Agent Persona Explorer, Smoke Claude. All flagged by observability as consuming heavy profiles for their task shape.
Partially reducible to deterministic (5 agents): 85–96% of turns in several workflows are pure data-gathering. Moving these to pre-agent steps (frontmatter) would cut costs significantly.
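The savings argument above can be made concrete with back-of-envelope arithmetic. Assuming per-run cost scales roughly with turn count (an assumption of ours, not a measured fact), the 85–96% data-gathering fraction bounds the reducible portion of each workflow's cost:

```python
# Rough sketch: estimate the portion of per-run cost attributable to
# data-gathering turns, under the (assumed) proportionality of cost to
# turns. Per-run costs are the figures reported in this document.

def reducible_cost(cost_per_run, data_gathering_fraction):
    """Estimate cost attributable to pure data-gathering turns."""
    return cost_per_run * data_gathering_fraction

for name, cost in [("Documentation Unbloat", 1.94),
                   ("GitHub API Consumption Report", 1.66),
                   ("Smoke Claude", 1.40)]:
    low, high = reducible_cost(cost, 0.85), reducible_cost(cost, 0.96)
    print(f"{name}: ${low:.2f}-${high:.2f} of ${cost:.2f} potentially reducible")
```

Even at the low end, moving data-fetching into deterministic pre-steps would remove the large majority of each run's cost.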
Model over-provisioning (1 agent): Agent Persona Explorer flagged as needing only `gpt-4.1-mini` or `claude-haiku-4-5` for its `issue_response` domain. A smaller model would reduce per-run cost.
Effectiveness Analysis
Task Completion Statistics
Copilot (11 runs):
Claude (8 runs):
Codex (8 runs):
Safe Output Usage (7-day)
Recommendations
High Priority
Investigate AI Moderator missing_data pattern — Issue created this run; the agent reports `missing_data=1` on every run.
Fix GitHub Remote MCP Auth Test false-negative exit code — Existing issue: [aw] GitHub Remote MCP Authentication Test failed #24829. The agent calls `noop` with a success message, yet the workflow conclusion is `failure`, likely a Copilot CLI exit-code issue or activation failure.
Medium Priority
Reduce Documentation Unbloat cost (~$1.94/run)
Schema Consistency Checker monitoring (62 turns, slight regression)
Agent Persona Explorer model downgrade — Observability recommends `gpt-4.1-mini` or `claude-haiku-4-5`; its `issue_response` domain doesn't require a frontier model.
Low Priority
GitHub API Consumption Report turn creep (35 → 43 turns)
Smoke Claude reliability (~30% failure rate ongoing)
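The turn-creep flag mentioned above (35 → 43 turns) amounts to comparing today's turn count against a baseline with a relative threshold. A minimal sketch, where the 15% threshold is our assumption rather than the observability system's actual classification rule:

```python
# Sketch of a turn-creep check like the one that flagged 35 -> 43 turns.
# The 15% relative threshold is an assumed value for illustration.

def classify_turns(baseline, today, threshold=0.15):
    """Classify a run as turn_increase if turns grew past the threshold."""
    if baseline and (today - baseline) / baseline > threshold:
        return "turn_increase"
    return "stable"

print(classify_turns(35, 43))  # (43 - 35) / 35 = 0.23 -> "turn_increase"
```

A single-day jump can be noise; applying the same check over a rolling multi-day baseline would distinguish sustained creep from one-off variance.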
Trends
Actions Taken This Run
References: