Agent Performance Report - Week of 2026-02-26 #18544

@github-actions

Performance Summary

  • Agents analyzed: 16 (from 31 total runs sampled, past 2 days)
  • Total tokens (sample): ~165M (includes Codex high-parallelism runs)
  • Total cost (today): ~$5.94 | yesterday: ~$6.14
  • Average quality score: 86/100 (↓ 3 from 89)
  • Average effectiveness score: 87/100 (↓ 1 from 88)
  • Top performers: The Great Escapi, Contribution Check, Daily Safe Outputs Conformance Checker
  • Needs attention: AI Moderator (missing tool regression), Chroma Issue Indexer (extreme token usage), Semantic Function Refactoring (elevated cost)

Critical Findings

❌ P0 Ongoing: Lockdown Token Failures (3+ weeks)

Four workflows remain locked out: Issue Monster, PR Triage Agent, Daily Issues Report, and Org Health Report. All programmatic fix paths are closed (#17414 and #17807 were both rejected as "not_planned"), so manual intervention by a repo admin is required. These failures continue to skew ecosystem quality metrics.

⚠️ AI Moderator GitHub MCP Missing Tool — Regression Detected

1 of 3 runs today (run §22453521501) reported that the GitHub MCP tool (read issue/comment content) was missing. This is identical to the Docker MCP intermittency pattern last seen on 2026-02-24, which was believed to be resolved by switching to mode: remote. Because mode: remote now also shows intermittency, the root cause may be upstream GitHub MCP availability rather than anything Docker-specific. The other 2 runs succeeded but had very low turn counts (1–2 turns), which may indicate noop runs rather than full processing.

⚠️ Chroma Issue Indexer — Extreme Token Usage

Today's run consumed 3.6M tokens in 10.5 minutes and had 102 blocked firewall requests, the highest blocked count of any workflow today. If the issue index is growing, this trend will worsen. This workflow and Semantic Function Refactoring are the two largest contributors to the ecosystem-wide 47% firewall block rate (439/926 requests blocked), together accounting for 174 of the 439 blocked requests (about 40%).

View Detailed Quality Analysis

Agent Quality Scores (Today)

| Agent | Engine | Quality | Duration | Tokens | Cost | Notes |
|---|---|---|---|---|---|---|
| The Great Escapi | copilot | 94/100 | 3.5m | 74k | | Ultra-efficient |
| Contribution Check | copilot | 93/100 | 2.8m | 181k | | Fast, clean |
| Daily Safe Outputs Conformance Checker | claude | 92/100 | 3.1m | 134k | $0.33 | Efficient |
| Auto-Triage Issues | copilot | 90/100 | 3.5m | 136k | | Success |
| Agent Container Smoke Test | copilot | 90/100 | 4.4m | 174k | | Clean |
| Smoke Copilot | copilot | 90/100 | 6.7m | | | 49 turns, passing |
| Smoke Claude | claude | 87/100 | 12.9m | 991k | $1.47 | 42 turns, long |
| Lockfile Statistics Analysis Agent | claude | 87/100 | 5.0m | 456k | $0.82 | 14 turns, normal |
| AI Moderator (×3) | codex | 82/100 | 7.5–8.9m | 210–372k | | 1/3 missing tool |
| Scout | claude | 80/100 | 4.9m | 613k | $0.81 | 19 turns |
| Smoke Codex | codex | 80/100 | 6.8m | 32M | | 17 turns, Codex tokens |
| Slide Deck Maintainer | copilot | 78/100 | 6.7m | 1.5M | | High tokens |
| Changeset Generator | codex | 75/100 | 8.2m | 123M | | Codex parallelism |
| Semantic Function Refactoring | claude | 72/100 | 9.1m | 295k | $3.97 | High cost, 12 turns |
| Chroma Issue Indexer | copilot | 68/100 | 10.5m | 3.6M | | Extreme tokens |

Cancelled Runs Analysis

14 runs were cancelled in a batch (runs 22450833xxx–22450834xxx). This is expected behavior from a Release workflow trigger — these represent staggered workflow starts that were cancelled before the new release artifacts were ready. Not a quality issue.

View Effectiveness Metrics

Task Completion Rates (Sampled Agent Runs)

  • High completion (>80%): 13/15 agent workflows (87%)
  • Partial/Degraded: AI Moderator (1/3 runs degraded), Chroma Issue Indexer (functional but inefficient)
  • Infrastructure failures (not quality): Issue Monster, PR Triage Agent, Daily Issues Report, Org Health Report (lockdown)

Cost Efficiency Trends

| Agent | Today | Yesterday | Δ |
|---|---|---|---|
| Semantic Function Refactoring | $3.97 | $4.82 | ↓ $0.85 ✅ |
| Scout | $0.81 | | New data point |
| Daily Safe Outputs Conformance Checker | $0.33 | | Consistent |
| Lockfile Statistics Analysis Agent | $0.82 | | Consistent |
| Smoke Claude | $1.47 | | Long duration |
| Total (metered) | $5.94 | $6.14 | ↓ $0.20 ✅ |

Firewall Request Analysis

Total 926 requests across all workflows: 487 allowed (53%), 439 blocked (47%).

Top blocked workflows:

  1. Chroma Issue Indexer: 102 blocked — likely local socket connections (Serena MCP pattern)
  2. Semantic Function Refactoring: 72 blocked — consistent with "-" domain pattern
  3. Changeset Generator: 61 blocked — Codex parallelism reaching out broadly
  4. Slide Deck Maintainer: 43 blocked — investigating
  5. Smoke Codex: 38 blocked — expected for engine behavior

The "-" domain appearing in blocked list is a known Serena MCP local socket artifact (see issue #18388).
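
As a quick check on the figures above, the block rate and the share attributable to the listed workflows can be recomputed from the counts in this section. This is a minimal Python sketch; the variable and function names are illustrative and not part of any workflow tooling.

```python
# Counts copied from the Firewall Request Analysis section of this report.
total_requests = 926
allowed = 487
blocked = 439

# Blocked counts for the five workflows listed under "Top blocked workflows".
top_blocked = {
    "Chroma Issue Indexer": 102,
    "Semantic Function Refactoring": 72,
    "Changeset Generator": 61,
    "Slide Deck Maintainer": 43,
    "Smoke Codex": 38,
}


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


block_rate = pct(blocked, total_requests)

# Share of all blocked requests that comes from the top two workflows.
top_two = top_blocked["Chroma Issue Indexer"] + top_blocked["Semantic Function Refactoring"]
top_two_share = pct(top_two, blocked)

# Share of all blocked requests that comes from the five listed workflows.
top_five_share = pct(sum(top_blocked.values()), blocked)

print(f"block rate: {block_rate}%")              # 47.4%
print(f"top-two share of blocked: {top_two_share}%")    # 39.6%
print(f"top-five share of blocked: {top_five_share}%")  # 72.0%
```

On these numbers, the five listed workflows account for about 72% of blocked requests, and the top two for about 40%.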

View Behavioral Patterns

Productive Patterns ✅

  • Release → Smoke cancellation → Re-run: Expected orchestration behavior, not a failure
  • Daily Safe Outputs Conformance Checker: Continues to be highly efficient (3 turns, $0.33)
  • The Great Escapi: Maintaining minimal footprint, high reliability across 2+ weeks

Problematic Patterns ⚠️

  • AI Moderator GitHub MCP intermittency: 3rd occurrence of the missing tool issue. mode: remote was supposed to fix this (2026-02-24), but 1 of 3 runs today was missing GitHub MCP again. The failures are silent: the moderation trigger runs but does nothing. Impact: ~33% of moderation events missed in today's sample (1 of 3 runs).
  • Semantic Function Refactoring high cost: 12th consecutive day of elevated cost. Despite a slight improvement ($4.82→$3.97), it is still roughly 3× to 12× more expensive than the other metered claude workflows today ($0.33–$1.47). Root cause under investigation via issue [refactor] Semantic Function Clustering Analysis: Misplaced Functions and Duplicate Patterns in pkg/workflow #18388.
  • Chroma Issue Indexer token growth: 3.6M tokens is abnormally high for an issue indexer. If the issue backlog is growing, token usage will likely keep growing with it. No tracking issue has been created yet.
  • Codex extreme token counts: Changeset Generator (123M) and Smoke Codex (32M) show Codex engine's parallel-context behavior. Not quality issues but skew overall token metrics significantly.
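
To put the Semantic Function Refactoring cost in context, the ratio against each of the other metered claude workflows can be computed from today's figures in the Agent Quality Scores table. This is a small Python sketch with illustrative names; the costs are the only values taken from the report.

```python
# Today's metered costs (USD) for the claude-engine workflows in this report.
claude_costs = {
    "Semantic Function Refactoring": 3.97,
    "Smoke Claude": 1.47,
    "Lockfile Statistics Analysis Agent": 0.82,
    "Scout": 0.81,
    "Daily Safe Outputs Conformance Checker": 0.33,
}

target = "Semantic Function Refactoring"

# How many times more expensive the target is than each other workflow.
ratios = {
    name: round(claude_costs[target] / cost, 1)
    for name, cost in claude_costs.items()
    if name != target
}

for name, ratio in ratios.items():
    print(f"{target} costs {ratio}x {name}")
```

The ratio is about 12× only against the cheapest workflow (Daily Safe Outputs Conformance Checker, $0.33); against the others it ranges from about 2.7× to 4.9×.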

Ecosystem Coverage Assessment

  • ✅ Security: The Great Escapi active and efficient
  • ✅ Code quality: Smoke tests (Copilot/Claude/Codex) passing on main
  • ✅ Documentation: Slide Deck Maintainer running (high tokens, worth monitoring)
  • ✅ Release: Workflow completed successfully today
  • ⚠️ Issue triage: AI Moderator intermittent (33% miss rate today)
  • ❌ Issue monitoring: Issue Monster, Daily Issues Report locked out

Recommendations

High Priority

  1. Investigate AI Moderator GitHub MCP reliability — 3rd incident in a week

    • The 1/3 miss rate today suggests mode: remote is not a reliable fix
    • Consider: adding retry logic, fallback to mode: local if remote unavailable, or alert on noop runs
    • Affected run: §22453521501
  2. Chroma Issue Indexer token usage investigation — 3.6M tokens is a new high

    • Determine if issue backlog growth is expected or indicates runaway indexing
    • 102 blocked firewall requests is also the highest count in the ecosystem; determine what the workflow is attempting to reach
    • Consider creating issue to track and cap maximum tokens per run
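
The retry-and-fallback idea suggested for AI Moderator could look roughly like the following Python sketch. It is an illustration of the pattern only: `ToolUnavailable`, `call_with_fallback`, and the two simulated callables are hypothetical names, and nothing here reflects how the workflow engine or GitHub MCP is actually invoked.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class ToolUnavailable(Exception):
    """Hypothetical error for a tool that is missing from the session."""


def call_with_fallback(
    attempts: Sequence[Callable[[], T]],
    retries_each: int = 2,
    delay_s: float = 0.0,
) -> T:
    """Try each callable in order, retrying each one a few times.

    Returns the first successful result. Raises ToolUnavailable if every
    callable fails on every retry, so the failure is loud rather than silent.
    """
    last_error: Optional[Exception] = None
    for attempt in attempts:
        for _ in range(retries_each):
            try:
                return attempt()
            except ToolUnavailable as err:
                last_error = err
                time.sleep(delay_s)
    raise ToolUnavailable("all modes failed") from last_error


# Simulated stand-ins for a remote call that fails and a local fallback.
def flaky_remote() -> str:
    raise ToolUnavailable("GitHub MCP tool missing (simulated)")


def local_ok() -> str:
    return "issue body"


result = call_with_fallback([flaky_remote, local_ok])
print(result)  # issue body
```

Raising when every mode fails, instead of returning an empty result, is what would turn a silent noop run into a visible failure that can trigger an alert.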

Medium Priority

  1. Semantic Function Refactoring cost — Slight improvement ($3.97) but still high

  2. Lockdown P0 escalation — All programmatic fix paths closed ([P1] Lockdown mode failing: GH_AW_GITHUB_TOKEN not configured — 5 workflows affected #17414, [q] fix(workflows): remove explicit lockdown:true to stop recurring failures #17807 both "not_planned")

    • 4 workflows generating failure noise daily
    • Recommend direct escalation to repository maintainers (not via issue)

Low Priority

  1. Smoke Claude duration — 12.9m and 42 turns is the longest smoke test
    • All other smoke tests complete in under 7m; investigate whether Smoke Claude is testing more or is stuck in retry loops

Trends (7-day)

  • Agent quality: 86/100 (↓ from 89 — AI Moderator regression and Chroma concern)
  • Total metered cost: $5.94 (↓ from $6.14 — small improvement)
  • Firewall block rate: 47% (stable/elevated — "-" domain artifacts persist)
  • Smoke test health: ✅ All passing on main
  • Lockdown failures: 4 workflows (→ unchanged, 3+ weeks)

Actions Taken This Run

  • Updated agent-performance-latest.md in shared repo memory
  • Updated shared-alerts.md with AI Moderator regression and Chroma concern
  • Generated this performance report discussion

Analysis period: 2026-02-25 → 2026-02-26
Next report: 2026-02-27
References: §22453850435 | §22408567616 | §22453521501


Warning

This was intended to be a discussion, but discussions could not be created due to permissions issues. This issue was created as a fallback.

Discussion creation may fail if the specified category is not announcement-capable. Consider using the "Announcements" category or another announcement-capable category in your workflow configuration.

Generated by Agent Performance Analyzer - Meta-Orchestrator

  • expires on Feb 27, 2026, 5:48 PM UTC
