[observability] Agentic Observability Report — 2026-04-06 #24843

2026-04-06T08:56:49Z

github-actions[bot]
bot Apr 6, 2026

Executive Summary

This report covers 2026-04-06 (single-day window: no runs were available from the prior 13 days, suggesting this is an early production day or a newly activated repository). All 203 runs resolve to 203 standalone episodes with high confidence, and no orchestrator-worker chains were detected (zero edges in the lineage graph). Total estimated spend was $5.56 across 31.8M tokens and 449 GitHub Actions minutes.

The two most operationally significant signals are:

Smoke Copilot and Smoke Claude: resource_heavy_for_domain at medium/high severity on all 3 runs of each workflow today, plus medium/high poor_agentic_control on all 3 Smoke Copilot runs and 2 of the 3 Smoke Claude runs. Smoke tests are expected to be lightweight, so this pattern is far from what smoke testing should look like.
GitHub Remote MCP Authentication Test — 100% failure rate across both runs today.

No runs were classified as risky by the deterministic model, and no episodes were flagged escalation_eligible in the log metadata. However, Smoke Copilot and Smoke Claude both cross the workflow-level fallback escalation thresholds (≥2 runs with medium/high resource_heavy_for_domain and ≥2 runs with medium/high poor_agentic_control).
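The workflow-level fallback rule can be sketched as a simple count per assessment. This is a minimal illustration, assuming each run is represented as a mapping from assessment name to severity; the `Run` type and `crosses_fallback_threshold` function are hypothetical and are not the observability kit's actual API.

```python
from collections import Counter
from dataclasses import dataclass, field


# Hypothetical run record: assessment name -> severity ("low", "medium", "high").
@dataclass
class Run:
    run_id: str
    assessments: dict[str, str] = field(default_factory=dict)


ESCALATING = {"medium", "high"}
WATCHED = ("resource_heavy_for_domain", "poor_agentic_control")


def crosses_fallback_threshold(runs: list[Run], min_runs: int = 2) -> dict[str, bool]:
    """Per watched assessment, report whether at least min_runs runs carry it at medium/high."""
    counts = Counter(
        name
        for run in runs
        for name in WATCHED
        if run.assessments.get(name) in ESCALATING
    )
    return {name: counts[name] >= min_runs for name in WATCHED}


# Smoke Claude runs from this report; "—" (no flag) is modelled as a missing key.
smoke_claude = [
    Run("24016631773", {"resource_heavy_for_domain": "high"}),
    Run("24016762959", {"resource_heavy_for_domain": "high", "poor_agentic_control": "medium"}),
    Run("24016851157", {"resource_heavy_for_domain": "high", "poor_agentic_control": "medium"}),
]
print(crosses_fallback_threshold(smoke_claude))
```

With 3 medium/high resource_heavy_for_domain runs and 2 medium/high poor_agentic_control runs, Smoke Claude crosses both thresholds, matching the summary above.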

Key Metrics

Metric	Value
Date range analyzed	2026-04-06 only
Distinct workflows	~35
Total runs	203
Episodes analyzed	203
High-confidence episodes	203 (100%)
`risky` classified runs	0
`resource_heavy_for_domain` high	19 runs, 16 workflows
`poor_agentic_control` medium/high	12 runs, 9 workflows
`overkill_for_agentic` (any severity)	50+ runs across ~12 workflows
Failed runs	11 (5.4%) across 8 workflows
Total tokens	31,778,550
Total estimated cost	$5.56
Total action minutes	449
Write-capable episodes	10
Orchestrator-worker chains	0 (flat topology)
`latest_success` fallbacks	0 (most runs had no comparison baseline)

Highest Risk Episodes

🔴 Smoke Copilot — Repeated resource-heavy + poor control (3 runs)

Smoke tests are intended to be lean validation checks. Across 3 runs today, all three triggered resource_heavy_for_domain (2 high, 1 medium) and poor_agentic_control (2 high, 1 medium).

Run ID	Tokens	Resource Heavy	Poor Control
§24016631769	987,748	high	high
§24016762986	1,321,781	high	medium
§24018427871	675,373	medium	high

Pattern: Token usage ranges from 675K to 1.3M per smoke run, far more than a lightweight smoke test should need. High-severity poor_agentic_control in runs 24016631769 and 24018427871 suggests the agent may be backtracking or repeating tool calls. The workflow crosses the escalation thresholds for both resource_heavy_for_domain and poor_agentic_control.

🔴 Smoke Claude — Repeated resource-heavy (3 runs, 2 with poor control)

All 3 Smoke Claude runs today were flagged resource_heavy_for_domain: high. Two also carried poor_agentic_control: medium.

Run ID	Tokens	Resource Heavy	Poor Control
§24016631773	1,077,182	high	—
§24016762959	1,733,152	high	medium
§24016851157	1,344,274	high	medium

Pattern: Claude smoke runs consume 1.1M to 1.7M tokens each. The medium poor_agentic_control in two runs suggests the agent's behavior is not well scoped. The workflow crosses escalation thresholds for resource_heavy_for_domain (3 high-severity runs) and poor_agentic_control (2 medium-severity runs).

🟡 GitHub Remote MCP Authentication Test — 100% failure rate

Both runs today failed. The first run consumed 98,690 tokens before failing (§24018502099); the second had no token activity (§24018928797), suggesting that failure occurred before the agent was invoked. Two consecutive failures with no successful run point to a systematic issue rather than a transient fault.
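The reasoning about where each failure happened can be expressed as a small heuristic. This is a sketch under an explicit assumption: a failed run with zero tokens is treated as a pre-agent failure (setup, secrets, network), and any failed run with tokens is treated as failing during the agent phase. `FailedRun` and `failure_phase` are hypothetical names, not part of any real tooling.

```python
from typing import NamedTuple


# Hypothetical minimal record of a failed run.
class FailedRun(NamedTuple):
    run_id: str
    tokens: int


def failure_phase(run: FailedRun) -> str:
    # Assumed heuristic: no token usage means the agent was never invoked.
    return "pre-agent" if run.tokens == 0 else "during-agent"


mcp_auth_failures = [
    FailedRun("24018502099", 98_690),
    FailedRun("24018928797", 0),
]
for run in mcp_auth_failures:
    print(run.run_id, failure_phase(run))
```

A zero-token failure does not prove a credentials problem on its own; it only narrows the search to steps that run before the agent starts.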

Episode Regressions

All 203 episodes are first-time observations (single-day window, no historical baseline available for most workflows). Notable single-run high-consumption episodes with no cohort to compare against:

Workflow	Run ID	Tokens	Key Assessments
Agent Persona Explorer	§24017802449	4,150,346	resource_heavy:high, poor_control:medium
Agent Performance Analyzer (Meta-Orchestrator)	§24018835783	2,628,931	resource_heavy:high
Layout Specification Maintainer	§24023974662	2,545,617	resource_heavy:high, poor_control:medium
Code Simplifier	§24021117433	2,195,742	resource_heavy:high
Schema Consistency Checker	§24018746491	1,783,522	resource_heavy:high, poor_control:medium, cost=$1.04

These are single-run baselines. They should be re-evaluated once cohort data accumulates.
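Once cohort data exists, one way to re-evaluate these runs is to compare each new run against the median token usage of earlier runs of the same workflow. This is a hypothetical sketch: the `token_regression_ratio` function and the `min_cohort=3` cutoff are assumptions for illustration, not settings of the observability kit. With today's data it correctly reports that no baseline is available.

```python
from statistics import median
from typing import Optional


def token_regression_ratio(
    current_tokens: int,
    prior_tokens: list[int],
    min_cohort: int = 3,  # hypothetical minimum cohort size
) -> Optional[float]:
    """Ratio of current tokens to the cohort median, or None when there is no usable baseline."""
    if len(prior_tokens) < min_cohort:
        return None  # single-run baseline: nothing to compare against yet
    baseline = median(prior_tokens)
    if baseline == 0:
        return None  # avoid dividing by a zero-token baseline
    return current_tokens / baseline


# Agent Persona Explorer today: no prior runs, so no ratio can be computed.
print(token_regression_ratio(4_150_346, []))
```

Using the median rather than the mean keeps one outlier run in the cohort from masking or inflating a regression.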

Recommended Actions

[Immediate] Investigate Smoke Copilot and Smoke Claude token inflation — both workflows cross escalation thresholds. Review whether the smoke workflow prompt or the engine configuration is causing the agent to do substantive work instead of a lightweight validation. A smoke test should use <100K tokens, not 1M+.
[Immediate] Fix GitHub Remote MCP Authentication Test — 100% failure across 2 runs today. The zero-token second failure suggests a pre-agent configuration or authentication problem. Check GitHub Remote MCP credentials/secrets and network firewall settings.
[Soon] Review Auto-Triage Issues — 1 of 3 runs today consumed 1.1M tokens with high resource_heavy_for_domain and medium poor_agentic_control. The other two had zero tokens. Inconsistent activation suggests ambiguous trigger conditions.
[Portfolio] Consider deterministic alternatives for overkill workflows: 50+ runs today across ~12 workflows (Archie, /cloclo, Q, Scout, Resource Summarizer Agent, Poem Bot, Workflow Craft Agent, etc.) were consistently assessed as overkill_for_agentic: low, and most of those runs were skipped. These are candidates for deterministic GitHub Actions rewrites or removal.
[Monitor] Watch for Changeset Generator and Smoke Codex failures: both had 2 failures today. The single-day view limits causal analysis; check whether CI changes or broken dependencies are the root cause.
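The <100K token target from the first recommendation can be turned into a simple budget check over the smoke runs listed in this report. This is a sketch; `SMOKE_TOKEN_BUDGET` and `over_budget` are hypothetical names, and the budget value is the report's suggested target rather than a configured limit.

```python
SMOKE_TOKEN_BUDGET = 100_000  # suggested target from this report, not an enforced limit

# Token counts per run ID, taken from the Smoke Copilot and Smoke Claude tables.
smoke_runs = {
    "Smoke Copilot": {"24016631769": 987_748, "24016762986": 1_321_781, "24018427871": 675_373},
    "Smoke Claude": {"24016631773": 1_077_182, "24016762959": 1_733_152, "24016851157": 1_344_274},
}


def over_budget(runs: dict[str, int], budget: int = SMOKE_TOKEN_BUDGET) -> dict[str, float]:
    """Map each run ID that exceeds the budget to how many multiples of the budget it used."""
    return {run_id: tokens / budget for run_id, tokens in runs.items() if tokens > budget}


for workflow, runs in smoke_runs.items():
    for run_id, multiple in over_budget(runs).items():
        print(f"{workflow} {run_id}: {multiple:.1f}x budget")
```

All six runs exceed the target, ranging from roughly 7x to 17x the budget.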

Full Workflow Inventory — All 203 Runs

Failures (11 total)

Workflow	Run ID	Errors	Notes
Changeset Generator	§24016631772	1	First failure
Smoke Codex	§24016631774	1	—
Changeset Generator	§24016762950	1	Second failure
Smoke Codex	§24016762967	1	—
Documentation Unbloat	§24017778479	1	—
GitHub Remote MCP Authentication Test	§24018502099	1	98K tokens consumed
GitHub Remote MCP Authentication Test	§24018928797	1	Zero tokens
AI Moderator	§24019350098	1	—
Contribution Check	§24020210365	1	Also resource_heavy:high
CI Cleaner	§24022270848	1	—
Schema Feature Coverage Checker	§24022436614	1	—

Resource Heavy — High Severity (19 runs)

Workflows with resource_heavy_for_domain: high (most are single-run, no comparison baseline yet):

Smoke Claude (×3), Smoke Copilot (×2), Auto-Triage Issues (×1), Release (×1), CLI Version Checker (×1), Schema Consistency Checker (×1), jsweep (×1), Agent Performance Analyzer (Meta-Orchestrator) (×1), Daily CLI Tools Exploratory Tester (×1), Contribution Check (×1), Daily CLI Performance Agent (×1), Code Simplifier (×1), PR Triage Agent (×1), Layout Specification Maintainer (×1), Go Fan (×1), Agent Persona Explorer (×1).

Poor Agentic Control — Medium/High Severity (12 runs)

Smoke Copilot (×3), Smoke Claude (×2), Agent Persona Explorer (×1), Schema Consistency Checker (×1), Contribution Check (×1), PR Triage Agent (×1), Auto-Triage Issues (×1), Layout Specification Maintainer (×1), Go Fan (×1).

Overkill Candidates (portfolio cleanup)

Consistently overkill_for_agentic: low across all runs — candidates for deterministic rewrites:
Archie, /cloclo, Q, Scout, Resource Summarizer Agent, Poem Bot - A Creative Agentic Workflow, Workflow Craft Agent, ACE Editor Session, Documentation Unbloat, Plan Command, Mergefest, Dev.

Many of these are scheduled or skip-on-no-op workflows that rarely have substantive work to do.

Lineage Observations

Zero edges in the lineage graph — no orchestrator-worker chains were detected.
All episodes are standalone with confidence: high.
The observability system flagged 4 high-anomaly events (score >0.6) in cross-run template analysis, attributed to "new log template discovered; rare cluster." This is expected for a first-day dataset.
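Why zero lineage edges yields one episode per run can be illustrated by treating episodes as connected components of the lineage graph. This is a sketch under that assumption; the kit's actual episode-grouping logic is not described in this report, and `group_episodes` plus the placeholder run IDs are hypothetical.

```python
def group_episodes(run_ids: list[str], edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group runs into episodes, assumed here to be connected components of the lineage graph."""
    parent = {run_id: run_id for run_id in run_ids}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each orchestrator -> worker edge merges the two runs into one episode.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    episodes: dict[str, set[str]] = {}
    for run_id in run_ids:
        episodes.setdefault(find(run_id), set()).add(run_id)
    return list(episodes.values())


# Placeholder IDs: with zero lineage edges, every run is its own standalone episode.
runs = [f"run-{i}" for i in range(203)]
print(len(group_episodes(runs, [])))
```

A single orchestrator-worker edge would reduce the episode count by one, which is how chains would show up once they appear.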

Cost Breakdown — Top 10 Token Consumers

Workflow	Run ID	Tokens	Est. Cost
Agent Persona Explorer	§24017802449	4,150,346	—
Agent Performance Analyzer	§24018835783	2,628,931	—
Layout Specification Maintainer	§24023974662	2,545,617	—
Code Simplifier	§24021117433	2,195,742	—
Schema Consistency Checker	§24018746491	1,783,522	$1.04
Smoke Claude	§24016762959	1,733,152	—
Daily CLI Tools Exploratory Tester	§24019945745	1,731,614	—
Go Fan	§24024581430	1,571,139	$1.15
jsweep - JavaScript Unbloater	§24018753434	1,403,830	—
Smoke Claude	§24016851157	1,344,274	—

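Note: Only 2 of these top-10 runs report a non-zero estimated cost, together $2.19 of the $5.56 total; the remaining $3.37 comes from runs outside this table. The other runs listed here likely lacked cost tracking, so the reported total probably understates actual spend.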
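The cost figures can be checked with a few lines of arithmetic. `Decimal` is used so currency values add exactly; the variable names are illustrative only.

```python
from decimal import Decimal

total_cost = Decimal("5.56")  # total estimated cost from Key Metrics

# Top-10 runs that report a non-zero estimated cost.
tracked_top10 = {
    "Schema Consistency Checker": Decimal("1.04"),
    "Go Fan": Decimal("1.15"),
}
top10_cost = sum(tracked_top10.values())
print(f"top-10 tracked: ${top10_cost}, elsewhere: ${total_cost - top10_cost}")
```

This prints `top-10 tracked: $2.19, elsewhere: $3.37`, confirming that most of the reported spend comes from runs outside the top-10 table.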

References:

§24016631769 — Smoke Copilot (resource_heavy:high, poor_control:high)
§24016762959 — Smoke Claude (resource_heavy:high, poor_control:medium)
§24018928797 — GitHub Remote MCP Authentication Test (failure)

Generated by Agentic Observability Kit · ● 1.8M

expires on Apr 13, 2026, 8:56 AM UTC
