
P1 Systemic: Multiple workflows failing with "engine terminated unexpectedly" (exit code 1) — Apr 8 #25292

@github-actions

Overview

20 workflow failure issues were created on Apr 8, 2026, a significant spike. At least 13 workflows share a common failure pattern: the engine terminates with exit code 1 after the firewall containers have stopped cleanly. The pattern spans both the Copilot and Claude engines.

Failure Pattern

The common log signature is:

Container awf-squid  Removing
Container awf-squid  Removed
[SUCCESS] Containers stopped successfully
[INFO] Agent session state preserved at: /tmp/awf-agent-session-state-...
[INFO] API proxy logs available at: /tmp/gh-aw/sandbox/firewall/logs/api-proxy-logs
[WARN] Command completed with exit code: 1
Process exiting with code: 1

The container/firewall infrastructure teardown succeeds, but the agent process exits with code 1, causing GitHub Actions to report the workflow as failed.
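A minimal sketch for flagging logs that match this signature during triage, assuming the run log is available as a string (for example, read from a downloaded log file). The function and constant names are hypothetical and not part of gh-aw; the only strings it matches are the ones quoted above, and it treats any non-zero exit after a successful teardown as a match (this issue reports exit code 1).

```python
import re

# Line printed once the firewall containers have shut down cleanly (from the log signature above).
TEARDOWN_OK = "[SUCCESS] Containers stopped successfully"

# Final exit line; the captured group is the process exit code.
EXIT_RE = re.compile(r"Process exiting with code: (\d+)")


def matches_engine_exit_signature(log_text: str) -> bool:
    """Return True if a clean container teardown is followed by a non-zero process exit.

    Hypothetical triage helper: it only looks for the text patterns quoted in this issue.
    """
    teardown_pos = log_text.find(TEARDOWN_OK)
    if teardown_pos == -1:
        # Teardown never reported success, so this is some other failure mode.
        return False
    # Only exit lines that appear after the successful teardown count.
    for match in EXIT_RE.finditer(log_text, teardown_pos):
        if int(match.group(1)) != 0:
            return True
    return False


# Excerpt of the signature quoted in this issue.
SAMPLE_LOG = """\
Container awf-squid  Removed
[SUCCESS] Containers stopped successfully
[WARN] Command completed with exit code: 1
Process exiting with code: 1
"""

print(matches_engine_exit_signature(SAMPLE_LOG))
```

Anchoring the search at the teardown line keeps the check specific to this pattern: a run that fails before the firewall stops cleanly is a different failure mode and is not counted.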

Affected Workflows (Copilot engine, exit code 1)

Issue Workflow Time (UTC)
#25287 Dead Code Removal Agent 12:04
#25282 Daily Copilot Token Usage Audit 11:55
#25277 Daily Testify Uber Super Expert 11:39
#25268 Copilot PR Conversation NLP Analysis 10:43
#25267 Daily Syntax Error Quality Check 10:41
#25262 Daily MCP Tool Concurrency Analysis 09:36
#25261 Dev 09:05
#25260 Architecture Diagram Generator 09:03
#25249 Daily CLI Performance Agent 05:45
#25243 Copilot PR Prompt Pattern Analysis 05:02
#25236 Agent Performance Analyzer 04:37
#25215 Auto-Triage Issues 01:02

Also affected (Claude engine):

Timeline & Possible Causes

Failures span 01:02–12:04 UTC, beginning before any code changes landed today:

Time Event
01:02 First failures (Auto-Triage, Smoke tests) — before any commits
04:37 Agent Performance Analyzer (Copilot) — still before commits
04:46 feat: repo-level config via aw.json merged
05:05 chore: bump firewall v0.25.16 + full recompile
05:57 feat: add pre-steps to agent job merged
06:08 chore: Copilot CLI 1.0.20→1.0.21 merged
06:54 CI Cleaner (Claude engine) fails
09:03–12:04 Wave of Copilot engine failures

Since failures started before any commits, this may be an infrastructure or GitHub Actions Copilot runner issue. The code changes cannot explain the early failures, but they may still contribute to the failures after 05:05.
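To check this ordering argument against the affected-workflows table, here is a small sketch that assigns each Copilot-engine failure to the most recent change merged before it. The times are copied from this issue; the helper names are hypothetical. Merge time is only a lower bound on when a change could reach a run (lock files and runners may pick it up later), so this shows which changes could not have caused a given failure, not which one did. The Claude-engine failures are left out because their list is not included above.

```python
from collections import defaultdict
from datetime import time

# Merge times on Apr 8 (UTC), copied from the timeline above, in ascending order.
CHANGES = [
    (time(4, 46), "repo-level config via aw.json"),
    (time(5, 5), "firewall v0.25.16 + recompile"),
    (time(5, 57), "pre-steps in agent job"),
    (time(6, 8), "Copilot CLI 1.0.21"),
]

# Copilot-engine failure times (UTC), copied from the affected-workflows table.
FAILURES = {
    "Dead Code Removal Agent": time(12, 4),
    "Daily Copilot Token Usage Audit": time(11, 55),
    "Daily Testify Uber Super Expert": time(11, 39),
    "Copilot PR Conversation NLP Analysis": time(10, 43),
    "Daily Syntax Error Quality Check": time(10, 41),
    "Daily MCP Tool Concurrency Analysis": time(9, 36),
    "Dev": time(9, 5),
    "Architecture Diagram Generator": time(9, 3),
    "Daily CLI Performance Agent": time(5, 45),
    "Copilot PR Prompt Pattern Analysis": time(5, 2),
    "Agent Performance Analyzer": time(4, 37),
    "Auto-Triage Issues": time(1, 2),
}


def latest_change_before(t):
    """Return the most recent change merged strictly before t, or None if nothing had merged."""
    landed = [name for merged_at, name in CHANGES if merged_at < t]
    return landed[-1] if landed else None


def group_by_latest_change(failures):
    """Group workflow names (earliest failure first) by the latest change merged before them."""
    groups = defaultdict(list)
    for workflow, failed_at in sorted(failures.items(), key=lambda kv: kv[1]):
        groups[latest_change_before(failed_at)].append(workflow)
    return dict(groups)


for change, workflows in group_by_latest_change(FAILURES).items():
    print(f"{change or 'before any merge'}: {', '.join(workflows)}")
```

With the table's data, the 01:02 and 04:37 failures precede every merge, the 05:02 failure follows only the aw.json change, and the 05:45 failure follows the firewall bump but precedes the pre-steps and Copilot CLI merges; the remaining eight all come after the Copilot CLI bump.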

Possible Causes to Investigate

  1. GitHub Copilot runner infrastructure: Agent process failing silently (exit code 1) without useful error output. Check whether there is a known outage or degradation
  2. Firewall v0.25.16: The recompile happened at 05:05. Check whether the new firewall version blocks something the agent requires
  3. Copilot CLI 1.0.21: New version (06:08). Check the release notes for breaking changes that could cause exit code 1
  4. pre-steps feature: New same-job token minting (05:57). May affect the auth flow for some workflows

Investigation Steps

  1. Download agent-stdio.log from a failed run (e.g., §24134231123) to see what the Copilot agent logged before exit
  2. Check GitHub Status page for Copilot runner issues
  3. Compare a failing workflow's lock file before and after the firewall bump to see if behavior changed
  4. Check whether Copilot CLI 1.0.21 has known issues that cause exit code 1
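For step 3, a minimal sketch of the lock-file comparison using Python's standard difflib, assuming both versions have already been fetched as text (for example with git show <commit>:<path> for commits before and after the 05:05 recompile). The helper name and the two-line lock-file excerpts are hypothetical placeholders, not real gh-aw lock-file contents.

```python
import difflib


def lock_file_diff(before: str, after: str, path: str = "workflow.lock.yml") -> str:
    """Return a unified diff of two versions of a lock file, or '' if they are identical.

    Hypothetical helper; `path` is only used to label the diff headers.
    """
    diff_lines = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
        lineterm="",  # splitlines() strips newlines, so don't append any to diff lines
    )
    return "\n".join(diff_lines)


# Hypothetical two-line excerpts; real lock files are far larger and use different keys.
before = "firewall: PREVIOUS\nengine: copilot\n"
after = "firewall: v0.25.16\nengine: copilot\n"
print(lock_file_diff(before, after))
```

An empty result means the recompile did not change that workflow's lock file, which would point away from the firewall bump for that particular workflow.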

Priority

P1 — Affects 13+ workflows including core agentic workflows (Dev, Agent Performance Analyzer, Auto-Triage Issues).

Detected by Workflow Health Manager meta-orchestrator (§24134411505).

Generated by Workflow Health Manager - Meta-Orchestrator

  • expires on Apr 9, 2026, 12:15 PM UTC
