fix(ci-doctor): pre-download logs and artifacts, apply generic error heuristics to reduce token usage by Copilot · Pull Request #17719 · github/gh-aw

Copilot · 2026-02-22T13:32:20Z

CI Failure Doctor was consuming 1.5M tokens/run because the agent loaded full log payloads into conversation context, then burned 5–8 LLM turns running Python grep scripts to locate the actual error.

Changes

Pre-download step (frontmatter steps:)
Runs before the agent session starts — zero extra agent turns:

- Fetches failed jobs + their failed step names via gh api → failed-jobs.json
- Downloads each failed job's full log → /tmp/ci-doctor/logs/job-<id>.log
- Downloads and unpacks all workflow run artifacts via gh run download → /tmp/ci-doctor/artifacts/ (auto-extracted by artifact name)
- Applies generic grep heuristics (case-insensitive: error:, FAIL, panic:, fatal:, undefined:, exception, non-zero exit status) to both log files and artifact text files (.txt, .log, .json, .xml, .out, .err) → filtered hint files in /tmp/ci-doctor/filtered/*-hints.txt
- Writes /tmp/ci-doctor/summary.txt enumerating all file locations and the first 3 hint matches per file
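The grep stage of this step can be tried offline against a synthetic log. This is a minimal sketch, not the workflow's actual script: the temp directory, job ID `123`, and log contents are invented, and the pattern is the `[1-9]` variant suggested in the review comments on this PR.

```shell
# Sketch only: WORK, JOB_ID and the log contents are invented for illustration.
WORK="$(mktemp -d)"
LOG_DIR="$WORK/logs"
FILTERED_DIR="$WORK/filtered"
mkdir -p "$LOG_DIR" "$FILTERED_DIR"

JOB_ID=123
LOG_FILE="$LOG_DIR/job-${JOB_ID}.log"
cat > "$LOG_FILE" <<'EOF'
Run make test
ok   pkg/a 0.12s
--- FAIL: TestParse (0.00s)
    parse_test.go:42: unexpected token
make: *** [test] Error 2
Process completed with exit status 2
EOF

# Generic, job-agnostic indicators; -i makes the match case-insensitive.
PATTERN='(error[: ]|FAIL|panic:|fatal[: ]|undefined[: ]|exception|exit status [1-9])'
HINTS_FILE="$FILTERED_DIR/job-${JOB_ID}-hints.txt"

# grep exits 1 when nothing matches; "|| true" keeps the step from failing.
grep -n -iE "$PATTERN" "$LOG_FILE" > "$HINTS_FILE" || true

cat "$HINTS_FILE"
```

Each hint line carries its line number (`-n`), which is what lets the agent jump to the relevant region of the full log later.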

Directory layout:

| Path | Contents |
| --- | --- |
| `/tmp/ci-doctor/logs/` | Job log files + `failed-jobs.json` |
| `/tmp/ci-doctor/artifacts/` | Unpacked artifact files (by artifact name) |
| `/tmp/ci-doctor/filtered/` | Hint files for both logs and artifacts |
| `/tmp/ci-doctor/summary.txt` | Starting point listing all locations + first hints |
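A rough sketch of how a summary with the first 3 hints per file could be assembled from the `filtered/` directory. The summary's heading text and the sample hint lines are assumptions; the PR does not show the real format.

```shell
# Sketch only: paths live under a temp dir and the hint contents are invented.
WORK="$(mktemp -d)"
FILTERED_DIR="$WORK/filtered"
SUMMARY="$WORK/summary.txt"
mkdir -p "$FILTERED_DIR"
printf '%s\n' '3:--- FAIL: TestParse' '5:make: Error 2' \
  '6:exit status 2' '9:fatal: late match' > "$FILTERED_DIR/job-123-hints.txt"

{
  echo "Hint files (first 3 matches each):"
  for HINTS in "$FILTERED_DIR"/*-hints.txt; do
    # Skip empty hint files (and the literal glob when nothing matched).
    [ -s "$HINTS" ] || continue
    echo "== $HINTS"
    head -n 3 "$HINTS" | sed 's/^/  /'
  done
} > "$SUMMARY"

cat "$SUMMARY"
```

Capping each file at three matches keeps the summary small enough to read in a single turn while still pointing at the full hint file.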

Agent instructions

- New Pre-Analysis Data section directs the agent to summary.txt as the starting point
- Phase 2 revised: read hint files first, inspect ±10 lines around hinted line numbers, check artifacts for structured output, only load the full log if hints are insufficient
- Fallback get_job_logs reduced from tail_lines=200 → tail_lines=100
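The "±10 lines around hinted line numbers" step could look roughly like the following. `show_context` is a hypothetical helper name, and the log and hint files are synthetic.

```shell
# show_context FILE LINE [RADIUS]: print LINE plus RADIUS lines on either side.
# Hypothetical helper; RADIUS defaults to 10 to mirror the "±10 lines" guidance.
show_context() {
  file=$1
  line=$2
  radius=${3:-10}
  start=$((line - radius))
  if [ "$start" -lt 1 ]; then start=1; fi
  end=$((line + radius))
  sed -n "${start},${end}p" "$file"
}

# Synthetic 50-line log with a single error on line 25.
WORK="$(mktemp -d)"
LOG_FILE="$WORK/job-123.log"
awk 'BEGIN { for (i = 1; i <= 50; i++) print (i == 25 ? "error: boom" : "line " i) }' > "$LOG_FILE"
HINTS_FILE="$WORK/job-123-hints.txt"
grep -n -iE 'error[: ]' "$LOG_FILE" > "$HINTS_FILE" || true

# Take the line number from the first hint and print its neighbourhood.
HINT_LINE="$(head -n 1 "$HINTS_FILE" | cut -d: -f1)"
show_context "$LOG_FILE" "$HINT_LINE"
```

Only 21 lines reach the conversation context here, rather than the whole log.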

The heuristics are intentionally generic (no job-name-specific patterns) so they work across any CI job type and artifact format.

Security: context expressions (${{ github.event.workflow_run.id }}) are passed via env: vars rather than interpolated directly in the shell script to prevent template injection.
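On the script side, this pattern means the run ID arrives as an ordinary environment variable and is treated as data. A minimal sketch follows. The variable name `RUN_ID` and the digits-only check are assumptions added for illustration and are not taken from the PR.

```shell
# is_numeric_id VALUE: succeed only if VALUE is a non-empty string of digits.
# Extra precaution assumed for this sketch; the PR itself only mentions env: passing.
is_numeric_id() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# In the workflow, RUN_ID would be set by the step's env: block from
# github.event.workflow_run.id. The fallback below only lets the sketch run standalone.
RUN_ID="${RUN_ID:-22277901060}"

if is_numeric_id "$RUN_ID"; then
  echo "would fetch jobs for run $RUN_ID"
else
  echo "refusing suspicious run id" >&2
fi
```

Because the value is expanded by the shell as a quoted variable and never pasted into the script text by the template engine, characters such as `;` or `$(...)` in it cannot become commands.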

Original prompt

This section details the original issue you should resolve

<issue_title>[q] fix(ci-doctor): add error pre-location heuristics to reduce token usage</issue_title>
<issue_description>## Q Workflow Optimization Report

Issues Found (from live data)

CI Failure Doctor was consuming up to ~1.5M tokens per run (10 runs analyzed, 8.7M total tokens over 7 days, about 870K per run on average).

Root cause analysis of runs 22260230861 (1.55M tokens) and 22260291762 (1.47M tokens):

The agent called get_job_logs which returned oversized MCP payloads (flagged as "too large for MCP response"), causing the Copilot agent to iterate over them with multiple Python parsing scripts before finding the actual error
tail_lines=200 was retrieving large verbose CI log sections when the error was typically in 2–3 lines
No lightweight pre-triage step: the agent had no way to classify the failure type before loading logs, so it loaded everything and searched iteratively
Typical turn pattern: get_workflow_run → list_workflow_jobs → get_job_logs (large payload) → 5–8 Python grep scripts → error found

Changes Made

.github/workflows/ci-doctor.md

- Added Phase 1.5: Error Pre-Location Heuristics, a new phase inserted between Phase 1 (triage) and Phase 2 (log analysis) that:
  - Extracts the exact step name(s) that failed from list_workflow_jobs (zero extra API calls)
  - Classifies failure type from job name heuristics (e.g., lint-go → grep for unused, SA; test-unit → grep for --- FAIL; build → grep for undefined:)
  - Builds a targeted grep pattern before any log download
  - Sets a decision gate: tail_lines=50 when failure type is clear, tail_lines=100 when ambiguous
- Reduced tail_lines from 200 → 100 in Phase 2 for initial log retrieval
- Added instruction to grep-extract error context (10 lines before/after the match) instead of loading the full log payload into conversation context
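The job-name classification proposed in this issue could be sketched as below. Note that the merged PR moved to generic heuristics instead, so this only illustrates the issue's idea. The function names, the fallback pattern, and reading "SA" as staticcheck-style `SA<number>` check IDs are all assumptions.

```shell
# job_pattern JOB_NAME: print a grep -E pattern for the likely failure type.
job_pattern() {
  case $1 in
    lint-go*)   printf '%s\n' 'unused|SA[0-9]+' ;;  # "SA" assumed to mean SA<number> IDs
    test-unit*) printf '%s\n' '--- FAIL' ;;
    build*)     printf '%s\n' 'undefined:' ;;
    *)          printf '%s\n' 'error|FAIL' ;;       # hypothetical fallback
  esac
}

# job_tail_lines JOB_NAME: 50 when the failure type is clear, 100 when ambiguous.
job_tail_lines() {
  case $1 in
    lint-go*|test-unit*|build*) echo 50 ;;
    *) echo 100 ;;
  esac
}

# -e is required because the test-unit pattern starts with a dash.
printf '%s\n' 'ok pkg/a' '--- FAIL: TestX (0.00s)' \
  | grep -nE -e "$(job_pattern test-unit)"
echo "tail_lines for lint-go: $(job_tail_lines lint-go)"
echo "tail_lines for deploy:  $(job_tail_lines deploy)"
```

A mapping like this has to be maintained per job name, which is the trade-off the generic heuristics in the merged version avoid.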

Expected Improvements

Token reduction estimate: ~40–60% fewer tokens per run by avoiding iterative Python parsing of large payload files
Targeted grep patterns mean the error is found in 1–2 shell commands instead of 5–8 Python scripts
tail_lines=50 for clear-cut failures (lint, build) reduces MCP payload size significantly

Validation

✅ ci-doctor.md compiled successfully with gh aw compile

Note: .lock.yml will be regenerated automatically after merge.

References

Run IDs investigated: 22260230861 (1.55M tokens), 22260291762 (1.47M tokens), 22260087423 (876K tokens)
10-run aggregate: 8.7M total tokens across successful CI Doctor runs
Agent log showing iterative Python parsing: /tmp/gh-aw/aw-mcp/logs/run-22260230861/agent-stdio.log

🎩 Equipped by Q

expires on Feb 24, 2026, 1:23 PM UTC

[!NOTE]
This was originally intended as a pull request, but the git push operation failed.

Workflow Run: View run details and download patch artifact

The patch file is available in the agent-artifacts artifact in the workflow run linked above.

To apply the patch locally:

# Download the artifact from the workflow run https://github.com/github/gh-aw/actions/runs/22277901060
# (Use GitHub MCP tools if gh CLI is not available)
gh run download 22277901060 -n agent-artifacts -D /tmp/agent-artifacts-22277901060

# The patch file will be at agent-artifacts/tmp/gh-aw/aw-q-ci-doctor-token-optimization.patch after download
# Apply the patch
git am /tmp/agent-artifacts-22277901060/aw-q-ci-doctor-token-optimization.patch

Show patch preview (60 of 60 lines)

From 251a0692dbf88c9f343939a6660adf73781fd77f Mon Sep 17 00:00:00 2001
From: Copilot <copilot@github.com>
Date: Sun, 22 Feb 2026 13:22:03 +0000
Subject: [PATCH] fix(ci-doctor): add pre-location heuristics to reduce token
 usage

- Add Phase 1.5: Error Pre-Location Heuristics that classifies failure
  type from job/step names before downloading any logs
- Build targeted grep patterns per failure type (lint-go, test, build,
  validate-yaml) to extract only relevant error lines
- Reduce tail_lines from 200 to 100 for initial log retrieval
- Add decision gate: 50 lines for clear failures, 100 for ambiguous
- Instruct agent to grep-extract error context instead of loading full
  log into conversation context

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .github/workflows/ci-doctor.md | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/ci-doctor.md b/.github/workflows/ci-doctor.md
inde...

</details>

<!-- START COPILOT CODING AGENT SUFFIX -->

- Fixes github/gh-aw#17711


pelikhan

@copilot update step to download and unpack all files and artifacts to files, then produce filtered files as well.

Copilot · 2026-02-22T13:52:45Z

> @copilot update step to download and unpack all files and artifacts to files, then produce filtered files as well.

Done in 046df74. The step now:

- Downloads and unpacks all workflow run artifacts to /tmp/ci-doctor/artifacts/ via gh run download (auto-extracts by artifact name)
- Applies the same generic grep heuristics to artifact text files (.txt, .log, .json, .xml, .out, .err) → filtered hint files in /tmp/ci-doctor/filtered/artifact-<name>-hints.txt
- The summary at /tmp/ci-doctor/summary.txt enumerates all log files, all artifact files, and all hint files with their first few matches

Directory layout:

- /tmp/ci-doctor/logs/: job log files + failed-jobs.json
- /tmp/ci-doctor/artifacts/: unpacked artifact files
- /tmp/ci-doctor/filtered/: hint files for both logs and artifacts
- /tmp/ci-doctor/summary.txt: starting point for the agent
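The artifact pass can be approximated offline as follows. This is a sketch under assumptions: the artifact tree is synthetic, and flattening the relative path with `tr` to build the `artifact-<name>-hints.txt` file name is a guess at the naming scheme, which the PR does not spell out.

```shell
# Sketch only: a fake artifact tree under a temp dir stands in for gh run download output.
WORK="$(mktemp -d)"
ARTIFACT_DIR="$WORK/artifacts"
FILTERED_DIR="$WORK/filtered"
mkdir -p "$ARTIFACT_DIR/test-results" "$FILTERED_DIR"
printf '%s\n' '<testsuite>' 'java.lang.IllegalStateException: boom' \
  > "$ARTIFACT_DIR/test-results/junit.xml"
printf 'error: in a file type that is not scanned\n' \
  > "$ARTIFACT_DIR/test-results/core.bin"

PATTERN='(error[: ]|FAIL|panic:|fatal[: ]|undefined[: ]|exception|exit status [1-9])'

# Only text-like extensions are scanned; everything else is skipped.
find "$ARTIFACT_DIR" -type f \( \
    -name "*.txt" -o -name "*.log" -o -name "*.json" \
    -o -name "*.xml" -o -name "*.out" -o -name "*.err" \
  \) | while read -r ARTIFACT_FILE; do
    REL_PATH="${ARTIFACT_FILE#"$ARTIFACT_DIR"/}"
    # Hypothetical naming: flatten the relative path into a single file name.
    SAFE_NAME="$(printf '%s' "$REL_PATH" | tr '/' '_')"
    grep -n -iE "$PATTERN" "$ARTIFACT_FILE" \
      > "$FILTERED_DIR/artifact-${SAFE_NAME}-hints.txt" || true
  done

ls "$FILTERED_DIR"
```

Restricting the scan by extension avoids running grep over binary artifacts, at the cost of missing text files with unusual suffixes.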

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

https://api.github.com/graphql
- Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw /home/REDACTED/work/gh-aw/gh-aw/.github/workflows/ci-doctor.md /home/REDACTED/work/gh-aw/gh-aw/.github/workflows/claude-code-user-docs-review.md /home/REDACTED/work/gh-aw/gh-aw/.github/workflows/cli-consistency-checker.md (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Copilot

Pull request overview

This PR optimizes the CI Failure Doctor workflow to reduce token consumption from ~1.5M tokens per run by pre-downloading logs and artifacts before the agent session starts, then applying generic error-detection heuristics to create focused hint files that the agent can examine first.

Changes:

Added a pre-download step that fetches failed job logs and artifacts, applies grep heuristics to identify error lines, and creates a summary file for the agent to start with
Updated agent instructions to prioritize pre-downloaded hint files over full log retrieval
Reduced fallback tail_lines parameter from 200 to 100 for MCP log retrieval

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`.github/workflows/ci-doctor.md`	Added pre-download step with generic error heuristics, new Pre-Analysis Data section, and updated investigation protocol to use pre-downloaded files first
`.github/workflows/ci-doctor.lock.yml`	Auto-generated lock file with updated metadata hash and compiled pre-download step

Comments suppressed due to low confidence (1)

.github/workflows/ci-doctor.md:118

The grep pattern exit status [^0] will match "exit status 1" but also "exit status " followed by any non-zero character (including letters, symbols, etc.), not just non-zero exit codes. The pattern should be exit status [1-9] to match only non-zero numeric exit codes, or exit (code|status) [1-9] to cover both common patterns.

        grep -n -iE "(error[: ]|ERROR|FAIL|panic:|fatal[: ]|undefined[: ]|exception|exit status [^0])" \
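The difference between the two bracket expressions is easy to check on a few sample lines (the sample strings are invented):

```shell
# Four synthetic log lines: one success, two numeric failures, one non-numeric suffix.
sample="$(printf '%s\n' 'exit status 0' 'exit status 1' \
  'exit status unknown' 'exit status 10')"

# [^0] accepts any character other than 0, so "unknown" is counted too.
loose="$(printf '%s\n' "$sample" | grep -cE 'exit status [^0]')"
# [1-9] accepts only a leading non-zero digit.
strict="$(printf '%s\n' "$sample" | grep -cE 'exit status [1-9]')"

echo "loose=$loose strict=$strict"
```

Both forms still match `exit status 10`, since only the first character after the space is tested.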

Copilot · 2026-02-22T13:58:30Z

.github/workflows/ci-doctor.md

+        find "$ARTIFACT_DIR" -type f | while read -r f; do
+          echo "  $f"
+        done


The find command on line 142 in the summary section could fail or produce unexpected output if $ARTIFACT_DIR is empty or doesn't exist. While the directory is created earlier, if artifact download fails, the find might still print an error to stderr. Consider adding a check: [ -d "$ARTIFACT_DIR" ] && find "$ARTIFACT_DIR" -type f || echo " (no artifacts downloaded)" to handle this edge case gracefully.

Suggested change

-        find "$ARTIFACT_DIR" -type f | while read -r f; do
-          echo "  $f"
-        done
+        if [ -d "$ARTIFACT_DIR" ]; then
+          find "$ARTIFACT_DIR" -type f | while read -r f; do
+            echo "  $f"
+          done
+        else
+          echo "  (no artifacts downloaded)"
+        fi
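The guarded listing can be exercised on its own. `list_artifacts` is a hypothetical wrapper around the suggested snippet, and the directories are throwaway test paths.

```shell
# list_artifacts DIR: list files under DIR, or print a placeholder if DIR is missing.
list_artifacts() {
  dir=$1
  if [ -d "$dir" ]; then
    find "$dir" -type f | while read -r f; do
      echo "  $f"
    done
  else
    echo "  (no artifacts downloaded)"
  fi
}

WORK="$(mktemp -d)"
mkdir -p "$WORK/artifacts"
: > "$WORK/artifacts/report.txt"

list_artifacts "$WORK/artifacts"       # existing directory with one file
list_artifacts "$WORK/does-not-exist"  # missing directory
```

An existing but empty directory still prints nothing with this guard, so the placeholder only covers the case where the directory was never created.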

Copilot · 2026-02-22T13:58:31Z

.github/workflows/ci-doctor.md

+
+        # Apply generic heuristics: find lines with common error indicators
+        HINTS_FILE="$FILTERED_DIR/job-${JOB_ID}-hints.txt"
+        grep -n -iE "(error[: ]|ERROR|FAIL|panic:|fatal[: ]|undefined[: ]|exception|exit status [^0])" \


Copilot · 2026-02-22T13:58:31Z

.github/workflows/ci-doctor.md

+        -name "*.txt" -o -name "*.log" -o -name "*.json" \
+        -o -name "*.xml" -o -name "*.out" -o -name "*.err" \
+      \) | while read -r ARTIFACT_FILE; do
+        REL_PATH="${ARTIFACT_FILE#$ARTIFACT_DIR/}"


The variable substitution ${ARTIFACT_FILE#$ARTIFACT_DIR/} should quote the variable $ARTIFACT_DIR so that its value is matched literally rather than as a glob pattern (for example, if the path contains *, ? or [ characters). Change to ${ARTIFACT_FILE#"$ARTIFACT_DIR"/} for safer string manipulation.

Suggested change

-        REL_PATH="${ARTIFACT_FILE#$ARTIFACT_DIR/}"
+        REL_PATH="${ARTIFACT_FILE#"$ARTIFACT_DIR"/}"
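One concrete case where the inner quotes matter: inside `${var#pattern}`, an unquoted expansion is treated as a glob pattern, so a directory name containing bracket characters no longer matches itself. The path below is invented to show the effect.

```shell
# Hypothetical artifact directory whose name contains glob metacharacters.
ARTIFACT_DIR='/tmp/ci-doctor/artifacts/build[1]'
ARTIFACT_FILE="$ARTIFACT_DIR/out.log"

# Unquoted: "[1]" is a bracket expression matching the single character "1",
# so the prefix does not match and nothing is stripped.
UNQUOTED="${ARTIFACT_FILE#$ARTIFACT_DIR/}"

# Quoted: the directory is matched literally and the prefix is removed.
QUOTED="${ARTIFACT_FILE#"$ARTIFACT_DIR"/}"

echo "unquoted: $UNQUOTED"
echo "quoted:   $QUOTED"
```

With artifact names chosen by arbitrary workflows, the quoted form is the safer default.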

Initial plan

ac05a2b

Copilot AI assigned Copilot and pelikhan Feb 22, 2026

Copilot started work on behalf of pelikhan February 22, 2026 13:32 View session

fix(ci-doctor): add generic log pre-download and error heuristics to …

8383a7d

…reduce token usage Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix CI Doctor to improve token usage with error pre-location heuristics~~ fix(ci-doctor): pre-download logs and apply generic error heuristics to reduce token usage Feb 22, 2026

Copilot finished work on behalf of pelikhan February 22, 2026 13:46

pelikhan reviewed Feb 22, 2026

Copilot started work on behalf of pelikhan February 22, 2026 13:48 View session

fix(ci-doctor): also download and unpack artifacts, apply heuristics …

046df74

…to artifact files Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~fix(ci-doctor): pre-download logs and apply generic error heuristics to reduce token usage~~ fix(ci-doctor): pre-download logs and artifacts, apply generic error heuristics to reduce token usage Feb 22, 2026

Copilot finished work on behalf of pelikhan February 22, 2026 13:53

pelikhan approved these changes Feb 22, 2026

pelikhan marked this pull request as ready for review February 22, 2026 13:56

Copilot AI review requested due to automatic review settings February 22, 2026 13:56

pelikhan merged commit 6221cdc into main Feb 22, 2026
1 check passed

pelikhan deleted the copilot/optimize-ci-doctor-error-handling branch February 22, 2026 13:56

Copilot started reviewing on behalf of pelikhan February 22, 2026 13:56 View session

Copilot AI reviewed Feb 22, 2026

pelikhan merged 3 commits into main from copilot/optimize-ci-doctor-error-handling

-        grep -n -iE "(error[: ]|ERROR|FAIL|panic:|fatal[: ]|undefined[: ]|exception|exit status [^0])" \
+        grep -n -iE "(error[: ]|ERROR|FAIL|panic:|fatal[: ]|undefined[: ]|exception|exit status [1-9])" \
