174 changes: 174 additions & 0 deletions .claude/commands/summarize_ci.md
@@ -0,0 +1,174 @@
Analyze CI failures for PR {{arg}} in galaxyproject/galaxy repository.

Steps:
1. **Backup existing review directory if it exists**:
- Check if `database/pr_reviews/{{arg}}/` exists
- If yes, move to `database/pr_reviews/{{arg}}_backup_$(date +%Y%m%d_%H%M%S)`
- Notify user: "Backed up previous review to {{arg}}_backup_YYYYMMDD_HHMMSS"
2. **Create fresh review directory**: `mkdir -p database/pr_reviews/{{arg}}`
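   A minimal Python sketch of this backup-then-create logic (the helper name is an assumption for illustration; the command itself uses shell):

```python
import datetime
import shutil
from pathlib import Path


def prepare_review_dir(pr_number: str, base: str = "database/pr_reviews") -> Path:
    """Back up any existing review directory, then create a fresh one."""
    review_dir = Path(base) / str(pr_number)
    if review_dir.exists():
        # Timestamped backup, e.g. 21218_backup_20251031_143022
        stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        backup = review_dir.with_name(f"{pr_number}_backup_{stamp}")
        shutil.move(str(review_dir), str(backup))
        print(f"Backed up previous review to {backup.name}")
    review_dir.mkdir(parents=True, exist_ok=True)
    return review_dir
```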
3. Get PR head commit SHA: `gh pr view {{arg}} --repo galaxyproject/galaxy --json headRefOid --jq .headRefOid`
4. Find failed workflow runs: `gh api repos/galaxyproject/galaxy/commits/<SHA>/check-runs --jq '.check_runs[] | select(.conclusion == "failure") | .html_url' | grep -oE 'runs/[0-9]+' | cut -d'/' -f2 | sort -u`
- **If no failed runs found:** Check if tests are still in progress
- If in progress: Report "Tests still running - wait for completion"
- If all passed: Report "No failures - all tests passed!" and exit
5. For each failed run, categorize by artifact availability:
- List artifacts: `gh api repos/galaxyproject/galaxy/actions/runs/<RUN_ID>/artifacts --jq '.artifacts[] | {name: .name, id: .id, size_in_bytes: .size_in_bytes}'`
- **If run has test artifacts (HTML/JSON):** Mark for download (test failures)
- **If run has no artifacts:** Mark for log extraction (likely linting, build, or startup failures)
6. **Download all test artifacts to review directory**:
- Prefer JSON artifacts (e.g., "Playwright test results JSON", "Integration test results JSON")
- Download to `database/pr_reviews/{{arg}}/`
- Command: `gh run download <RUN_ID> --dir database/pr_reviews/{{arg}}/ --repo galaxyproject/galaxy`
- This preserves artifact directory structure (e.g., "Playwright test results JSON/run_playwright_tests.json")
- Multiple test types/shards will have different artifact names, avoiding collisions
- **CRITICAL: Check exit code after each download**
- **If download fails:**
- Show error message with run ID
- Ask user: "Download failed for run <RUN_ID>. This may be due to network timeout or expired artifacts. Retry? (y/n)"
- If yes, retry with longer timeout (300s)
- If no or second failure, STOP and report incomplete analysis
- DO NOT proceed with partial data
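   The download-with-exit-code-check could be wrapped roughly like this (a sketch; the function name and timeout values are assumptions, though the `gh run download` flags match the command above):

```python
import subprocess


def download_artifacts(run_id: str, dest: str, timeout: int = 120) -> bool:
    """Run `gh run download` and report success via the exit code.

    Returns False on non-zero exit or timeout so the caller can ask the
    user whether to retry (e.g. with timeout=300) before giving up.
    """
    cmd = [
        "gh", "run", "download", str(run_id),
        "--dir", dest,
        "--repo", "galaxyproject/galaxy",
    ]
    try:
        result = subprocess.run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```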
7. **Extract logs from runs without artifacts:**
- For each run marked for log extraction:
- Get failed job IDs: `gh api repos/galaxyproject/galaxy/actions/runs/<RUN_ID>/jobs --jq '.jobs[] | select(.conclusion == "failure") | {id: .id, name: .name}'`
- For each failed job, extract relevant error info:
- Get job logs: `gh api repos/galaxyproject/galaxy/actions/jobs/<JOB_ID>/logs`
- Parse for common failure patterns:
- Python linting: Look for "isort", "flake8", "black", "ruff" errors
- TypeScript: Look for "tsc", "eslint", "prettier" errors
- Build failures: Look for "error:", "failed", compilation errors
- Extract last 20-50 lines of relevant errors
- Save to `database/pr_reviews/{{arg}}/<RUN_ID>_<JOB_NAME>.log`
- Include job name and extracted errors in summary
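   The pattern matching in this step might look like the following (labels and regexes are illustrative assumptions, not a fixed spec):

```python
import re

# Illustrative marker patterns for common failure classes, checked in order
PATTERNS = [
    ("python linting", re.compile(r"\b(isort|flake8|black|ruff)\b")),
    ("typescript/client", re.compile(r"\b(tsc|eslint|prettier)\b")),
    ("build", re.compile(r"(?i)error:|\bfailed\b")),
]


def classify_and_tail(log_text: str, tail: int = 50) -> tuple[str, list[str]]:
    """Label a job log by the first matching pattern and keep its last lines."""
    label = "unknown"
    for name, pattern in PATTERNS:
        if pattern.search(log_text):
            label = name
            break
    return label, log_text.splitlines()[-tail:]
```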

8. **Validate downloads succeeded:**
- Check if `database/pr_reviews/{{arg}}/` has artifact directories OR log files
- If completely empty: STOP and report "No artifacts or logs extracted - analysis failed"
- Count expected vs actual artifact directories
- If mismatch: WARN user about missing artifacts

9. Parse test results from all downloaded artifacts:
- Find all JSON files: `find database/pr_reviews/{{arg}}/ -name "*.json" -type f`
- For each JSON file:
```python
import json

with open(json_file) as fh:
    data = json.load(fh)
failures = [
    {'test': test_id, 'duration': run['duration'], 'log': run.get('log', ''),
     'artifact': artifact_name, 'result': run['result']}
    for test_id, runs in data['tests'].items()
    for run in runs if run['result'] in ('Failed', 'Error')
]
```
- Fall back to HTML if no JSON found:
- Find HTML files in artifact directories
- Extract embedded JSON from `data-jsonblob="..."`
- Parse and extract failures (both 'Failed' and 'Error' results)
- **If no JSON or HTML found:** STOP and report "No test result files found in artifacts"
- **Note:** pytest distinguishes 'Failed' (assertion failed) from 'Error' (exception during setup/execution) - both are test failures
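   When only an HTML report is present, the embedded blob can be recovered roughly like this (assumes the pytest-html style `data-jsonblob` attribute holding HTML-escaped JSON; field selection is trimmed for illustration):

```python
import html
import json
import re


def failures_from_html(report_html: str) -> list[dict]:
    """Extract failed/errored tests from the JSON embedded in a pytest-html report."""
    match = re.search(r'data-jsonblob="([^"]*)"', report_html)
    if not match:
        return []
    # The attribute value is HTML-escaped JSON; unescape before parsing
    data = json.loads(html.unescape(match.group(1)))
    return [
        {"test": test_id, "result": run["result"]}
        for test_id, runs in data.get("tests", {}).items()
        for run in runs
        if run["result"] in ("Failed", "Error")
    ]
```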

10. **Categorize failures** by checking error messages:
- **Transient**: Look for `TRANSIENT FAILURE [Issue #` in error log/message
- Extract issue number from pattern
- **New**: All other failures

11. Generate markdown summary with:
- Run IDs
- **For runs with artifacts:**
- Artifact names and sizes (indicate JSON vs HTML)
- **Known transient failures** (✅):
- Test name
- Artifact/test type
- Issue number (with link)
- Duration
- **New test failures requiring investigation** (❌):
- Test name
- Artifact/test type
- Result type (Failed vs Error)
- Duration
- Error preview
- **For runs without artifacts (linting/build):**
- Job name (e.g., "Python linting", "client / build-client")
- Failure type (isort, eslint, build error, etc.)
- Error count or preview of first few errors
- Indicate these are NOT test failures
- Total counts (separate test failures from linting/build failures)

12. **Write summary to file** `database/pr_reviews/{{arg}}/summary`:
- Write the complete markdown summary
- This file is read by `/summarize_ci_post` to post the summary to the PR
- Format: Same markdown as displayed to user

**Example output:**
```
Analyzing PR #21218...
Backed up previous review to 21218_backup_20251031_143022
Found 3 failed workflow run(s)

Run 18975780470 (test artifacts):
- Playwright test results JSON (0.1 MB) ⚡
- Playwright test results JSON (shard 2) (0.1 MB) ⚡

Run 18975780416 (test artifacts):
- Integration test results JSON (0.5 MB) ⚡

Run 18975780500 (no artifacts - extracted logs):
- Python linting

================================================================================
FAILURE SUMMARY
================================================================================

🔧 **Linting/Build failures (1):**
• Python linting
Type: isort import ordering
Files affected: 3
Example: lib/galaxy/managers/users.py - imports not sorted

✅ **Known transient test failures (2):**
• test_history_sharing.py::test_sharing_private_history
From: Playwright test results JSON
Issue: https://github.com/galaxyproject/galaxy/issues/12345
Duration: 00:01:30
• test_tool_discovery.py::test_tool_discovery_landing
From: Integration test results JSON
Issue: https://github.com/galaxyproject/galaxy/issues/67890
Duration: 00:00:54

❌ **New test failures requiring investigation (1):**
• test_workflow.py::test_save_workflow
From: Playwright test results JSON (shard 2)
Type: Failed
Duration: 00:01:15
Error: AssertionError: Expected element to be visible

**Total:** 1 linting/build failure, 2 transient tests, 1 new test failure

Summary and artifacts saved to database/pr_reviews/21218/
```

13. **Display and finalize:**
- Print the summary to the user (same content as the file written in step 12)
- Create/update symlink: `ln -sfn {{arg}} database/pr_reviews/latest`
- Notify user: "Summary and artifacts saved to database/pr_reviews/{{arg}}/"

Output a concise summary showing the categorized failures. Transient failures indicate "safe to re-run"; new failures indicate "requires investigation".

**Notes:**
- The summary and downloaded artifacts are saved to `database/pr_reviews/{{arg}}/` for use by `/summarize_ci_post`
- Linting/build failures are extracted from job logs since these jobs don't produce test artifacts
- Common patterns: isort, black, flake8, ruff, eslint, prettier, tsc, build errors
- Log extraction focuses on last 20-50 lines and specific error markers to keep output concise

**Marking tests as transient failures:**
To mark a test as a known transient failure, manually add the `@transient_failure(issue=N)` decorator:

```python
from galaxy.util.unittest_utils import transient_failure

@transient_failure(issue=12345) # GitHub issue number tracking this failure
def test_flaky_feature(self):
# Test that sometimes fails
...
```

Once decorated, future failures will be automatically categorized as transient.
123 changes: 123 additions & 0 deletions .claude/commands/summarize_ci_post.md
@@ -0,0 +1,123 @@
Post CI failure summary to PR as a comment.

**Usage:** `/summarize_ci_post [PR#] [additional message]`

**Arguments:**
- `PR#` (optional): Pull request number to comment on. If omitted, uses latest analyzed PR (via `database/pr_reviews/latest` symlink)
- `additional message` (optional): Extra text to add before the summary

**Steps:**

1. **Determine PR number:**
- If `{{arg}}` provided, use that as PR#
- Otherwise, check `database/pr_reviews/latest` symlink:
- If exists: `readlink database/pr_reviews/latest` to get PR#
- If not exists: show error "No PR# provided and no latest review found. Run /summarize_ci first."

2. **Check for summary file:**
- Look for `database/pr_reviews/<PR#>/summary`
- If not found, show error:
```
Error: database/pr_reviews/<PR#>/summary not found
Run /summarize_ci <PR#> first to generate summary
```

3. **Read summary:**
- Read contents of `database/pr_reviews/<PR#>/summary`
- This contains the markdown-formatted failure summary

4. **Build comment body:**
- If additional message provided, add it at the top
- Add summary from file
- Add footer with instructions

5. **Post comment to PR:**
```bash
gh pr comment <PR#> --repo galaxyproject/galaxy --body "$(cat <<'EOF'
[additional message if provided]

## CI Failure Summary

[contents of database/pr_reviews/<PR#>/summary]

---
*Summary generated with [Claude Code](https://claude.com/claude-code)*
EOF
)"
```

6. **Output:**
```
✅ Posted summary to PR #21218
https://github.com/galaxyproject/galaxy/pull/21218#issuecomment-xxxxx
```

**Examples:**

**Simple post with explicit PR#:**
```bash
/summarize_ci_post 21218
```

**Post using latest analyzed PR:**
```bash
/summarize_ci_post
```

**With additional message:**
```bash
/summarize_ci_post 21218 "These failures look like known transient issues. Re-running checks."
```

**Latest PR with message:**
```bash
/summarize_ci_post "Only transient failures - safe to re-run"
```

**Output:**
```
Reading summary from database/pr_reviews/21218/summary...
✅ Posted summary to PR #21218
https://github.com/galaxyproject/galaxy/pull/21218#issuecomment-1234567890
```

**Example comment posted:**
```markdown
These failures look like known transient issues. Re-running checks.

## CI Failure Summary

Found 2 failed workflow run(s)

Run 18975780470:
- Playwright test results JSON (0.1 MB) ⚡

✅ **Known transient failures (2):**
• test_history_sharing.py::test_sharing_private_history - Issue #12345
From: Playwright test results JSON
Duration: 00:01:30
• test_tool_discovery.py::test_tool_discovery_landing - Issue #67890
From: Integration test results JSON
Duration: 00:00:54

❌ **New failures requiring investigation (0)**

Total: 2 transient, 0 new

---
*Summary generated with [Claude Code](https://claude.com/claude-code)*
```

**Common workflow:**
```bash
# Analyze PR
/summarize_ci 21218

# Review output, then post to PR
/summarize_ci_post 21218 "Only transient failures - safe to merge after re-run"
```

**Error Handling:**
- If `database/pr_reviews/<PR#>/summary` doesn't exist, prompt to run `/summarize_ci` first
- If PR# is invalid, show error
- If gh command fails (permissions, network), show error message
7 changes: 6 additions & 1 deletion .github/workflows/api.yaml
@@ -66,7 +66,7 @@ jobs:
           path: 'galaxy root/.venv'
           key: gxy-venv-${{ runner.os }}-${{ steps.full-python-version.outputs.version }}-${{ hashFiles('galaxy root/requirements.txt') }}-api
       - name: Run tests
-        run: ./run_tests.sh --coverage --skip_flakey_fails -api lib/galaxy_test/api -- --num-shards=2 --shard-id=${{ matrix.chunk }}
+        run: ./run_tests.sh --coverage --skip_flakey_fails -api lib/galaxy_test/api -- --num-shards=2 --shard-id=${{ matrix.chunk }} --json-report --json-report-file=run_api_tests.json
         working-directory: 'galaxy root'
       - uses: codecov/codecov-action@v5
         with:
@@ -77,3 +77,8 @@ jobs:
         with:
           name: API test results (${{ matrix.python-version }}, ${{ matrix.chunk }})
           path: 'galaxy root/run_api_tests.html'
+      - uses: actions/upload-artifact@v5
+        if: failure()
+        with:
+          name: API test results JSON (${{ matrix.python-version }}, ${{ matrix.chunk }})
+          path: 'galaxy root/run_api_tests.json'
7 changes: 6 additions & 1 deletion .github/workflows/cwl_conformance.yaml
@@ -58,7 +58,7 @@ jobs:
           path: 'galaxy root/.venv'
           key: gxy-venv-${{ runner.os }}-${{ steps.full-python-version.outputs.version }}-${{ hashFiles('galaxy root/requirements.txt') }}-api
       - name: Run tests
-        run: ./run_tests.sh --coverage --skip_flakey_fails -cwl lib/galaxy_test/api/cwl -- -m "${{ matrix.marker }} and ${{ matrix.conformance-version }}"
+        run: ./run_tests.sh --coverage --skip_flakey_fails -cwl lib/galaxy_test/api/cwl -- -m "${{ matrix.marker }} and ${{ matrix.conformance-version }}" --json-report --json-report-file=run_cwl_tests.json
         working-directory: 'galaxy root'
       - uses: codecov/codecov-action@v5
         with:
@@ -69,3 +69,8 @@ jobs:
         with:
           name: CWL conformance test results (${{ matrix.python-version }}, ${{ matrix.marker }}, ${{ matrix.conformance-version }})
           path: 'galaxy root/run_cwl_tests.html'
+      - uses: actions/upload-artifact@v5
+        if: failure()
+        with:
+          name: CWL conformance test results JSON (${{ matrix.python-version }}, ${{ matrix.marker }}, ${{ matrix.conformance-version }})
+          path: 'galaxy root/run_cwl_tests.json'
7 changes: 6 additions & 1 deletion .github/workflows/framework_tools.yaml
@@ -62,7 +62,7 @@ jobs:
           path: 'galaxy root/.venv'
           key: gxy-venv-${{ runner.os }}-${{ steps.full-python-version.outputs.version }}-${{ hashFiles('galaxy root/requirements.txt') }}-framework
       - name: Run tests
-        run: GALAXY_TEST_USE_LEGACY_TOOL_API="${{ matrix.use-legacy-api }}" ./run_tests.sh --coverage --framework-tools
+        run: GALAXY_TEST_USE_LEGACY_TOOL_API="${{ matrix.use-legacy-api }}" ./run_tests.sh --coverage --framework-tools -- --json-report --json-report-file=run_framework_tests.json
         working-directory: 'galaxy root'
       - uses: codecov/codecov-action@v5
         with:
@@ -73,3 +73,8 @@ jobs:
         with:
           name: Tool framework test results (${{ matrix.python-version }})
           path: 'galaxy root/run_framework_tests.html'
+      - uses: actions/upload-artifact@v5
+        if: failure()
+        with:
+          name: Tool framework test results JSON (${{ matrix.python-version }})
+          path: 'galaxy root/run_framework_tests.json'
7 changes: 6 additions & 1 deletion .github/workflows/framework_workflows.yaml
@@ -62,7 +62,7 @@ jobs:
           path: 'galaxy root/.venv'
           key: gxy-venv-${{ runner.os }}-${{ steps.full-python-version.outputs.version }}-${{ hashFiles('galaxy root/requirements.txt') }}-framework
       - name: Run tests
-        run: ./run_tests.sh --coverage --framework-workflows
+        run: ./run_tests.sh --coverage --framework-workflows -- --json-report --json-report-file=run_framework_workflows_tests.json
         working-directory: 'galaxy root'
       - uses: codecov/codecov-action@v5
         with:
@@ -73,3 +73,8 @@ jobs:
         with:
           name: Workflow framework test results (${{ matrix.python-version }})
           path: 'galaxy root/run_framework_workflows_tests.html'
+      - uses: actions/upload-artifact@v5
+        if: failure()
+        with:
+          name: Workflow framework test results JSON (${{ matrix.python-version }})
+          path: 'galaxy root/run_framework_workflows_tests.json'
7 changes: 6 additions & 1 deletion .github/workflows/integration.yaml
@@ -74,7 +74,7 @@ jobs:
       - name: Run tests
         run: |
           . .ci/minikube-test-setup/start_services.sh
-          ./run_tests.sh --coverage -integration test/integration -- --num-shards=4 --shard-id=${{ matrix.chunk }}
+          ./run_tests.sh --coverage -integration test/integration -- --num-shards=4 --shard-id=${{ matrix.chunk }} --json-report --json-report-file=run_integration_tests.json
         working-directory: 'galaxy root'
       - uses: codecov/codecov-action@v5
         with:
@@ -85,3 +85,8 @@ jobs:
         with:
           name: Integration test results (${{ matrix.python-version }}, ${{ matrix.chunk }})
           path: 'galaxy root/run_integration_tests.html'
+      - uses: actions/upload-artifact@v5
+        if: failure()
+        with:
+          name: Integration test results JSON (${{ matrix.python-version }}, ${{ matrix.chunk }})
+          path: 'galaxy root/run_integration_tests.json'