174 changes: 174 additions & 0 deletions .claude/commands/summarize_ci.md
@@ -0,0 +1,174 @@
Analyze CI failures for PR {{arg}} in galaxyproject/galaxy repository.

Steps:
1. **Backup existing review directory if it exists**:
- Check if `database/pr_reviews/{{arg}}/` exists
- If yes, move to `database/pr_reviews/{{arg}}_backup_$(date +%Y%m%d_%H%M%S)`
- Notify user: "Backed up previous review to {{arg}}_backup_YYYYMMDD_HHMMSS"
2. **Create fresh review directory**: `mkdir -p database/pr_reviews/{{arg}}`
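   A minimal Python sketch of this backup-then-create logic (the helper name is an assumption for illustration; the command itself uses shell):

```python
import datetime
import shutil
from pathlib import Path


def prepare_review_dir(pr_number: str, base: str = "database/pr_reviews") -> Path:
    """Back up any existing review directory, then create a fresh one."""
    review_dir = Path(base) / str(pr_number)
    if review_dir.exists():
        # Timestamped backup, e.g. 21218_backup_20251031_143022
        stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        backup = review_dir.with_name(f"{pr_number}_backup_{stamp}")
        shutil.move(str(review_dir), str(backup))
        print(f"Backed up previous review to {backup.name}")
    review_dir.mkdir(parents=True, exist_ok=True)
    return review_dir
```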
3. Get PR head commit SHA: `gh pr view {{arg}} --repo galaxyproject/galaxy --json headRefOid --jq .headRefOid`
4. Find failed workflow runs: `gh api repos/galaxyproject/galaxy/commits/<SHA>/check-runs --jq '.check_runs[] | select(.conclusion == "failure") | .html_url' | grep -oE 'runs/[0-9]+' | cut -d'/' -f2 | sort -u`
- **If no failed runs found:** Check if tests are still in progress
- If in progress: Report "Tests still running - wait for completion"
- If all passed: Report "No failures - all tests passed!" and exit
5. For each failed run, categorize by artifact availability:
- List artifacts: `gh api repos/galaxyproject/galaxy/actions/runs/<RUN_ID>/artifacts --jq '.artifacts[] | {name: .name, id: .id, size_in_bytes: .size_in_bytes}'`
- **If run has test artifacts (HTML/JSON):** Mark for download (test failures)
- **If run has no artifacts:** Mark for log extraction (likely linting, build, or startup failures)
6. **Download all test artifacts to review directory**:
- Prefer JSON artifacts (e.g., "Playwright test results JSON", "Integration test results JSON")
- Download to `database/pr_reviews/{{arg}}/`
- Command: `gh run download <RUN_ID> --dir database/pr_reviews/{{arg}}/ --repo galaxyproject/galaxy`
- This preserves artifact directory structure (e.g., "Playwright test results JSON/run_playwright_tests.json")
- Multiple test types/shards will have different artifact names, avoiding collisions
- **CRITICAL: Check exit code after each download**
- **If download fails:**
- Show error message with run ID
- Ask user: "Download failed for run <RUN_ID>. This may be due to network timeout or expired artifacts. Retry? (y/n)"
- If yes, retry with longer timeout (300s)
- If no or second failure, STOP and report incomplete analysis
- DO NOT proceed with partial data
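   The download-with-exit-code-check could be wrapped roughly like this (a sketch; the function name and timeout values are assumptions, though the `gh run download` flags match the command above):

```python
import subprocess


def download_artifacts(run_id: str, dest: str, timeout: int = 120) -> bool:
    """Run `gh run download` and report success via the exit code.

    Returns False on non-zero exit or timeout so the caller can ask the
    user whether to retry (e.g. with timeout=300) before giving up.
    """
    cmd = [
        "gh", "run", "download", str(run_id),
        "--dir", dest,
        "--repo", "galaxyproject/galaxy",
    ]
    try:
        result = subprocess.run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```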
7. **Extract logs from runs without artifacts:**
- For each run marked for log extraction:
- Get failed job IDs: `gh api repos/galaxyproject/galaxy/actions/runs/<RUN_ID>/jobs --jq '.jobs[] | select(.conclusion == "failure") | {id: .id, name: .name}'`
- For each failed job, extract relevant error info:
- Get job logs: `gh api repos/galaxyproject/galaxy/actions/jobs/<JOB_ID>/logs`
- Parse for common failure patterns:
- Python linting: Look for "isort", "flake8", "black", "ruff" errors
- TypeScript: Look for "tsc", "eslint", "prettier" errors
- Build failures: Look for "error:", "failed", compilation errors
- Extract last 20-50 lines of relevant errors
- Save to `database/pr_reviews/{{arg}}/<RUN_ID>_<JOB_NAME>.log`
- Include job name and extracted errors in summary
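   The pattern matching in this step might look like the following (labels and regexes are illustrative assumptions, not a fixed spec):

```python
import re

# Illustrative marker patterns for common failure classes, checked in order
PATTERNS = [
    ("python linting", re.compile(r"\b(isort|flake8|black|ruff)\b")),
    ("typescript/client", re.compile(r"\b(tsc|eslint|prettier)\b")),
    ("build", re.compile(r"(?i)error:|\bfailed\b")),
]


def classify_and_tail(log_text: str, tail: int = 50) -> tuple[str, list[str]]:
    """Label a job log by the first matching pattern and keep its last lines."""
    label = "unknown"
    for name, pattern in PATTERNS:
        if pattern.search(log_text):
            label = name
            break
    return label, log_text.splitlines()[-tail:]
```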

8. **Validate downloads succeeded:**
- Check if `database/pr_reviews/{{arg}}/` has artifact directories OR log files
- If completely empty: STOP and report "No artifacts or logs extracted - analysis failed"
- Count expected vs actual artifact directories
- If mismatch: WARN user about missing artifacts

9. Parse test results from all downloaded artifacts:
- Find all JSON files: `find database/pr_reviews/{{arg}}/ -name "*.json" -type f`
- For each JSON file:
```python
import json

with open(json_file) as fh:
    data = json.load(fh)
failures = [
    {'test': test_id, 'duration': run['duration'], 'log': run.get('log', ''),
     'artifact': artifact_name, 'result': run['result']}
    for test_id, runs in data['tests'].items()
    for run in runs if run['result'] in ('Failed', 'Error')
]
```
- Fall back to HTML if no JSON found:
- Find HTML files in artifact directories
- Extract embedded JSON from `data-jsonblob="..."`
- Parse and extract failures (both 'Failed' and 'Error' results)
- **If no JSON or HTML found:** STOP and report "No test result files found in artifacts"
- **Note:** pytest distinguishes 'Failed' (assertion failed) from 'Error' (exception during setup/execution) - both are test failures
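   When only an HTML report is present, the embedded blob can be recovered roughly like this (assumes the pytest-html style `data-jsonblob` attribute holding HTML-escaped JSON; field selection is trimmed for illustration):

```python
import html
import json
import re


def failures_from_html(report_html: str) -> list[dict]:
    """Extract failed/errored tests from the JSON embedded in a pytest-html report."""
    match = re.search(r'data-jsonblob="([^"]*)"', report_html)
    if not match:
        return []
    # The attribute value is HTML-escaped JSON; unescape before parsing
    data = json.loads(html.unescape(match.group(1)))
    return [
        {"test": test_id, "result": run["result"]}
        for test_id, runs in data.get("tests", {}).items()
        for run in runs
        if run["result"] in ("Failed", "Error")
    ]
```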

10. **Categorize failures** by checking error messages:
- **Transient**: Look for `TRANSIENT FAILURE [Issue #` in error log/message
- Extract issue number from pattern
- **New**: All other failures

11. Generate markdown summary with:
- Run IDs
- **For runs with artifacts:**
- Artifact names and sizes (indicate JSON vs HTML)
- **Known transient failures** (✅):
- Test name
- Artifact/test type
- Issue number (with link)
- Duration
- **New test failures requiring investigation** (❌):
- Test name
- Artifact/test type
- Result type (Failed vs Error)
- Duration
- Error preview
- **For runs without artifacts (linting/build):**
- Job name (e.g., "Python linting", "client / build-client")
- Failure type (isort, eslint, build error, etc.)
- Error count or preview of first few errors
- Indicate these are NOT test failures
- Total counts (separate test failures from linting/build failures)

12. **Write summary to file** `database/pr_reviews/{{arg}}/summary`:
- Write the complete markdown summary
- This file is read by `/summarize_ci_post` to post the summary to the PR
- Format: Same markdown as displayed to user

**Example output:**
```
Analyzing PR #21218...
Backed up previous review to 21218_backup_20251031_143022
Found 3 failed workflow run(s)

Run 18975780470 (test artifacts):
- Playwright test results JSON (0.1 MB) ⚡
- Playwright test results JSON (shard 2) (0.1 MB) ⚡

Run 18975780416 (test artifacts):
- Integration test results JSON (0.5 MB) ⚡

Run 18975780500 (no artifacts - extracted logs):
- Python linting

================================================================================
FAILURE SUMMARY
================================================================================

🔧 **Linting/Build failures (1):**
• Python linting
Type: isort import ordering
Files affected: 3
Example: lib/galaxy/managers/users.py - imports not sorted

✅ **Known transient test failures (2):**
• test_history_sharing.py::test_sharing_private_history
From: Playwright test results JSON
Issue: https://github.com/galaxyproject/galaxy/issues/12345
Duration: 00:01:30
• test_tool_discovery.py::test_tool_discovery_landing
From: Integration test results JSON
Issue: https://github.com/galaxyproject/galaxy/issues/67890
Duration: 00:00:54

❌ **New test failures requiring investigation (1):**
• test_workflow.py::test_save_workflow
From: Playwright test results JSON (shard 2)
Type: Failed
Duration: 00:01:15
Error: AssertionError: Expected element to be visible

**Total:** 1 linting/build failure, 2 transient tests, 1 new test failure

Summary and artifacts saved to database/pr_reviews/21218/
```

13. **Display and finalize:**
- Print the summary to the user (same content as the file written in step 12)
- Create/update symlink: `ln -sfn {{arg}} database/pr_reviews/latest`
- Notify user: "Summary and artifacts saved to database/pr_reviews/{{arg}}/"

Output a concise summary showing the categorized failures. Transient failures indicate "safe to re-run"; new failures indicate "requires investigation".

**Notes:**
- The summary and downloaded artifacts are saved to `database/pr_reviews/{{arg}}/` for use by `/summarize_ci_post`
- Linting/build failures are extracted from job logs since these jobs don't produce test artifacts
- Common patterns: isort, black, flake8, ruff, eslint, prettier, tsc, build errors
- Log extraction focuses on last 20-50 lines and specific error markers to keep output concise

**Marking tests as transient failures:**
To mark a test as a known transient failure, manually add the `@transient_failure(issue=N)` decorator:

```python
from galaxy.util.unittest_utils import transient_failure

@transient_failure(issue=12345) # GitHub issue number tracking this failure
def test_flaky_feature(self):
# Test that sometimes fails
...
```

Once decorated, future failures will be automatically categorized as transient.
123 changes: 123 additions & 0 deletions .claude/commands/summarize_ci_post.md
@@ -0,0 +1,123 @@
Post CI failure summary to PR as a comment.

**Usage:** `/summarize_ci_post [PR#] [additional message]`

**Arguments:**
- `PR#` (optional): Pull request number to comment on. If omitted, uses latest analyzed PR (via `database/pr_reviews/latest` symlink)
- `additional message` (optional): Extra text to add before the summary

**Steps:**

1. **Determine PR number:**
- If `{{arg}}` provided, use that as PR#
- Otherwise, check `database/pr_reviews/latest` symlink:
- If exists: `readlink database/pr_reviews/latest` to get PR#
- If not exists: show error "No PR# provided and no latest review found. Run /summarize_ci first."

2. **Check for summary file:**
- Look for `database/pr_reviews/<PR#>/summary`
- If not found, show error:
```
Error: database/pr_reviews/<PR#>/summary not found
Run /summarize_ci <PR#> first to generate summary
```

3. **Read summary:**
- Read contents of `database/pr_reviews/<PR#>/summary`
- This contains the markdown-formatted failure summary

4. **Build comment body:**
- If additional message provided, add it at the top
- Add summary from file
- Add footer with instructions

5. **Post comment to PR:**
```bash
gh pr comment <PR#> --repo galaxyproject/galaxy --body "$(cat <<'EOF'
[additional message if provided]

## CI Failure Summary

[contents of database/pr_reviews/<PR#>/summary]

---
*Summary generated with [Claude Code](https://claude.com/claude-code)*
EOF
)"
```

6. **Output:**
```
✅ Posted summary to PR #21218
https://github.com/galaxyproject/galaxy/pull/21218#issuecomment-xxxxx
```

**Examples:**

**Simple post with explicit PR#:**
```bash
/summarize_ci_post 21218
```

**Post using latest analyzed PR:**
```bash
/summarize_ci_post
```

**With additional message:**
```bash
/summarize_ci_post 21218 "These failures look like known transient issues. Re-running checks."
```

**Latest PR with message:**
```bash
/summarize_ci_post "Only transient failures - safe to re-run"
```

**Output:**
```
Reading summary from database/pr_reviews/21218/summary...
✅ Posted summary to PR #21218
https://github.com/galaxyproject/galaxy/pull/21218#issuecomment-1234567890
```

**Example comment posted:**
```markdown
These failures look like known transient issues. Re-running checks.

## CI Failure Summary

Found 2 failed workflow run(s)

Run 18975780470:
- Playwright test results JSON (0.1 MB) ⚡

✅ **Known transient failures (2):**
• test_history_sharing.py::test_sharing_private_history - Issue #12345
From: Playwright test results JSON
Duration: 00:01:30
• test_tool_discovery.py::test_tool_discovery_landing - Issue #67890
From: Integration test results JSON
Duration: 00:00:54

❌ **New failures requiring investigation (0)**

Total: 2 transient, 0 new

---
*Summary generated with [Claude Code](https://claude.com/claude-code)*
```

**Common workflow:**
```bash
# Analyze PR
/summarize_ci 21218

# Review output, then post to PR
/summarize_ci_post 21218 "Only transient failures - safe to merge after re-run"
```

**Error Handling:**
- If `database/pr_reviews/<PR#>/summary` doesn't exist, prompt to run `/summarize_ci` first
- If PR# is invalid, show error
- If gh command fails (permissions, network), show error message
7 changes: 6 additions & 1 deletion .github/workflows/api.yaml
@@ -66,7 +66,7 @@ jobs:
           path: 'galaxy root/.venv'
           key: gxy-venv-${{ runner.os }}-${{ steps.full-python-version.outputs.version }}-${{ hashFiles('galaxy root/requirements.txt') }}-api
       - name: Run tests
-        run: ./run_tests.sh --coverage --skip_flakey_fails -api lib/galaxy_test/api -- --num-shards=2 --shard-id=${{ matrix.chunk }}
+        run: ./run_tests.sh --coverage --skip_flakey_fails -api lib/galaxy_test/api -- --num-shards=2 --shard-id=${{ matrix.chunk }} --json-report --json-report-file=run_api_tests.json
         working-directory: 'galaxy root'
       - uses: codecov/codecov-action@v5
         with:
@@ -77,3 +77,8 @@ jobs:
         with:
           name: API test results (${{ matrix.python-version }}, ${{ matrix.chunk }})
           path: 'galaxy root/run_api_tests.html'
+      - uses: actions/upload-artifact@v5
+        if: failure()
+        with:
+          name: API test results JSON (${{ matrix.python-version }}, ${{ matrix.chunk }})
+          path: 'galaxy root/run_api_tests.json'
7 changes: 6 additions & 1 deletion .github/workflows/cwl_conformance.yaml
@@ -58,7 +58,7 @@ jobs:
           path: 'galaxy root/.venv'
           key: gxy-venv-${{ runner.os }}-${{ steps.full-python-version.outputs.version }}-${{ hashFiles('galaxy root/requirements.txt') }}-api
       - name: Run tests
-        run: ./run_tests.sh --coverage --skip_flakey_fails -cwl lib/galaxy_test/api/cwl -- -m "${{ matrix.marker }} and ${{ matrix.conformance-version }}"
+        run: ./run_tests.sh --coverage --skip_flakey_fails -cwl lib/galaxy_test/api/cwl -- -m "${{ matrix.marker }} and ${{ matrix.conformance-version }}" --json-report --json-report-file=run_cwl_tests.json
         working-directory: 'galaxy root'
       - uses: codecov/codecov-action@v5
         with:
@@ -69,3 +69,8 @@ jobs:
         with:
           name: CWL conformance test results (${{ matrix.python-version }}, ${{ matrix.marker }}, ${{ matrix.conformance-version }})
           path: 'galaxy root/run_cwl_tests.html'
+      - uses: actions/upload-artifact@v5
+        if: failure()
+        with:
+          name: CWL conformance test results JSON (${{ matrix.python-version }}, ${{ matrix.marker }}, ${{ matrix.conformance-version }})
+          path: 'galaxy root/run_cwl_tests.json'
7 changes: 6 additions & 1 deletion .github/workflows/framework_tools.yaml
@@ -62,7 +62,7 @@ jobs:
           path: 'galaxy root/.venv'
           key: gxy-venv-${{ runner.os }}-${{ steps.full-python-version.outputs.version }}-${{ hashFiles('galaxy root/requirements.txt') }}-framework
       - name: Run tests
-        run: GALAXY_TEST_USE_LEGACY_TOOL_API="${{ matrix.use-legacy-api }}" ./run_tests.sh --coverage --framework-tools
+        run: GALAXY_TEST_USE_LEGACY_TOOL_API="${{ matrix.use-legacy-api }}" ./run_tests.sh --coverage --framework-tools -- --json-report --json-report-file=run_framework_tests.json
         working-directory: 'galaxy root'
       - uses: codecov/codecov-action@v5
         with:
@@ -73,3 +73,8 @@ jobs:
         with:
           name: Tool framework test results (${{ matrix.python-version }})
           path: 'galaxy root/run_framework_tests.html'
+      - uses: actions/upload-artifact@v5
+        if: failure()
+        with:
+          name: Tool framework test results JSON (${{ matrix.python-version }})
+          path: 'galaxy root/run_framework_tests.json'
7 changes: 6 additions & 1 deletion .github/workflows/framework_workflows.yaml
@@ -62,7 +62,7 @@ jobs:
           path: 'galaxy root/.venv'
           key: gxy-venv-${{ runner.os }}-${{ steps.full-python-version.outputs.version }}-${{ hashFiles('galaxy root/requirements.txt') }}-framework
       - name: Run tests
-        run: ./run_tests.sh --coverage --framework-workflows
+        run: ./run_tests.sh --coverage --framework-workflows -- --json-report --json-report-file=run_framework_workflows_tests.json
         working-directory: 'galaxy root'
       - uses: codecov/codecov-action@v5
         with:
@@ -73,3 +73,8 @@ jobs:
         with:
           name: Workflow framework test results (${{ matrix.python-version }})
           path: 'galaxy root/run_framework_workflows_tests.html'
+      - uses: actions/upload-artifact@v5
+        if: failure()
+        with:
+          name: Workflow framework test results JSON (${{ matrix.python-version }})
+          path: 'galaxy root/run_framework_workflows_tests.json'
7 changes: 6 additions & 1 deletion .github/workflows/integration.yaml
@@ -74,7 +74,7 @@ jobs:
       - name: Run tests
         run: |
           . .ci/minikube-test-setup/start_services.sh
-          ./run_tests.sh --coverage -integration test/integration -- --num-shards=4 --shard-id=${{ matrix.chunk }}
+          ./run_tests.sh --coverage -integration test/integration -- --num-shards=4 --shard-id=${{ matrix.chunk }} --json-report --json-report-file=run_integration_tests.json
         working-directory: 'galaxy root'
       - uses: codecov/codecov-action@v5
         with:
@@ -85,3 +85,8 @@ jobs:
         with:
           name: Integration test results (${{ matrix.python-version }}, ${{ matrix.chunk }})
           path: 'galaxy root/run_integration_tests.html'
+      - uses: actions/upload-artifact@v5
+        if: failure()
+        with:
+          name: Integration test results JSON (${{ matrix.python-version }}, ${{ matrix.chunk }})
+          path: 'galaxy root/run_integration_tests.json'