feat: Implement flaky test monitoring system #1379

Merged: KCarretto merged 22 commits into `main` from `flaky-test-monitor-3414294539595760241` on Dec 24, 2025

@google-labs-jules
Contributor

This PR implements a comprehensive system for monitoring flaky tests in the CI pipeline.

It introduces the following changes:

  1. Data Collection:

    • tavern (Go): Uses gotestsum to output JUnit XML.
    • implants (Rust): Configures nextest to output JUnit XML via cargo llvm-cov.
    • ui-tests (Node.js): Configures Vitest to use the junit reporter.
    • All jobs now upload their XML reports as artifacts.
  2. Analysis & Storage:

    • Adds a new flaky-monitor job that runs after all test jobs.
    • This job maintains a persistent history of test results in a dedicated orphan branch test-results.
    • It downloads current run artifacts and commits them to the history branch.
    • A new Python script scripts/analyze_flaky_tests.py parses the historical XML data to identify tests with high failure rates, grouped by OS.
  3. Reporting:

    • The analysis results are posted to the GitHub Actions Job Summary.
    • On Pull Requests, a comment is automatically posted listing the top 5 flaky tests.

This system allows the team to track test reliability over time and identify specific environments (OS) where tests are unstable.
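The analysis step described above can be sketched roughly as follows. This is only an illustration, not the contents of `scripts/analyze_flaky_tests.py`, which is not shown in this conversation. The function names (`count_results`, `failure_rates`, `top_flaky`), the sample XML, and the assumption that each report arrives already labeled with its OS are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample reports: one passing and one failing run of the same test.
PASS_XML = '<testsuite><testcase classname="pkg" name="test_a"/></testsuite>'
FAIL_XML = (
    '<testsuite><testcase classname="pkg" name="test_a">'
    '<failure message="boom"/></testcase></testsuite>'
)


def count_results(junit_xml):
    """Yield (test_id, failed) for every <testcase> in one JUnit XML document."""
    root = ET.fromstring(junit_xml)
    for case in root.iter("testcase"):
        test_id = f"{case.get('classname', '')}::{case.get('name', '')}"
        # A test case counts as failed if it has a <failure> or <error> child.
        failed = case.find("failure") is not None or case.find("error") is not None
        yield test_id, failed


def failure_rates(reports):
    """Aggregate (os_name, junit_xml_text) pairs, one pair per historical report.

    Returns {(os_name, test_id): (failures, runs, rate)}.
    """
    failures = defaultdict(int)
    runs = defaultdict(int)
    for os_name, xml_text in reports:
        for test_id, failed in count_results(xml_text):
            key = (os_name, test_id)
            runs[key] += 1
            failures[key] += int(failed)
    return {key: (failures[key], runs[key], failures[key] / runs[key]) for key in runs}


def top_flaky(rates, limit=5):
    """Tests that both passed and failed at least once, worst failure rate first."""
    mixed = [(key, value) for key, value in rates.items() if 0 < value[0] < value[1]]
    return sorted(mixed, key=lambda item: item[1][2], reverse=True)[:limit]


history = [("linux", PASS_XML), ("linux", FAIL_XML), ("windows", PASS_XML)]
rates = failure_rates(history)
print(top_flaky(rates))
```

Keying on the `(os_name, test_id)` pair is what lets a test that fails only on one OS show up separately. Here "flaky" is taken to mean a test with both passes and failures in its history, which is one possible definition among several.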


PR created automatically by Jules for task 3414294539595760241 started by @KCarretto

- Update `tests.yml` to generate JUnit XML reports for Go, Rust, and Node.js tests.
- Add `flaky-monitor` job to aggregate and analyze historical test results.
- Create `scripts/analyze_flaky_tests.py` to calculate test flakiness grouped by OS.
- Configure automatic reporting to GitHub Step Summary and PR comments.
- Use `test-results` branch for persistent history storage.
@google-labs-jules
Contributor Author

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!


For security, I will only act on instructions from the user who triggered this task.

New to Jules? Learn more at jules.google/docs.

@github-actions
Contributor

🛡️ Test Health Report

No flaky tests detected in this batch! 🎉

google-labs-jules bot and others added 8 commits December 23, 2025 19:33
- Update `tests.yml` to generate CTRF reports for Go, Rust, and Node.js tests.
- Use `go-ctrf-json-reporter` for Go tests.
- Use `junit-to-ctrf` for Rust and Node.js tests (converting JUnit XML).
- Add `flaky-monitor` job using `ctrf-io/github-test-reporter` to track and report flaky tests.
- Configure historical data tracking via artifact retention.
- Fix argument passing for `cargo llvm-cov` by using `--` separator.
- Fix argument passing for `cargo llvm-cov` by configuring `nextest.toml` for JUnit output.
- Fix PATH issues for `go-ctrf-json-reporter` by explicitly adding GOPATH/bin to GITHUB_PATH.
- Ensure `npx` is available in Rust job by adding `setup-node` step.
- Fix PATH issues for `go-ctrf-json-reporter` by calling binary with full path.
- Pin `gotestsum` to v1.12.0 to support Go 1.23.4.
- Use `shell: bash` explicitly for steps using bash syntax to prevent PowerShell errors on Windows runners.
@KCarretto KCarretto marked this pull request as ready for review December 24, 2025 00:41
- Revert unintended version downgrade in `.devcontainer/devcontainer.json`.
```bash
cd realm/implants/imix/
# Tell the linker to use the MacOSX.sdk
export SDKROOT="/MacOSX.sdk/"; export RUSTFLAGS="-Clink-arg=-isysroot -Clink-arg=/MacOSX.sdk -Clink-arg=-F/MacOSX.sdk/System/Library/Frameworks -Clink-arg=-L/MacOSX.sdk/usr/lib -Clink-arg=-lresolv"
```
Collaborator

Don't remove this from our user guide

Contributor Author

Apologies, I will restore this line in the documentation.

google-labs-jules bot and others added 4 commits December 24, 2025 01:18
- Restore missing documentation in `docs/_docs/user-guide/imix.md`.
- Add `flaky-monitor` job using `ctrf-io/github-test-reporter` to track and report flaky tests with comprehensive reporting options enabled.
- Correctly locate `nextest.toml` in `.config/` directory.
@spellshift spellshift deleted a comment from github-actions bot Dec 24, 2025
@github-actions
Contributor

github-actions bot commented Dec 24, 2025

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 2422 ↑2037 | 2422 ↑2038 | 0 ↓1 | 0 ±0 | 0 ±0 | 0 ±0 | 26.8s ↑742ms |

Previous Results

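| Build 🏗️ | Result 🧪 | Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| #3232 | | 2422 | 2422 | 0 | 0 | 0 | 0 | 26.8s |
| #3228 | | 385 | 384 | 1 | 0 | 0 | 0 | 26.1s |
| #3228 | | 2807 | 2806 | 1 | 0 | 0 | 0 | 26.1s |
| #3220 | | 1905 | 1904 | 1 | 0 | 0 | 0 | 26.5s |
| #3216 | | 2390 | 2390 | 0 | 0 | 0 | 0 | 26.4s |
| #3209 | | 2390 | 2390 | 0 | 0 | 0 | 0 | 490708h 59m |
| #3206 | | 1169 | 1168 | 1 | 0 | 0 | 0 | 490708h 38m |
| #3206 | | 3559 | 3558 | 1 | 0 | 0 | 0 | 490708h 38m |
| #3198 | | 370 | 370 | 0 | 0 | 0 | 0 | 490707h 45m |
| #3196 | | 370 | 370 | 0 | 0 | 0 | 0 | 490707h 42m |
| #3192 | | 370 | 370 | 0 | 0 | 0 | 0 | 490707h 25m |
| #3182 | | 349 | 349 | 0 | 0 | 0 | 0 | 490706h 32m |
| #3177 | | 349 | 349 | 0 | 0 | 0 | 0 | 490705h 56m |
| #3176 | | 349 | 349 | 0 | 0 | 0 | 0 | 490705h 22m |
| #3173 | | 349 | 349 | 0 | 0 | 0 | 0 | 490704h 58m |
| #3167 | | 349 | 349 | 0 | 0 | 0 | 0 | 490704h 25m |
| #3164 | | 349 | 349 | 0 | 0 | 0 | 0 | 490703h 32m |
| #3161 | | 56 | 56 | 0 | 0 | 0 | 0 | not captured |
| #3158 | | 56 | 56 | 0 | 0 | 0 | 0 | not captured |
| #3155 | | 56 | 56 | 0 | 0 | 0 | 0 | not captured |
| #3154 | | 56 | 56 | 0 | 0 | 0 | 0 | not captured |
| #3153 | | 56 | 56 | 0 | 0 | 0 | 0 | not captured |
| #3152 | | 56 | 56 | 0 | 0 | 0 | 0 | not captured |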

Insights

| Average Tests per Run | Total Flaky Tests | Total Failed | Slowest Test (p95) |
| --- | --- | --- | --- |
| 894 | 0 | 5 | 43.5s |

Fail Rate

Fail Rate: 0.02% ↓0.04

| Test 📝 | Results 📊 | Passed ✅ | Failed ❌ | Fail Rate (%) 📈 |
| --- | --- | --- | --- | --- |
| eldritch: sys::dll_inject_impl::tests::test_dll_inject_simple | 6 | 5 | 1 | 16.67 ↓3.33 |
| eldritch: process::kill_impl::tests::test_process_kill | 22 | 20 | 2 | 9.09 ↓1.44 |
| eldritch: process::kill_impl::tests::test_process_kill | 22 | 20 | 2 | 9.09 ↓1.44 |
| eldritch: process::kill_impl::tests::test_process_kill | 22 | 20 | 2 | 9.09 ↓1.44 |
| eldritch: assets::copy_impl::tests::test_embedded_copy | 24 | 22 | 2 | 8.33 ↓1.19 |
| eldritch: assets::copy_impl::tests::test_embedded_copy | 24 | 22 | 2 | 8.33 ↓1.19 |
| eldritch: assets::copy_impl::tests::test_embedded_copy | 24 | 22 | 2 | 8.33 ↓1.19 |
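The per-test Fail Rate (%) values above are consistent with dividing failed results by total results and rounding to two decimals. The sketch below reproduces that arithmetic for the three distinct tests listed. It assumes this simple ratio is what the reporter computes, which the report itself does not state, and `fail_rate_percent` is a hypothetical helper name.

```python
def fail_rate_percent(failed, results):
    """Failed results as a percentage of all recorded results, to two decimals."""
    if results == 0:
        return 0.0
    return round(100.0 * failed / results, 2)


# (test name, total results, failed results) taken from the table above.
rows = [
    ("test_dll_inject_simple", 6, 1),
    ("test_process_kill", 22, 2),
    ("test_embedded_copy", 24, 2),
]

computed = {name: fail_rate_percent(failed, results) for name, results, failed in rows}
for name, rate in computed.items():
    print(f"{name}: {rate}%")
```

With so few recorded results per test (6 to 24), a single extra failure moves the percentage by several points, so these rates are best read as rough indicators rather than stable measurements.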

Failed Tests

No failed tests ✨

Slowest Tests

| Test 📝 | Results 📊 | Duration (avg) ⏱️ | Duration (p95) ⏱️ |
| --- | --- | --- | --- |
| eldritch: random::string_impl::tests::test_string_uniform | 20 | 40.3s | 43.5s |
| eldritch: random::string_impl::tests::test_string_uniform | 20 | 40.3s | 43.5s |
| eldritch: random::string_impl::tests::test_string_uniform | 20 | 40.3s | 43.5s |
| eldritch: pivot::port_scan_impl::tests::test_portscan_return_type_starlark_dict_from_interpreter | 22 | 1.5s | 7.9s |
| eldritch: pivot::port_scan_impl::tests::test_portscan_return_type_starlark_dict_from_interpreter | 22 | 1.5s | 7.9s |
| eldritch: pivot::port_scan_impl::tests::test_portscan_return_type_starlark_dict_from_interpreter | 22 | 1.5s | 7.9s |
| eldritch: pivot::ssh_copy_impl::tests::test_pivot_ssh_copy | 22 | 2.8s | 7.9s |
| eldritch: pivot::ssh_copy_impl::tests::test_pivot_ssh_copy | 22 | 2.8s | 7.9s |
| eldritch: pivot::ssh_copy_impl::tests::test_pivot_ssh_copy | 22 | 2.8s | 7.9s |
| eldritch: pivot::ssh_exec_impl::tests::test_pivot_ssh_exec | 22 | 2.8s | 7.8s |

🍂 No flaky tests detected across all runs. | ⏱️ Measured over 23 runs.

Github Test Reporter by CTRF 💚

🔄 This comment has been updated

@KCarretto KCarretto requested review from KCarretto and hulto December 24, 2025 03:45
hulto
hulto previously approved these changes Dec 24, 2025
The `implants` test job was failing to generate CTRF reports because the `junit.xml` output from `cargo nextest` was located in a subdirectory (e.g., `target/nextest/default/junit.xml`) rather than the workspace root. This caused the `mv` command to fail, skipping the conversion step and ultimately causing the artifact upload to fail with "No files were found".

This change updates the workflow to dynamically search for `junit.xml` using `find` (piped to `head -n 1` for macOS compatibility), ensuring the report is correctly located and processed regardless of the output directory structure. This also restores the `implants/.config/nextest.toml` configuration to ensure JUnit generation is enabled.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
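The commit above locates `junit.xml` with `find` piped to `head -n 1` in the workflow. The same idea, searching the tree instead of assuming a fixed path, can be sketched in Python as below. This is an illustration rather than the workflow's actual step, and `find_first_report` is a hypothetical name. The demo builds a temporary directory that mimics the `target/nextest/default/` layout mentioned in the commit message.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_first_report(root: str, filename: str = "junit.xml") -> Optional[Path]:
    """Return the first file named `filename` anywhere under `root`, or None."""
    # Sorting makes the choice deterministic when several reports exist.
    matches = sorted(Path(root).rglob(filename))
    return matches[0] if matches else None


# Demo: the report sits in a nested directory rather than the workspace root.
with tempfile.TemporaryDirectory() as workspace:
    report = Path(workspace, "target", "nextest", "default", "junit.xml")
    report.parent.mkdir(parents=True)
    report.write_text("<testsuites/>")

    found = find_first_report(workspace)
    found_relative = found.relative_to(workspace).as_posix()
    missing = find_first_report(workspace, "missing.xml")

print(found_relative)
```

Unlike `find ... | head -n 1`, which returns whichever match the directory traversal reaches first, this sketch sorts the matches before picking one. Returning `None` when nothing is found lets the caller fail with a clear message instead of passing an empty path to a later step.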
@KCarretto KCarretto requested a review from hulto December 24, 2025 05:22
hulto
hulto previously approved these changes Dec 24, 2025
@KCarretto KCarretto merged commit 199df53 into main Dec 24, 2025
7 checks passed
@KCarretto KCarretto deleted the flaky-test-monitor-3414294539595760241 branch December 24, 2025 18:08