[chore] Re-run failed unit tests automatically #31253

mx-psi · 2024-02-14T09:20:30Z

Description:

Re-runs failed unit tests automatically. Follow-up to #31163.
This re-runs the failed tests once if there are fewer than 10 total test failures.

This should speed up development, but it comes with the risk of missing real issues.
Given the current state of our CI, I think this is acceptable, but I assume this PR is going to be controversial :)

One improvement would be to keep this but auto-generate GitHub issues when a test fails and then passes on main's CI.

Link to tracking Issue: Relates to #30880 (does not speed up individual tests but reduces the number of attempts to be made)
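
For illustration, the re-run policy described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the test runner's actual implementation: `run_with_reruns`, `make_flaky`, `max_reruns`, and `max_failures` are hypothetical names, and the exact boundary (strictly fewer than 10 failures) simply follows the wording of this description.

```python
from typing import Callable, Dict, List


def run_with_reruns(
    tests: Dict[str, Callable[[], bool]],
    max_reruns: int = 1,
    max_failures: int = 10,
) -> List[str]:
    """Run every test once, then re-run only the failures.

    Returns the names of tests that still fail after all re-runs.
    Hypothetical helper for illustration only.
    """
    failed = [name for name, test in tests.items() if not test()]

    # Many failures suggest a real breakage rather than flakiness,
    # so in that case the re-runs are skipped entirely.
    if len(failed) >= max_failures:
        return failed

    for _ in range(max_reruns):
        if not failed:
            break
        # Only the tests that failed in the previous attempt run again.
        failed = [name for name in failed if not tests[name]()]
    return failed


def make_flaky(fail_times: int) -> Callable[[], bool]:
    """Build a fake test that fails its first `fail_times` calls, then passes."""
    calls = {"n": 0}

    def test() -> bool:
        calls["n"] += 1
        return calls["n"] > fail_times

    return test


tests = {"stable": lambda: True, "flaky": make_flaky(1), "broken": lambda: False}
print(run_with_reruns(tests))  # ['broken']
```

In this sketch the flaky test is forgiven after one re-run while the consistently broken test is still reported, which is the tradeoff the description mentions: less noise, at the cost of hiding intermittent failures.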

mx-psi · 2024-02-14T10:15:34Z

cc @open-telemetry/collector-contrib-approvers

jpkrohling

LGTM. According to the documentation, it will attempt two more times by default, which is reasonable.

djaglowski

I think it's good to rerun automatically given the state of CI. Tolerating failures is always a tradeoff but we currently have so many failures that it's difficult to separate the worst offenders from the 1/million flukes. Retrying is a great way to separate these so we can get the worst offenders under control. The question in my mind is whether we should retry twice or only once.

We should keep in mind that retrying twice means a test which fails 1% of the time has only a 1/million chance of failing a given test run. We run CI ~100 times per day, so a 1% failure rate test would show up maybe once per quarter. On the other hand, retrying only once means that we would see the failure a couple times per month, which (in a less noisy CI environment) seems often enough to notice and fix/skip/remove.
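
As a rough back-of-the-envelope check of this reasoning, the sketch below computes how often a flaky test would still fail after re-runs. It assumes that attempts fail independently, that the test runs once per CI run, and that CI runs about 100 times per day; the function names are hypothetical and the figures are only as good as those assumptions.

```python
def residual_failure_rate(p_fail: float, reruns: int) -> float:
    """Probability that a test fails its first attempt and every re-run.

    Assumes each attempt fails independently with probability p_fail.
    """
    return p_fail ** (reruns + 1)


def days_between_failures(p_fail: float, reruns: int, runs_per_day: float) -> float:
    """Expected number of days between visible failures of one flaky test."""
    return 1.0 / (residual_failure_rate(p_fail, reruns) * runs_per_day)


for reruns in (0, 1, 2):
    rate = residual_failure_rate(0.01, reruns)
    days = days_between_failures(0.01, reruns, 100)
    print(f"reruns={reruns}: per-run failure rate {rate:.0e}, "
          f"about one visible failure every {days:,.0f} days")
```

Under these assumptions, a single re-run leaves roughly one visible failure per 100 days for a 1% flaky test, and two re-runs stretch that to roughly one per 10,000 days. A test that executes in several jobs per CI run would surface proportionally more often.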

mx-psi · 2024-02-14T15:57:30Z

@djaglowski Your argument makes sense to me, and it's possible that re-running once could be enough. I changed the option to --rerun-fails=1 in my last commit; we can start with this and revisit if needed in the future.

[chore] Re-run failed unit tests automatically

f32561a

mx-psi marked this pull request as ready for review February 14, 2024 10:14

mx-psi requested review from a team and songy23 February 14, 2024 10:14

github-actions bot assigned evan-bradley Feb 14, 2024

jpkrohling approved these changes Feb 14, 2024

djaglowski reviewed Feb 14, 2024

Re-run failed unit tests only once

4ef1712

mx-psi requested a review from djaglowski February 14, 2024 15:58

djaglowski approved these changes Feb 14, 2024

songy23 approved these changes Feb 14, 2024

arminru added this pull request to the merge queue Feb 14, 2024

arminru removed this pull request from the merge queue due to a manual request Feb 14, 2024

arminru mentioned this pull request Feb 14, 2024

REQUEST: Repository maintenance on opentelemetry-collector-contrib open-telemetry/community#1936

Open

mx-psi added this pull request to the merge queue Feb 14, 2024

TylerHelmuth added the "ready to merge" label Feb 14, 2024

TylerHelmuth removed this pull request from the merge queue due to a manual request Feb 14, 2024

TylerHelmuth merged commit c1a65e9 into open-telemetry:main Feb 14, 2024
147 checks passed

github-actions bot added this to the next release milestone Feb 14, 2024

mx-psi mentioned this pull request Sep 19, 2024

[chore]: fix go routine leaks in tests #34729

Open
