Description
Background
Flaky tests in OpenSearch's core `./gradlew check` task frequently impact contributors and developers, and block pull requests. At the time this issue was created there were 130 flaky tests (per the link).
Automation already exists to detect these failures and create a GitHub issue with a comprehensive failure report, including links to impacted pull requests (contributors), relevant commits, the Gradle check build logs, and the full test report; the detector leverages OpenSearch Metrics data.
To find the failed CI links or locate the failing test(s), developers often use the Gradle Check Metrics Dashboard.
However, detection is only one part of the solution. What's missing is a repeatable process to drive these issues to closure and reduce recurring flakiness over time. While we have this automation today, there is currently no structured mechanism to tackle, resolve, and close these issues. This leads to:
- Long-standing flaky test issues with no resolution.
- Contributors repeatedly impacted by the same failures.
- Contributors today either close and reopen the PR or push a new commit to retry the Gradle check.
- Maintainers manually retry the failed build.
Supporting References
Past GitHub Issues Related to this topic:
- Better visibility into test failures over time #11217
- Add additional details on Gradle Check failures autocut issues #13950
- [Automation Enhancement] Mechanism to close the created Gradle Check AUTOCUT flaky test issues. #14475
- Gradle Check Optimization #13786
- Add failure analytics for OpenSearch #3713
- [Meta] Fix random test failures #1715
Some mechanisms we can incorporate:
Retry mechanism for failed tests
- Currently, retrying specific Gradle tests is broken due to the transition to the Gradle Develocity plugin (see Update gradle config for develocity rebranding #13942). Although we've retained the Gradle Enterprise plugin (now rebranded as Develocity) from the original fork, we don't use it since we lack the required license. As a result, the retry functionality tied to this plugin is not working as expected, and I suspect this may be contributing to the recent instability. I've created a PR attempting to address this: Retry All Tests with a Reduced Retry Limit of 1 #17939.
Reporting enhancement with seed information
- With today's automation that creates a GitHub issue containing a comprehensive failure report, I propose that we also include the `tests.seed` information used during the failure. I've observed that some failures are genuine and reproducible with a specific seed, for example:
  - `./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testSnapshotWithStuckNode" -Dtests.seed=8529B1DD622216C1`
  - `./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testSnapshotWithStuckNode" -Dtests.seed=AE568A72925374C5`

  While this information is available in the Jenkins logs, adding the seed directly to the failure report would make it easier to identify tests with reproducible failures (see the first sketch after this list).
- Building on the retry mechanism above, I propose we also surface the tests that only pass on retry and fix them proactively; we should not keep retrying tests just to get them green (see the second sketch after this list).
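To make the seed proposal concrete, here is a minimal sketch (shell) of how the reporting automation could lift the reproduce lines, seed included, out of a Gradle check console log and quote them in the autocut issue. The build URL is a placeholder, and it assumes the log still contains the `REPRODUCE WITH:` lines printed by the test framework on failure.

```bash
# Minimal sketch: extract reproduce lines (including -Dtests.seed) from a Gradle
# check console log so the failure report can quote them verbatim.
# The URL below is a placeholder, not a real build.
LOG_URL="https://build.ci.opensearch.org/job/gradle-check/12345/consoleText"

curl -sL "$LOG_URL" \
  | grep -o 'REPRODUCE WITH: ./gradlew .*' \
  | sort -u
```

And for surfacing tests that only go green on retry, one hedged option, assuming the console log prints per-test `PASSED`/`FAILED` lines, is to list the test names that appear with both outcomes in the same build:

```bash
# Minimal sketch: tests that FAILED at least once but also PASSED in the same log,
# i.e. candidates that only went green after a retry. Assumes "Class > test PASSED"
# / "Class > test FAILED" style lines appear in the console output.
LOG="consoleText.txt"

comm -12 \
  <(grep ' FAILED$' "$LOG" | sed 's/ FAILED$//' | sort -u) \
  <(grep ' PASSED$' "$LOG" | sed 's/ PASSED$//' | sort -u)
```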
Flaky Test Tracking (Ongoing)
- Periodically track the open flaky test issues and keep the count down. We can leverage the community meetings: https://forum.opensearch.org/tag/proj-health-agenda (see the sketch below for one way to pull the count).
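One way to pull the number for the meeting is the GitHub CLI; this is a rough sketch and the `flaky-test` label is an assumption, so substitute whatever label the autocut automation actually applies.

```bash
# Minimal sketch: count open flaky-test issues in opensearch-project/OpenSearch.
# The "flaky-test" label is an assumption; adjust to the label the autocut issues use.
gh issue list \
  --repo opensearch-project/OpenSearch \
  --label "flaky-test" \
  --state open \
  --limit 1000 \
  --json number \
  --jq 'length'
```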
Tag the test author
We can use the approach discussed here #17934 (comment).
Related Child Issue #18271
- This could be ambitious, but maybe we can have automation that tags the flaky test owner? Using the OpenSearch Gradle Check Metrics Dashboard I was able to see the top 100 flaky tests over the past year. At the very least, we could have a mechanism to tag the owners of the top offenders.
- To begin with, we can use the `git log` CLI to find the most recent commit author of the failing test and tag them in a comment on the flaky report issue (see the sketch below). There could be other ways to do this, but I'm open to thoughts.
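As a starting point for the `git log` idea, a minimal sketch is below. The test file path and issue number are placeholders, and mapping the commit email to a GitHub login would need an extra lookup; this only posts the author name from the last commit that touched the file.

```bash
# Minimal sketch: find the most recent author of a failing test file and mention
# them on the autocut issue. The path and issue number below are placeholders.
TEST_FILE="server/src/internalClusterTest/java/org/opensearch/snapshots/DedicatedClusterSnapshotRestoreIT.java"
ISSUE_NUMBER=12345

# Last commit author (name and email) that touched the test file.
AUTHOR=$(git log -1 --format='%an <%ae>' -- "$TEST_FILE")

gh issue comment "$ISSUE_NUMBER" \
  --repo opensearch-project/OpenSearch \
  --body "Most recent author of \`$TEST_FILE\`: $AUTHOR. Could you take a look at this flaky test?"
```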
Identify flaky test proactively
- Before merging the PR: similar to the benchmark workflow we have today, which runs on request and approval, we can have a workflow where, even before a PR is merged, we run the targeted tests `N` times in a row with multiple combinations to detect flakiness. This gives us some confidence before we merge the PR (in this process we have to mute/exclude the known flaky tests so that new failures can be attributed precisely); see the sketch after this list. Related issue: Workflow to proactively run newly added/updated OpenSearch Core Gradle tests before PR merge opensearch-build#5481.
- After merging the PR (post-action): today we run `gradle check` on a commit after the PR is merged; we should do this more often and periodically to find flaky tests, and for an identified commit we can have automation to create a PR reverting it. A similar topic is discussed in [FEATURE] Introduce commit queue on Jenkins (main branch only) to proactively spot flaky tests opensearch-build#4810.
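For the pre-merge idea, here is a minimal sketch of running one test class `N` times in a row with a fresh random seed on each run; the test class, module, and `N` are placeholders.

```bash
# Minimal sketch: run a single test class N times with a different random seed each
# time to estimate flakiness before merge. Class, module, and N are placeholders.
TEST_CLASS="org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT"
N=10
FAILURES=0

for i in $(seq 1 "$N"); do
  SEED=$(openssl rand -hex 8 | tr '[:lower:]' '[:upper:]')
  echo "Run $i/$N with seed $SEED"
  if ! ./gradlew ':server:internalClusterTest' \
         --tests "$TEST_CLASS" \
         -Dtests.seed="$SEED" \
         --rerun-tasks; then
    FAILURES=$((FAILURES + 1))
  fi
done

echo "$FAILURES/$N runs of $TEST_CLASS failed"
```

If the randomized runner's `tests.iters` property is available in the build, it can achieve something similar within a single Gradle invocation.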
Related component
Build