
[META] Establish Mechanisms to Resolve Existing Flaky Tests and Prevent Future Flakiness #17974

@prudhvigodithi


Background

Flaky tests in OpenSearch’s core ./gradlew check task frequently impact contributors and developers and block pull requests. At the time this issue was created, there were 130 flaky tests (per the linked list).

Today, automation detects flaky tests and creates a GitHub issue with a comprehensive failure report, including links to impacted pull requests (contributors), relevant commits, the Gradle check build logs, and the full test report. The detector leverages OpenSearch Metrics data.

To find failed CI links or locate the failing test(s), developers often use the Gradle Check Metrics Dashboard.

However, detection is only one part of the solution. What’s missing is a repeatable process to drive these issues to closure and reduce recurring flakiness over time. While this automation exists today, there is no structured mechanism to triage, resolve, and close these issues. This leads to:

  • Long-standing flaky test issues with no resolution.
  • Contributors repeatedly impacted by the same failures.
  • Contributors either close and reopen the PR or push a new commit to retry the Gradle check.
  • Maintainers manually retry the failed build.

Supporting References

Past GitHub Issues Related to this topic:

Some mechanisms we can incorporate:

Retry mechanism for failed tests

  • Currently, retrying specific Gradle tests is broken due to the transition to the Gradle Develocity plugin (introduced in "Update gradle config for develocity rebranding" #13942). Although we’ve retained the Gradle Enterprise plugin (now rebranded as Develocity) from the original fork, we don’t use it because we lack the required license. As a result, the retry functionality tied to this plugin does not work as expected, and I suspect this may be contributing to the recent instability. I’ve created a PR that attempts to address this: Retry All Tests with a Reduced Retry Limit of 1 #17939.

Reporting enhancement with seed information

  • Since today’s automation already creates a GitHub issue with a comprehensive failure report, I propose that the report also include the tests.seed value used in the failing run. I've observed that some failures are genuine and reproducible with a specific seed, for example:

          ./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testSnapshotWithStuckNode" -Dtests.seed=8529B1DD622216C1
    
          ./gradlew ':server:internalClusterTest' --tests "org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testSnapshotWithStuckNode" -Dtests.seed=AE568A72925374C5
    

    While this information is available in Jenkins logs, adding the seed directly to the failure report would make it easier to identify tests with reproducible failures.

  • Building on the retry mechanism above, I propose that we surface tests that pass only on retry and fix them proactively, rather than relying on retries to keep the build green.
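Both reporting ideas above could be prototyped with standard shell tools. The following is a hypothetical sketch, not part of the existing detector. `extract_test_seeds` pulls test/seed pairs out of a build log, assuming seeds appear in reproduce commands shaped like the examples above (`--tests "<test>" ... -Dtests.seed=<HEX>`). `passed_after_retry` assumes per-attempt results have already been flattened (for example, from the JUnit XML reports) into lines of `<test> PASSED|FAILED`; that input format is an illustrative assumption, not something the current automation produces.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the flaky-test report; function names and input formats are assumptions.

# Print "<test> <seed>" for every reproduce line in the given log file(s), or stdin.
# Assumes lines shaped like: ./gradlew '...' --tests "<test>" ... -Dtests.seed=<HEX> ...
extract_test_seeds() {
  sed -nE 's/.*--tests "([^"]+)".*-Dtests\.seed=([0-9A-Fa-f]+).*/\1 \2/p' "$@" | sort -u
}

# Print tests that both FAILED and PASSED across attempts, i.e. tests that only
# went green on retry. Input: one "<test> PASSED|FAILED" line per attempt.
passed_after_retry() {
  awk '
    $2 == "FAILED" { failed[$1] = 1 }
    $2 == "PASSED" { passed[$1] = 1 }
    END { for (t in failed) if (t in passed) print t }
  ' "$@" | sort
}
```

For example, `extract_test_seeds gradle-check.log` (a hypothetical log file name) would list each failing test with the seed it ran under, so the exact command can be rerun locally. A test that fails again with the same seed is a candidate for a genuine bug, while tests reported by `passed_after_retry` are candidates for flakiness fixes.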

Flaky Test Tracking (Ongoing)

Tag the test author

We can use the approach discussed here #17934 (comment).

Related Child Issue #18271

  • This could be ambitious, but maybe we can automate tagging the owner of a flaky test. Using the OpenSearch Gradle Check Metrics Dashboard, I was able to see the top 100 flaky tests over the past year. At a minimum, we could have a mechanism to tag the owners of the top hitters.

  • To begin with, we could use the git log CLI to find the most recent commit author of the test and tag them in a comment on the flaky test report issue. There may be other ways to do this; I'm open to suggestions.
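A first cut of the git log approach might look like the hypothetical sketch below. It assumes the failing-test list is a plain file with one test name per failure (for example, exported from the Gradle Check Metrics Dashboard), that each test class lives in a file named `<SimpleClassName>.java`, and that the script runs inside an OpenSearch checkout. `top_failing_tests` ranks the top hitters, and `last_author_of_test` prints the author of the most recent commit that touched a test class's source file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for tagging owners of flaky tests; names and input formats are assumptions.

# Print the N most frequent entries (default 100) as "<test> <count>", given one
# failing test name per line in the file(s) after N, or on stdin.
top_failing_tests() {
  local n="${1:-100}"
  [ "$#" -gt 0 ] && shift
  sort "$@" | uniq -c | sort -rn | head -n "$n" | awk '{ print $2, $1 }'
}

# Print "Name <email>" of the author of the most recent commit that touched the
# source file of a fully qualified test class (e.g. org.opensearch.foo.BarIT).
# Assumes the file is named <SimpleClassName>.java; only the first match is used.
last_author_of_test() {
  local simple="${1##*.}"   # org.opensearch.foo.BarIT -> BarIT
  local file
  # In git pathspecs, "*" also matches "/", so this searches every directory.
  file=$(git ls-files -- "*/${simple}.java" | head -n 1)
  if [ -z "$file" ]; then
    echo "no source file found for $1" >&2
    return 1
  fi
  git log -1 --format='%an <%ae>' -- "$file"
}
```

For example, `last_author_of_test org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT` would print whoever last committed to that test. The most recent committer is not necessarily the test's owner (the last change may be a refactoring or a mass formatting commit), so this is a heuristic for whom to ping rather than a definitive ownership lookup.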

Identify flaky tests proactively

Related component

Build

Metadata

Labels

Build (Build Tasks/Gradle Plugin, groovy scripts, build tools, Javadoc enforcement), CI (CI related), Meta (Meta issue, not directly linked to a PR)


Projects

Status

New
