Skip to content

Remove flaky assertion in FullClusterRestartIT.testRecovery#20747

Open
andrross wants to merge 1 commit intoopensearch-project:mainfrom
andrross:fullclusterrestart-fix
Open

Remove flaky assertion in FullClusterRestartIT.testRecovery#20747
andrross wants to merge 1 commit intoopensearch-project:mainfrom
andrross:fullclusterrestart-fix

Conversation

@andrross
Copy link
Member

The test asserts that after a full cluster restart upgrade, at least one old version Lucene segment must still exist on the primary. This is not guaranteed because background merges can merge all old version segments into new segments before the assertion runs.

The remaining assertions still validate that translog replay occurred and produced at least one new-version segment on the primary.

Related Issues

Resolves #15813

Check List

  • Functionality includes testing.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

The test asserts that after a full cluster restart upgrade, at least one
old version Lucene segment must still exist on the primary. This is not
guaranteed because background merges can merge all old version segments
into new segments before the assertion runs.

The remaining assertions still validate that translog replay occurred
and produced at least one new-version segment on the primary.

Signed-off-by: Andrew Ross <andrross@amazon.com>
@andrross andrross requested a review from a team as a code owner February 27, 2026 20:37
@github-actions github-actions bot added >test-failure Test failure from CI, local build, etc. autocut flaky-test Random test failure that succeeds on second run labels Feb 27, 2026
@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 83fc653.

PathLineSeverityDescription
qa/full-cluster-restart/src/test/java/org/opensearch/upgrades/FullClusterRestartIT.java796lowRemoval of a backward-compatibility assertion (`assertNotEquals` for `numBwcVersion`) in a full-cluster-restart upgrade test. This weakens test coverage by no longer verifying that at least one old-version segment exists after translog recovery. While this could be a legitimate fix for a flaky test or a change in expected upgrade behavior, silently dropping upgrade-validation assertions in a security-sensitive context (data integrity during cluster restart) is worth reviewing to confirm intent.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Reduced Coverage

Removing the assertion that validates at least one old-version (BWC) segment exists reduces test coverage for verifying that the upgrade scenario actually preserved old segments before merging. While the PR description explains this is flaky due to background merges, consider whether an alternative approach (e.g., disabling merges during the test or checking segment counts before merges occur) could preserve this validation without flakiness.

    assertNotEquals("expected at least 1 current segment after translog recovery. segments:\n" + segmentsResponse,
        0, numCurrentVersion);
}

@github-actions
Copy link
Contributor

✅ Gradle check result for 83fc653: SUCCESS

@codecov
Copy link

codecov bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.33%. Comparing base (59be6ae) to head (83fc653).

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20747      +/-   ##
============================================
+ Coverage     73.29%   73.33%   +0.04%     
- Complexity    72088    72169      +81     
============================================
  Files          5794     5794              
  Lines        329733   329733              
  Branches      47577    47577              
============================================
+ Hits         241664   241809     +145     
+ Misses        68612    68597      -15     
+ Partials      19457    19327     -130     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

autocut flaky-test Random test failure that succeeds on second run lucene skip-changelog >test-failure Test failure from CI, local build, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for FullClusterRestartIT

2 participants