Skip to content

Fix VSR Rotation verification for failures#21837

Open
mgodwan wants to merge 2 commits into
opensearch-project:mainfrom
mgodwan:fix-vsr-rotation
Open

Fix VSR Rotation verification for failures#21837
mgodwan wants to merge 2 commits into
opensearch-project:mainfrom
mgodwan:fix-vsr-rotation

Conversation

@mgodwan
Copy link
Copy Markdown
Member

@mgodwan mgodwan commented May 26, 2026

Description

Fix VSR Rotation verification for failures

Related Issues

Resolves issue where for high doc ingestion, the vsr rotation fails due to exceptionNow returning an exception for even successfully completed future
Fixes past tests which didn't catch this due to moving the rotation on next doc rather than current doc

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Mohit Godwani <mgodwan@amazon.com>
@mgodwan mgodwan marked this pull request as ready for review May 26, 2026 19:53
@mgodwan mgodwan requested a review from a team as a code owner May 26, 2026 19:53
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

PR Reviewer Guide 🔍

(Review updated until commit e2e0a77)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The writer field is now declared final but is never initialized in the constructor. This will cause a compilation error. The field must either be initialized in the constructor or have the final modifier removed.

private final NativeParquetWriter writer;
Incorrect Exception Handling

When pendingWrite.state() is FAILED, calling pendingWrite.exceptionNow() returns a Throwable, but it's being passed to IllegalStateException constructor which expects a String message. This will result in calling toString() on the exception rather than properly wrapping it. Use new IllegalStateException("message", pendingWrite.exceptionNow()) to preserve the exception chain.

if (pendingWrite != null && pendingWrite.isDone()) {
    Future.State state = pendingWrite.state();
    if (state == Future.State.FAILED) {
        throw new IllegalStateException(pendingWrite.exceptionNow());
    } else if (state == Future.State.CANCELLED) {
        throw new IllegalStateException("Background write was cancelled");
    }
}
Incorrect Parameter

The ParquetWriter constructor is called with 0L as the third argument, but this position previously held mappingVersion. Passing a hardcoded 0L instead of an actual mapping version may cause the writer to operate with stale or incorrect version information, potentially leading to schema synchronization issues.

Schema schema = getOrBuildSchema();
Path filePath = buildParquetFilePath(shardPath, config.writerGeneration(), null);
return new ParquetWriter(
    filePath.toString(),
    config.writerGeneration(),
    0L,
    dataFormat,
    schema,
    this::getOrBuildSchema,
    bufferPool,
    indexSettings,
    threadPool,
    checksumStrategy
Version Initialization

The fallback constructor DocumentMapper(MapperService, Mapping) initializes version to 0L, but the builder sets newVersion to existingMapper.getVersion() + 1L or 1L if no existing mapper. This inconsistency means documents created via the fallback constructor will have version 0, while those from the builder start at 1. If both paths are used, version comparisons may fail.

public DocumentMapper(MapperService mapperService, Mapping mapping) {
    this(mapperService, mapping, 0L);
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

PR Code Suggestions ✨

Latest suggestions up to e2e0a77

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null check for documentMapper

The documentMapper() call may return null, which would cause a NullPointerException
when calling getVersion(). Add a null check or use documentMapperWithAutoCreate() to
ensure a non-null mapper is returned before accessing its version.

server/src/main/java/org/opensearch/index/engine/DataFormatAwareEngine.java [1855]

 private long currentMappingVersion() {
-    return engineConfig.getMapperService().documentMapper().getVersion();
+    DocumentMapper mapper = engineConfig.getMapperService().documentMapper();
+    return mapper != null ? mapper.getVersion() : 0L;
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential NullPointerException if documentMapper() returns null. However, the improved code only asks to verify/ensure proper handling rather than fixing a critical bug, and the default value of 0L may not be the correct fallback in all contexts.

Medium
General
Simplify future state checking logic

The state() method call is redundant since isDone() already confirms completion.
Directly check exception status using exceptionNow() within a try-catch block to
handle both failed and cancelled states more efficiently, as exceptionNow() throws
IllegalStateException for non-failed states.

sandbox/plugins/parquet-data-format/src/main/java/org/opensearch/parquet/vsr/VSRManager.java [135-142]

 if (pendingWrite != null && pendingWrite.isDone()) {
-    Future.State state = pendingWrite.state();
-    if (state == Future.State.FAILED) {
-        throw new IllegalStateException(pendingWrite.exceptionNow());
-    } else if (state == Future.State.CANCELLED) {
-        throw new IllegalStateException("Background write was cancelled");
+    try {
+        Throwable ex = pendingWrite.exceptionNow();
+        if (ex != null) {
+            throw new IllegalStateException(ex);
+        }
+    } catch (IllegalStateException e) {
+        if (pendingWrite.isCancelled()) {
+            throw new IllegalStateException("Background write was cancelled");
+        }
+        throw e;
     }
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion proposes using exceptionNow() within a try-catch block, but this approach is less clear than the current explicit state checking. The current code correctly handles both FAILED and CANCELLED states separately, which is more readable and maintainable.

Low

Previous suggestions

Suggestions up to commit cea1871
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null check for metadata

The method doesn't handle the case where newIndexMetadata is null, which could occur
if the index is deleted. This would cause a NullPointerException when calling
getMappingVersion(). Add a null check for newIndexMetadata before accessing its
methods.

server/src/main/java/org/opensearch/index/engine/DataFormatAwareEngine.java [2085-2092]

 @Override
 public synchronized void clusterChanged(ClusterChangedEvent event) {
     IndexMetadata newIndexMetadata = event.state().metadata().indices().get(this.config().getShardId().getIndexName());
     IndexMetadata oldIndexMetadata = event.previousState().metadata().indices().get(this.config().getShardId().getIndexName());
-    if (oldIndexMetadata == null || ClusterChangedEvent.indexMetadataChanged(oldIndexMetadata, newIndexMetadata)) {
+    if (newIndexMetadata != null && (oldIndexMetadata == null || ClusterChangedEvent.indexMetadataChanged(oldIndexMetadata, newIndexMetadata))) {
         logger.info("Mapping version from {} to {}", currentMappingVersion, newIndexMetadata.getMappingVersion());
         currentMappingVersion = newIndexMetadata.getMappingVersion();
     }
 }
Suggestion importance[1-10]: 8

__

Why: Critical null safety issue. If the index is deleted, newIndexMetadata could be null, causing a NullPointerException when calling getMappingVersion(). The suggestion correctly identifies this potential bug and provides the appropriate fix.

Medium
General
Use imported Collectors class

The code uses the fully qualified class name java.util.stream.Collectors when
Collectors is already imported at the top of the file. This is inconsistent with the
import statement and reduces code readability. Use the short form Collectors.joining
instead.

sandbox/plugins/parquet-data-format/src/main/java/org/opensearch/parquet/vsr/VSRManager.java [165-172]

 logger.error(
     "[Gen: {}] VSR schema mismatch: field [{}] not in active VSR. VSR schema fields: {}",
     writerGeneration,
     fieldType.name(),
     activeVSR.getSchema().getFields().stream()
         .map(f -> f.getName())
-        .collect(java.util.stream.Collectors.joining(", "))
+        .collect(Collectors.joining(", "))
 );
Suggestion importance[1-10]: 5

__

Why: Valid style improvement. The Collectors class is imported at line 12, so using the fully qualified name java.util.stream.Collectors is inconsistent and reduces readability. This is a minor code quality improvement.

Low
Remove redundant isDone check

The check for pendingWrite.isDone() before calling state() is redundant since
state() can be called on any future. Additionally, checking only FAILED and
CANCELLED states misses the RUNNING state, which could lead to race conditions.
Consider checking the state directly and handling all non-success states
appropriately.

sandbox/plugins/parquet-data-format/src/main/java/org/opensearch/parquet/vsr/VSRManager.java [135-142]

-if (pendingWrite != null && pendingWrite.isDone()) {
+if (pendingWrite != null) {
     Future.State state = pendingWrite.state();
     if (state == Future.State.FAILED) {
         throw new IllegalStateException(pendingWrite.exceptionNow());
     } else if (state == Future.State.CANCELLED) {
         throw new IllegalStateException("Background write was cancelled");
     }
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion correctly identifies that isDone() check is redundant before calling state(). However, the current code logic is correct - it only checks state when the future is done, which is intentional. The suggestion's claim about missing RUNNING state handling is incorrect since the code explicitly wants to check only completed futures.

Low
Suggestions up to commit 2c1ea8c
CategorySuggestion                                                                                                                                    Impact
General
Add timeout to assertBusy call

The assertBusy call lacks a timeout parameter, which could cause the test to hang
indefinitely if the background write never completes. Add an explicit timeout to
prevent test hangs and provide faster failure feedback.

sandbox/plugins/parquet-data-format/src/test/java/org/opensearch/parquet/vsr/VSRManagerTests.java [542-545]

 assertBusy(() -> {
     Future<?> f = manager.getPendingWrite();
     assertTrue("Background write should complete", f == null || f.isDone());
-});
+}, 30, TimeUnit.SECONDS);
Suggestion importance[1-10]: 7

__

Why: Adding an explicit timeout to assertBusy is a good practice for test reliability, preventing indefinite hangs and providing faster feedback when background writes fail to complete. This improves test maintainability.

Medium
Simplify future state check logic

The check for pendingWrite.isDone() before calling state() is redundant since
state() can be called on any future. Additionally, consider wrapping the exception
from exceptionNow() to preserve the original cause for better debugging.

sandbox/plugins/parquet-data-format/src/main/java/org/opensearch/parquet/vsr/VSRManager.java [134-141]

-if (pendingWrite != null && pendingWrite.isDone()) {
+if (pendingWrite != null) {
     Future.State state = pendingWrite.state();
     if (state == Future.State.FAILED) {
-        throw new IllegalStateException(pendingWrite.exceptionNow());
+        throw new IllegalStateException("Background write failed", pendingWrite.exceptionNow());
     } else if (state == Future.State.CANCELLED) {
         throw new IllegalStateException("Background write was cancelled");
     }
 }
Suggestion importance[1-10]: 4

__

Why: While removing the isDone() check is technically valid since state() can be called on any future, the current code is more explicit and readable. The suggestion to wrap the exception with a cause is a minor improvement but not critical.

Low

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 2c1ea8c: SUCCESS

@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.38%. Comparing base (8175de0) to head (2c1ea8c).

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #21837      +/-   ##
============================================
- Coverage     73.42%   73.38%   -0.04%     
+ Complexity    75450    75433      -17     
============================================
  Files          6034     6034              
  Lines        342533   342533              
  Branches      49263    49263              
============================================
- Hits         251508   251380     -128     
- Misses        71048    71135      +87     
- Partials      19977    20018      +41     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit cea1871

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for cea1871: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Mohit Godwani <mgodwan@amazon.com>
@mgodwan mgodwan force-pushed the fix-vsr-rotation branch from cea1871 to e2e0a77 Compare May 27, 2026 09:48
@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit e2e0a77.

PathLineSeverityDescription
server/src/main/java/org/opensearch/index/engine/DataFormatAwareEngine.java17lowClusterStateListener is imported but not used anywhere in the visible diff. Unused imports of sensitive cluster-state interfaces are anomalous and worth verifying that no listener registration was accidentally dropped or is planned in a follow-up commit.
sandbox/plugins/parquet-data-format/src/main/java/org/opensearch/parquet/fields/ArrowSchemaBuilder.java52lowSwitched from documentMapper() to documentMapperWithAutoCreate().getDocumentMapper() for schema inspection. The auto-create variant can trigger implicit index/mapping creation as a side effect of what should be a read-only schema build, which may be unintended behavior in this context.
sandbox/plugins/parquet-data-format/src/main/java/org/opensearch/parquet/engine/ParquetIndexingEngine.java262lowmappingVersion is hardcoded to 0L when constructing ParquetWriter, eliminating version-aware schema tracking. Combined with removing the version-gated cache in getOrBuildSchema(), this ensures the writer always starts at version 0, which could mask schema version drift between the writer and the live mapping state.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 0 | Low: 3


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit e2e0a77

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for e2e0a77: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants