Conversation

@RS146BIJAY
Contributor

Description

Fix CriteriaBasedCodec to work with delegate codec

Related Issues

Fix CriteriaBasedCodec to work with delegate codec

@coderabbitai
Contributor

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

This PR refactors the codec architecture by introducing CriteriaBasedPostingsFormat, a new SPI provider that handles the postings format with bucket-level attribute tracking. CriteriaBasedCodec is simplified to delegate postings-format handling and is removed from the Codec SPI registry. Tests validate the new grouping functionality across multiple scenarios.
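The delegate pattern the walkthrough describes can be sketched in plain Java. This is an illustrative stand-in, not the actual OpenSearch code: the key name, registry map, and method names are hypothetical; the `Map<String, String>` stands in for Lucene's SegmentInfo attributes and the registry stands in for `PostingsFormat.forName`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of the delegate pattern: the write path records the delegate's SPI
// name as a segment attribute; the read path resolves the delegate by that
// name and fails fast if the attribute is missing.
class DelegateLookupSketch {
    static final String DELEGATE_CODEC_KEY = "delegate_codec"; // hypothetical key name

    // Stand-in for the SPI registry (PostingsFormat.forName in Lucene).
    static final Map<String, Supplier<String>> REGISTRY = new HashMap<>();

    // Write path: stamp the segment with the delegate that produced it.
    static void recordDelegate(Map<String, String> segmentAttributes, String delegateName) {
        segmentAttributes.put(DELEGATE_CODEC_KEY, delegateName);
    }

    // Read path: resolve the delegate by name, with an explicit null check
    // rather than an assert (asserts do not run in production).
    static String resolveDelegate(Map<String, String> segmentAttributes) {
        String name = segmentAttributes.get(DELEGATE_CODEC_KEY);
        if (name == null) {
            throw new IllegalStateException("Missing " + DELEGATE_CODEC_KEY + " in segment attributes");
        }
        return REGISTRY.get(name).get();
    }
}
```

The explicit null check mirrors the fail-fast refactor suggested later in this review.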

Changes

Cohort / File(s) Summary
Codec Architecture Refactor
server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java, server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java
CriteriaBasedCodec simplified to add public postingsFormat() override and ATTRIBUTE_BINDING_TARGET_FIELD constant; removed segmentInfoFormat and docValuesFormat customizations. New CriteriaBasedPostingsFormat class implements PostingsFormat with bucket-level attribute tracking, delegate codec wrapping, and specialized FieldsConsumer for merge-aware bucket propagation.
Service Provider Configuration
server/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec, server/src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat
Removed CriteriaBasedCodec from Codec SPI registry; added CriteriaBasedPostingsFormat to PostingsFormat SPI registry.
Test Coverage
server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java, server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java, test/framework/src/main/java/org/opensearch/index/engine/EngineTestCase.java
New CriteriaBasedPostingsFormatTests class validates indexing, bucket attribute assignment, and null bucket placeholder behavior. InternalEngineTests adds testCriteriaBasedGrouping() test and helper method. EngineTestCase.testContextSpecificDocument() signature updated to accept groupingCriteria parameter.
Documentation
CHANGELOG.md
Added changelog entry documenting CriteriaBasedCodec delegate codec compatibility fix.
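The Service Provider Configuration changes above amount to editing two provider files. Java's ServiceLoader discovers providers through META-INF/services files that list one fully qualified class name per line, so the PostingsFormat registry file presumably gains a line like:

```
org.opensearch.index.codec.CriteriaBasedPostingsFormat
```

while the corresponding `org.opensearch.index.codec.CriteriaBasedCodec` line is removed from the Codec registry file.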

Sequence Diagram(s)

sequenceDiagram
    participant Writer as IndexWriter
    participant CriteriaBasedPostingsFormat as CriteriaBasedPostingsFormat
    participant FieldsConsumer as CriteriaBasedFieldsConsumer
    participant DelegateFormat as Delegate PostingsFormat
    participant SegmentState as SegmentWriteState

    Writer->>CriteriaBasedPostingsFormat: fieldsConsumer(state)
    CriteriaBasedPostingsFormat->>DelegateFormat: fieldsConsumer(state)
    DelegateFormat-->>FieldsConsumer: FieldsConsumer instance
    CriteriaBasedPostingsFormat->>FieldsConsumer: wrap delegate
    FieldsConsumer->>FieldsConsumer: setAttributes on segment<br/>(bucket, delegate codec)
    FieldsConsumer->>FieldsConsumer: set bucket on _id field<br/>if bucket present
    FieldsConsumer->>DelegateFormat: write(fields)
    DelegateFormat-->>FieldsConsumer: success
    FieldsConsumer->>SegmentState: recordAttribute(DELEGATE_CODEC_KEY)
sequenceDiagram
    participant Reader as IndexReader
    participant CriteriaBasedPostingsFormat as CriteriaBasedPostingsFormat
    participant SegmentState as SegmentReadState
    participant CodecRegistry as Codec Registry
    participant DelegateFormat as Delegate PostingsFormat

    Reader->>CriteriaBasedPostingsFormat: fieldsProducer(state)
    CriteriaBasedPostingsFormat->>SegmentState: readAttribute(DELEGATE_CODEC_KEY)
    SegmentState-->>CriteriaBasedPostingsFormat: codec name
    CriteriaBasedPostingsFormat->>CodecRegistry: lookup(codec name)
    CodecRegistry-->>CriteriaBasedPostingsFormat: PostingsFormat
    CriteriaBasedPostingsFormat->>DelegateFormat: fieldsProducer(state)
    DelegateFormat-->>CriteriaBasedPostingsFormat: FieldsProducer instance
    CriteriaBasedPostingsFormat-->>Reader: FieldsProducer

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

lucene

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
Check name | Status | Explanation | Resolution
Description check | ⚠️ Warning | The description lacks substantive detail about what the fix achieves, does not explain the architectural changes, and the Related Issues section duplicates the title without referencing a specific GitHub issue. | Expand the description to explain the delegate codec pattern, why it is needed, and what problem it solves. Update Related Issues to reference the actual GitHub issue number rather than repeating the title.
Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.33%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title directly describes the main fix implemented across the changeset: restructuring CriteriaBasedCodec to work with a delegate codec pattern.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.40.5)
server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: RS146BIJAY <rishavsagar4b1@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In
`@server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java`:
- Around line 118-130: In merge(MergeState mergeState, NormsProducer norms) the
code assumes mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD)
is non-null which can cause an NPE; instead iterate over mergeState.fieldInfos
to find the first FieldInfos instance whose
fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD) is non-null, read its
getAttribute(BUCKET_NAME) with a null-check, and only then call
mergeState.segmentInfo.putAttribute(BUCKET_NAME, attribute) and
mergeState.mergeFieldInfos.fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).putAttribute(BUCKET_NAME,
attribute); if no non-null fieldInfo is found, skip setting the attribute.
Ensure you still call delegateFieldsConsumer.merge(mergeState, norms) as before.

In
`@server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java`:
- Around line 41-69: The test creates a ByteBuffersDirectory (Directory dir =
new ByteBuffersDirectory()) but never closes it; wrap the Directory in a
try-with-resources that encloses the IndexWriter and IndexReader usage in
testBasicFunctionality (and apply the same change to the other three test
methods) so the Directory is closed after the reader; ensure the IndexReader is
opened/closed inside the Directory try block (or nest the reader/writer
try-with-resources inside the Directory try-with-resources) to guarantee the
reader is closed before the Directory is closed.

In `@server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java`:
- Around line 3816-3853: In testCriteriaBasedGrouping the test reuses the same
document id ("1") for every iteration which causes older grouping buckets to be
removed/merged and makes the final bucket assertion flaky; update the loop in
testCriteriaBasedGrouping so each ParsedDocument created by testParsedDocument
has a unique _id (for example append the loop index i or the groupingCriteria to
the id) when calling engine.index(indexForDoc(doc)) so that live docs exist in
each bucket and the Set<String> attributes assertion reliably contains all three
grouping criteria.
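The id-reuse problem flagged above can be illustrated with a self-contained sketch (this is not the test itself; the `Map` merely stands in for the engine's live-docs view). With a constant `_id`, an id-keyed index keeps only the latest version of the document, so only the final grouping criterion retains a live doc; unique ids preserve all three buckets.

```java
import java.util.HashMap;
import java.util.Map;

// Demonstrates why reusing _id "1" makes the bucket assertion flaky:
// each put() with the same key replaces the previous document.
class UniqueIdSketch {
    static Map<String, String> index(boolean uniqueIds) {
        Map<String, String> liveDocs = new HashMap<>(); // _id -> bucket
        String[] criteria = { "grouping_criteria1", "grouping_criteria2", "grouping_criteria3" };
        for (int i = 1; i <= 100; i++) {
            // The suggested fix: derive the id from the loop index instead of the constant "1".
            String id = uniqueIds ? String.valueOf(i) : "1";
            liveDocs.put(id, criteria[i % 3]);
        }
        return liveDocs;
    }
}
```

With `uniqueIds = false` only one bucket survives; with `uniqueIds = true` all three grouping criteria remain represented, which is what the test's `Set<String>` assertion needs.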
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)

77-80: Fail fast when DELEGATE_CODEC_KEY is missing (asserts are off in prod).

Line 79 relies on assert, which won’t run in production; a missing attribute will instead fail later with a less actionable error. Consider an explicit null check with a clear exception.

♻️ Suggested refactor
-        assert state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY) != null;
-        return PostingsFormat.forName(state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY)).fieldsProducer(state);
+        String delegateName = state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY);
+        if (delegateName == null) {
+            throw new IllegalStateException(
+                "Missing " + DELEGATE_CODEC_KEY + " in segmentInfo for " + state.segmentInfo.name
+            );
+        }
+        return PostingsFormat.forName(delegateName).fieldsProducer(state);
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d145d6b and 7e949cb.

📒 Files selected for processing (8)
  • CHANGELOG.md
  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java
  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java
  • server/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec
  • server/src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat
  • server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java
  • server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java
  • test/framework/src/main/java/org/opensearch/index/engine/EngineTestCase.java
💤 Files with no reviewable changes (1)
  • server/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-02T22:44:14.799Z
Learnt from: prudhvigodithi
Repo: opensearch-project/OpenSearch PR: 20112
File: server/src/internalClusterTest/java/org/opensearch/search/slice/SearchSliceIT.java:73-81
Timestamp: 2025-12-02T22:44:14.799Z
Learning: In OpenSearch integration tests extending OpenSearchIntegTestCase, using `LuceneTestCase.SuppressCodecs("*")` triggers special handling that selects a random production codec from the CODECS array, while `SuppressCodecs("Asserting")` or other specific codec suppressions still allow Lucene's default codec randomization which may include the asserting codec. Use `SuppressCodecs("*")` when you need to completely avoid asserting codecs (e.g., for cross-thread StoredFieldsReader usage) while maintaining production codec test coverage.

Applied to files:

  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java
  • server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java
  • server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.

Applied to files:

  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java
  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java
🧬 Code graph analysis (3)
server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java (1)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)
  • CriteriaBasedPostingsFormat (32-145)
server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java (2)
server/src/main/java/org/opensearch/index/CriteriaBasedMergePolicy.java (1)
  • CriteriaBasedMergePolicy (30-82)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)
  • CriteriaBasedCodec (20-49)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)
  • CriteriaBasedCodec (20-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: gradle-check
  • GitHub Check: Mend Security Check
🔇 Additional comments (9)
CHANGELOG.md (1)

35-35: Changelog entry looks good.

Captures the fix in the correct section.

server/src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat (1)

3-3: Provider registration is correct.

This enables SPI discovery for the new postings format.

test/framework/src/main/java/org/opensearch/index/engine/EngineTestCase.java (1)

369-372: Good update to use the provided grouping criteria.

Keeps the helper aligned with the new tests.

server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java (5)

130-130: No review comment.


214-214: No review comment.


3855-3857: LGTM.
Helper keeps the merge policy setup tidy.


8741-8743: LGTM.
Context-specific documents align with context-aware settings in this test.


8786-8788: LGTM.
Context-specific documents align with context-aware settings in this test.

server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)

31-47: Verify behavior when the delegate postings format is not per-field.

Line 34 only wraps CriteriaBasedPostingsFormat when the delegate is a PerFieldPostingsFormat. If a plugin supplies a non‑per‑field postings format, bucket attributes won’t be written and criteria grouping is effectively disabled. Please confirm this invariant or consider wrapping the non‑per‑field format as well.
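The alternative the comment raises — wrapping non-per-field delegates as well — can be sketched with stand-in types. None of these classes are Lucene's; they are hypothetical minimal interfaces used only to contrast the two wrapping policies.

```java
// Stand-in for Lucene's PostingsFormat hierarchy (single abstract method,
// so it can be implemented with a lambda).
interface PostingsFmt { String describe(); }

class PerFieldFmt implements PostingsFmt { public String describe() { return "per-field"; } }
class FlatFmt implements PostingsFmt { public String describe() { return "flat"; } }

class CriteriaWrappingSketch {
    // Current behavior per the comment: only per-field delegates get wrapped,
    // so a non-per-field delegate silently skips bucket attribute writing.
    static PostingsFmt wrapOnlyPerField(PostingsFmt delegate) {
        return (delegate instanceof PerFieldFmt) ? wrap(delegate) : delegate;
    }

    // Suggested alternative: wrap every delegate so criteria grouping is
    // never silently disabled.
    static PostingsFmt wrapAlways(PostingsFmt delegate) {
        return wrap(delegate);
    }

    static PostingsFmt wrap(PostingsFmt delegate) {
        return () -> "criteria(" + delegate.describe() + ")";
    }
}
```

Whether unconditional wrapping is safe depends on the invariant the PR author intends, which is exactly what the comment asks to be verified.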

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +118 to +130
@Override
public void merge(MergeState mergeState, NormsProducer norms) throws IOException {
delegateFieldsConsumer.merge(mergeState, norms);
Set<String> mergeFieldNames = StreamSupport.stream(mergeState.mergeFieldInfos.spliterator(), false)
.map(FieldInfo::getName)
.collect(Collectors.toSet());
if (mergeFieldNames.contains(ATTRIBUTE_BINDING_TARGET_FIELD) && mergeState.fieldInfos.length > 0) {
String attribute = mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).getAttribute(BUCKET_NAME);
assert attribute != null : "Attribute should not be null during merging segment";
mergeState.segmentInfo.putAttribute(BUCKET_NAME, attribute);

mergeState.mergeFieldInfos.fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).putAttribute(BUCKET_NAME, attribute);
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "CriteriaBasedPostingsFormat.java" -type f

Repository: opensearch-project/OpenSearch

Length of output: 155


🏁 Script executed:

wc -l ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java

Repository: opensearch-project/OpenSearch

Length of output: 159


🏁 Script executed:

cat -n ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java

Repository: opensearch-project/OpenSearch

Length of output: 7396


🏁 Script executed:

# Search for other uses of mergeState.fieldInfos to understand patterns
rg "mergeState\.fieldInfos" --type java -B 2 -A 2

Repository: opensearch-project/OpenSearch

Length of output: 1061


🏁 Script executed:

# Search for how fieldInfo() method is used in the codebase
rg "\.fieldInfo\(" --type java -B 1 -A 1 | head -50

Repository: opensearch-project/OpenSearch

Length of output: 6032


🏁 Script executed:

# Search for tests related to CriteriaBasedPostingsFormat to understand expected behavior
find . -name "*CriteriaBasedPostingsFormat*Test*" -type f

Repository: opensearch-project/OpenSearch

Length of output: 160


🏁 Script executed:

cat -n ./server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java

Repository: opensearch-project/OpenSearch

Length of output: 6445


🏁 Script executed:

# Search for documentation or code about parent-writer or tombstone segments
rg -i "parent.writer|tombstone|parent writer" --type java | head -30

Repository: opensearch-project/OpenSearch

Length of output: 4410


🏁 Script executed:

# Search for ATTRIBUTE_BINDING_TARGET_FIELD definition and usage
rg "ATTRIBUTE_BINDING_TARGET_FIELD" --type java -B 1 -A 1

Repository: opensearch-project/OpenSearch

Length of output: 2795


🏁 Script executed:

# Look at CriteriaBasedCodec to understand context
cat -n ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java

Repository: opensearch-project/OpenSearch

Length of output: 2091


🏁 Script executed:

# Search for how mergeState is typically used in other codecs
find . -name "*.java" -path "*/codec/*" -type f | head -10

Repository: opensearch-project/OpenSearch

Length of output: 880


🏁 Script executed:

cat -n ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedDocValuesWriter.java | head -100

Repository: opensearch-project/OpenSearch

Length of output: 3146


🏁 Script executed:

sed -n '94,116p' ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java

Repository: opensearch-project/OpenSearch

Length of output: 1514


🏁 Script executed:

# Check if there are any merge tests or scenarios that might show this issue
rg "merge" ./server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java

Repository: opensearch-project/OpenSearch

Length of output: 55


🏁 Script executed:

# Search for how other FieldsConsumer implementations handle merge
rg "class.*FieldsConsumer" --type java -A 20 | grep -A 15 "merge" | head -40

Repository: opensearch-project/OpenSearch

Length of output: 55


🏁 Script executed:

# Let's verify the pattern: check if fieldInfo() can return null by looking at Lucene API expectations
# Search for null checks after fieldInfo() calls
rg "fieldInfo\([^)]+\)\s*==" --type java | head -20

Repository: opensearch-project/OpenSearch

Length of output: 243


NPE when the first merged segment lacks _id field.

Line 125 reads the bucket attribute from mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD) without null-checking the result. If the first segment is a parent-writer or tombstone segment without the _id field (as the comments in the write() method acknowledge is possible), fieldInfo() returns null, and calling .getAttribute() causes an NPE. This is not protected by the assertion on line 126 since assertions are disabled in production.

Scan the merge inputs for the first segment that actually contains the _id field rather than assuming the first segment has it.

🐛 Proposed fix
-            if (mergeFieldNames.contains(ATTRIBUTE_BINDING_TARGET_FIELD) && mergeState.fieldInfos.length > 0) {
-                String attribute = mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).getAttribute(BUCKET_NAME);
-                assert attribute != null : "Attribute should not be null during merging segment";
+            if (mergeFieldNames.contains(ATTRIBUTE_BINDING_TARGET_FIELD) && mergeState.fieldInfos.length > 0) {
+                String attribute = null;
+                for (int i = 0; i < mergeState.fieldInfos.length; i++) {
+                    FieldInfo idInfo = mergeState.fieldInfos[i].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD);
+                    if (idInfo != null) {
+                        attribute = idInfo.getAttribute(BUCKET_NAME);
+                        if (attribute != null) {
+                            break;
+                        }
+                    }
+                }
+                if (attribute == null) {
+                    throw new IllegalStateException(
+                        "Bucket attribute missing during merge for " + ATTRIBUTE_BINDING_TARGET_FIELD
+                    );
+                }
                 mergeState.segmentInfo.putAttribute(BUCKET_NAME, attribute);
 
                 mergeState.mergeFieldInfos.fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).putAttribute(BUCKET_NAME, attribute);
             }

Comment on lines +41 to +69
public void testBasicFunctionality() throws IOException {
Directory dir = new ByteBuffersDirectory();
IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
iwc.setCodec(criteriaBasedPostingFormat);

try (IndexWriter writer = new IndexWriter(dir, iwc)) {
for (int i = 0; i < 100; i++) {
Document doc = new Document();
doc.add(new StringField("_id", "doc" + i, Field.Store.YES));
doc.add(new TextField("field", "value" + i, Field.Store.YES));
writer.addDocument(doc);
}
}

try (IndexReader reader = DirectoryReader.open(dir)) {
assertEquals(100, reader.numDocs());

Terms terms = MultiTerms.getTerms(reader, "field");
assertNotNull(terms);
TermsEnum termsEnum = terms.iterator();

int count = 0;
BytesRef term;
while ((term = termsEnum.next()) != null) {
assertTrue(term.utf8ToString().startsWith("value"));
count++;
}
assertEquals(100, count);
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "CriteriaBasedPostingsFormatTests.java" | head -5

Repository: opensearch-project/OpenSearch

Length of output: 160


🏁 Script executed:

# List test files in the codec directory
fd -t f ".*Tests\.java$" server/src/test/java/org/opensearch/index/codec/

Repository: opensearch-project/OpenSearch

Length of output: 1725


🏁 Script executed:

# Get file size first
wc -l server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java

Repository: opensearch-project/OpenSearch

Length of output: 162


🏁 Script executed:

cat -n server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java

Repository: opensearch-project/OpenSearch

Length of output: 6445


🌐 Web query:

Lucene ByteBuffersDirectory AutoCloseable resource management

💡 Result:

Short answer: ByteBuffersDirectory implements Closeable/AutoCloseable and must be closed to free its heap ByteBuffers — use try-with-resources or call close() when you’re done. [1][2]

Key points

  • ByteBuffersDirectory implements Closeable/AutoCloseable and exposes close() which throws IOException, so it participates in Java's try-with-resources. [1]
  • Directory (and BaseDirectory) semantics expect callers to close the Directory when finished; do not close it while an IndexWriter/IndexReader/IndexSearcher is still using it (closing will make the Directory unusable). [2]

Minimal example (conceptual)

  • try (Directory dir = new ByteBuffersDirectory()) { /* use IndexWriter/Readers here */ } // dir.close() called automatically.

Sources
[1] ByteBuffersDirectory Javadoc (implements Closeable/AutoCloseable, has close()).
[2] Directory / BaseDirectory Javadoc (Directory is Closeable; close() semantics).


🌐 Web query:

Lucene DirectoryReader close directory resource leak

💡 Result:

Yes — if you open a DirectoryReader and do not close it (or properly decRef it), you will leak file handles and other resources. Fixes:

  • Always close the reader when you’re done (try-with-resources or finally). DirectoryReader implements Closeable. [1]
  • In long-running apps use SearcherManager (or your own ref-counting) to safely reuse/reopen readers without closing readers still in use. DirectoryReader.openIfChanged returns a new reader but does NOT close the old one — you must close it when safe. [1][2]
  • If you manually share readers, use incRef()/decRef() (or ensure only one owner closes) to avoid closing while others still use it. [1]

Minimal example (concept):
try (DirectoryReader r = DirectoryReader.open(dir)) {
IndexSearcher s = new IndexSearcher(r);
// use searcher
} // reader closed automatically

References:
[1] Lucene DirectoryReader javadoc (notes on close, openIfChanged, SearcherManager).
[2] StackOverflow discussion: need to close DirectoryReader / openIfChanged behavior.


Close Directory instances to avoid resource leaks in tests.

All four test methods create ByteBuffersDirectory instances without closing them. ByteBuffersDirectory implements Closeable and must be closed to free heap resources. Wrap the directory in try-with-resources, ensuring the IndexReader closes before the directory:

♻️ Suggested fix (apply to all four test methods)
-        Directory dir = new ByteBuffersDirectory();
-        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
-        iwc.setCodec(criteriaBasedPostingFormat);
-
-        try (IndexWriter writer = new IndexWriter(dir, iwc)) {
-            for (int i = 0; i < 100; i++) {
-                Document doc = new Document();
-                doc.add(new StringField("_id", "doc" + i, Field.Store.YES));
-                doc.add(new TextField("field", "value" + i, Field.Store.YES));
-                writer.addDocument(doc);
-            }
-        }
-
-        try (IndexReader reader = DirectoryReader.open(dir)) {
-            assertEquals(100, reader.numDocs());
-
-            Terms terms = MultiTerms.getTerms(reader, "field");
-            assertNotNull(terms);
-            TermsEnum termsEnum = terms.iterator();
-
-            int count = 0;
-            BytesRef term;
-            while ((term = termsEnum.next()) != null) {
-                assertTrue(term.utf8ToString().startsWith("value"));
-                count++;
-            }
-            assertEquals(100, count);
-        }
+        try (Directory dir = new ByteBuffersDirectory()) {
+            IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
+            iwc.setCodec(criteriaBasedPostingFormat);
+
+            try (IndexWriter writer = new IndexWriter(dir, iwc)) {
+                for (int i = 0; i < 100; i++) {
+                    Document doc = new Document();
+                    doc.add(new StringField("_id", "doc" + i, Field.Store.YES));
+                    doc.add(new TextField("field", "value" + i, Field.Store.YES));
+                    writer.addDocument(doc);
+                }
+            }
+
+            try (IndexReader reader = DirectoryReader.open(dir)) {
+                assertEquals(100, reader.numDocs());
+
+                Terms terms = MultiTerms.getTerms(reader, "field");
+                assertNotNull(terms);
+                TermsEnum termsEnum = terms.iterator();
+
+                int count = 0;
+                BytesRef term;
+                while ((term = termsEnum.next()) != null) {
+                    assertTrue(term.utf8ToString().startsWith("value"));
+                    count++;
+                }
+                assertEquals(100, count);
+            }
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    public void testBasicFunctionality() throws IOException {
-        Directory dir = new ByteBuffersDirectory();
-        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
-        iwc.setCodec(criteriaBasedPostingFormat);
-        try (IndexWriter writer = new IndexWriter(dir, iwc)) {
-            for (int i = 0; i < 100; i++) {
-                Document doc = new Document();
-                doc.add(new StringField("_id", "doc" + i, Field.Store.YES));
-                doc.add(new TextField("field", "value" + i, Field.Store.YES));
-                writer.addDocument(doc);
-            }
-        }
-        try (IndexReader reader = DirectoryReader.open(dir)) {
-            assertEquals(100, reader.numDocs());
-            Terms terms = MultiTerms.getTerms(reader, "field");
-            assertNotNull(terms);
-            TermsEnum termsEnum = terms.iterator();
-            int count = 0;
-            BytesRef term;
-            while ((term = termsEnum.next()) != null) {
-                assertTrue(term.utf8ToString().startsWith("value"));
-                count++;
-            }
-            assertEquals(100, count);
-        }
-    }
+    public void testBasicFunctionality() throws IOException {
+        try (Directory dir = new ByteBuffersDirectory()) {
+            IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
+            iwc.setCodec(criteriaBasedPostingFormat);
+            try (IndexWriter writer = new IndexWriter(dir, iwc)) {
+                for (int i = 0; i < 100; i++) {
+                    Document doc = new Document();
+                    doc.add(new StringField("_id", "doc" + i, Field.Store.YES));
+                    doc.add(new TextField("field", "value" + i, Field.Store.YES));
+                    writer.addDocument(doc);
+                }
+            }
+            try (IndexReader reader = DirectoryReader.open(dir)) {
+                assertEquals(100, reader.numDocs());
+                Terms terms = MultiTerms.getTerms(reader, "field");
+                assertNotNull(terms);
+                TermsEnum termsEnum = terms.iterator();
+                int count = 0;
+                BytesRef term;
+                while ((term = termsEnum.next()) != null) {
+                    assertTrue(term.utf8ToString().startsWith("value"));
+                    count++;
+                }
+                assertEquals(100, count);
+            }
+        }
+    }
🤖 Prompt for AI Agents
In
`@server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java`
around lines 41 - 69: the test creates a ByteBuffersDirectory (Directory dir =
new ByteBuffersDirectory()) but never closes it. Wrap the Directory in a
try-with-resources that encloses the IndexWriter and IndexReader usage in
testBasicFunctionality (and apply the same change to the other three test
methods) so the Directory is closed after the reader. Nest the reader/writer
try-with-resources inside the Directory try-with-resources to guarantee the
reader is closed before the Directory is closed.
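For reference, the close ordering the prompt relies on is a property of Java's try-with-resources: inner resources always close before the enclosing one. A plain-Java sketch (the Resource class below is a hypothetical stand-in for Lucene's Directory/IndexWriter/IndexReader, not their real API):

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {
    // Toy AutoCloseable that records the order in which resources are closed.
    static class Resource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);
        }
    }

    // Mirrors the suggested nesting: writer and reader blocks inside the directory block.
    static List<String> closeOrder() {
        List<String> log = new ArrayList<>();
        try (Resource dir = new Resource("directory", log)) {
            try (Resource writer = new Resource("writer", log)) {
                // index documents...
            }
            try (Resource reader = new Resource("reader", log)) {
                // verify documents...
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Inner resources close before the enclosing directory.
        System.out.println(closeOrder()); // [writer, reader, directory]
    }
}
```

With this nesting the directory is always the last resource to close, which is exactly the guarantee the suggested fix needs.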

Comment on lines +3816 to +3853
@LockFeatureFlag(CONTEXT_AWARE_MIGRATION_EXPERIMENTAL_FLAG)
public void testCriteriaBasedGrouping() throws Exception {
final IndexSettings indexSettings = IndexSettingsModule.newIndexSettings(
"test",
Settings.builder()
.put(defaultSettings.getSettings())
.put(IndexSettings.INDEX_CONTEXT_AWARE_ENABLED_SETTING.getKey(), true)
.build()
);
try (
Store store = createStore();
Engine engine = createEngine(indexSettings, store, createTempDir(), newCriteriaBasedMergePolicy(), null, null, null, null, null)
) {
List<Segment> segments = engine.segments(true);
assertThat(segments.isEmpty(), equalTo(true));
for (int i = 1; i <= 100; i++) {
String groupingCriteria;
if (i % 3 == 0) {
groupingCriteria = "grouping_criteria1";
} else if (i % 3 == 1) {
groupingCriteria = "grouping_criteria2";
} else {
groupingCriteria = "grouping_criteria3";
}

final ParsedDocument doc = testParsedDocument("1", null, testContextSpecificDocument(groupingCriteria), B_1, null);
engine.index(indexForDoc(doc));
if (i % 5 == 0) {
engine.refresh("test");
}
}

engine.refresh("test");
segments = engine.segments(true);
Set<String> attributes = segments.stream().map(segment -> segment.getAttributes().get(BUCKET_NAME)).collect(Collectors.toSet());
assertThat(attributes, Matchers.containsInAnyOrder("grouping_criteria1", "grouping_criteria2", "grouping_criteria3"));
}
}
Contributor

⚠️ Potential issue | 🟠 Major

Avoid reusing the same _id in the grouping test.
Line 3841 always indexes id "1", so only the last criteria remains live; earlier buckets can be dropped/merged away, making the bucket assertion flaky or incorrect. Use unique ids per iteration (or per bucket) to keep live docs in each bucket.

✅ Suggested fix
-                final ParsedDocument doc = testParsedDocument("1", null, testContextSpecificDocument(groupingCriteria), B_1, null);
+                final String docId = "doc_" + i + "_" + groupingCriteria;
+                final ParsedDocument doc = testParsedDocument(docId, null, testContextSpecificDocument(groupingCriteria), B_1, null);
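The failure mode described above comes down to upsert semantics: indexing with an existing _id replaces the earlier live document, so a reused id leaves at most one live doc. A plain-Java sketch of that effect, using a map as a stand-in for the live-docs view (no Lucene or OpenSearch types; the bucket naming is only illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IdReuseDemo {
    // Simulates "index by _id": the same id overwrites, unique ids accumulate.
    static Map<String, String> indexDocs(boolean uniqueIds) {
        Map<String, String> liveDocs = new LinkedHashMap<>();
        for (int i = 1; i <= 100; i++) {
            String bucket = "grouping_criteria" + (i % 3 + 1);
            String id = uniqueIds ? "doc_" + i : "1"; // reused id -> repeated overwrite
            liveDocs.put(id, bucket);
        }
        return liveDocs;
    }

    public static void main(String[] args) {
        // Reusing "1" leaves a single live doc, so only one bucket survives.
        System.out.println(indexDocs(false).size()); // 1
        // Unique ids keep all 100 docs live, spread across all three buckets.
        System.out.println(indexDocs(true).size()); // 100
    }
}
```

This is why the suggested fix derives the id from the loop index and criteria: each bucket then retains live documents, and the segment-attribute assertion stops being order-dependent.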
Suggested change
    @LockFeatureFlag(CONTEXT_AWARE_MIGRATION_EXPERIMENTAL_FLAG)
    public void testCriteriaBasedGrouping() throws Exception {
        final IndexSettings indexSettings = IndexSettingsModule.newIndexSettings(
            "test",
            Settings.builder()
                .put(defaultSettings.getSettings())
                .put(IndexSettings.INDEX_CONTEXT_AWARE_ENABLED_SETTING.getKey(), true)
                .build()
        );
        try (
            Store store = createStore();
            Engine engine = createEngine(indexSettings, store, createTempDir(), newCriteriaBasedMergePolicy(), null, null, null, null, null)
        ) {
            List<Segment> segments = engine.segments(true);
            assertThat(segments.isEmpty(), equalTo(true));
            for (int i = 1; i <= 100; i++) {
                String groupingCriteria;
                if (i % 3 == 0) {
                    groupingCriteria = "grouping_criteria1";
                } else if (i % 3 == 1) {
                    groupingCriteria = "grouping_criteria2";
                } else {
                    groupingCriteria = "grouping_criteria3";
                }
                final String docId = "doc_" + i + "_" + groupingCriteria;
                final ParsedDocument doc = testParsedDocument(docId, null, testContextSpecificDocument(groupingCriteria), B_1, null);
                engine.index(indexForDoc(doc));
                if (i % 5 == 0) {
                    engine.refresh("test");
                }
            }
            engine.refresh("test");
            segments = engine.segments(true);
            Set<String> attributes = segments.stream().map(segment -> segment.getAttributes().get(BUCKET_NAME)).collect(Collectors.toSet());
            assertThat(attributes, Matchers.containsInAnyOrder("grouping_criteria1", "grouping_criteria2", "grouping_criteria3"));
        }
    }
🤖 Prompt for AI Agents
In `@server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java`
around lines 3816 - 3853: testCriteriaBasedGrouping reuses the same document id
("1") for every iteration, which causes older grouping buckets to be
removed/merged and makes the final bucket assertion flaky. Update the loop so
each ParsedDocument created by testParsedDocument has a unique _id (for
example, append the loop index i or the groupingCriteria to the id) before
calling engine.index(indexForDoc(doc)), so that live docs exist in each bucket
and the Set<String> attributes assertion reliably contains all three grouping
criteria.

@github-actions
Contributor

❌ Gradle check result for 7e949cb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
