Conversation

@RS146BIJAY
Contributor

Description

Fix CriteriaBasedCodec to work with delegate codec

Related Issues

Fix CriteriaBasedCodec to work with delegate codec

@coderabbitai
Contributor

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

This PR refactors the codec architecture by introducing CriteriaBasedPostingsFormat, a new SPI provider that handles the postings format with bucket-level attribute tracking. CriteriaBasedCodec is simplified to delegate postings-format handling and is removed from the Codec SPI registry. Tests validate the new grouping functionality across multiple scenarios.
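The delegate pattern the walkthrough describes can be sketched in plain Java. This is an illustrative stand-in, not the actual OpenSearch code: the key name, registry map, and method names are hypothetical; the `Map<String, String>` stands in for Lucene's SegmentInfo attributes and the registry stands in for `PostingsFormat.forName`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of the delegate pattern: the write path records the delegate's SPI
// name as a segment attribute; the read path resolves the delegate by that
// name and fails fast if the attribute is missing.
class DelegateLookupSketch {
    static final String DELEGATE_CODEC_KEY = "delegate_codec"; // hypothetical key name

    // Stand-in for the SPI registry (PostingsFormat.forName in Lucene).
    static final Map<String, Supplier<String>> REGISTRY = new HashMap<>();

    // Write path: stamp the segment with the delegate that produced it.
    static void recordDelegate(Map<String, String> segmentAttributes, String delegateName) {
        segmentAttributes.put(DELEGATE_CODEC_KEY, delegateName);
    }

    // Read path: resolve the delegate by name, with an explicit null check
    // rather than an assert (asserts do not run in production).
    static String resolveDelegate(Map<String, String> segmentAttributes) {
        String name = segmentAttributes.get(DELEGATE_CODEC_KEY);
        if (name == null) {
            throw new IllegalStateException("Missing " + DELEGATE_CODEC_KEY + " in segment attributes");
        }
        return REGISTRY.get(name).get();
    }
}
```

The explicit null check mirrors the fail-fast refactor suggested later in this review.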

Changes

Cohort / File(s) Summary
Codec Architecture Refactor
server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java, server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java
CriteriaBasedCodec simplified to add public postingsFormat() override and ATTRIBUTE_BINDING_TARGET_FIELD constant; removed segmentInfoFormat and docValuesFormat customizations. New CriteriaBasedPostingsFormat class implements PostingsFormat with bucket-level attribute tracking, delegate codec wrapping, and specialized FieldsConsumer for merge-aware bucket propagation.
Service Provider Configuration
server/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec, server/src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat
Removed CriteriaBasedCodec from Codec SPI registry; added CriteriaBasedPostingsFormat to PostingsFormat SPI registry.
Test Coverage
server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java, server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java, test/framework/src/main/java/org/opensearch/index/engine/EngineTestCase.java
New CriteriaBasedPostingsFormatTests class validates indexing, bucket attribute assignment, and null bucket placeholder behavior. InternalEngineTests adds testCriteriaBasedGrouping() test and helper method. EngineTestCase.testContextSpecificDocument() signature updated to accept groupingCriteria parameter.
Documentation
CHANGELOG.md
Added changelog entry documenting CriteriaBasedCodec delegate codec compatibility fix.
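The Service Provider Configuration changes above amount to editing two provider files. Java's ServiceLoader discovers providers through META-INF/services files that list one fully qualified class name per line, so the PostingsFormat registry file presumably gains a line like:

```
org.opensearch.index.codec.CriteriaBasedPostingsFormat
```

while the corresponding `org.opensearch.index.codec.CriteriaBasedCodec` line is removed from the Codec registry file.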

Sequence Diagram(s)

sequenceDiagram
    participant Writer as IndexWriter
    participant CriteriaBasedPostingsFormat as CriteriaBasedPostingsFormat
    participant FieldsConsumer as CriteriaBasedFieldsConsumer
    participant DelegateFormat as Delegate PostingsFormat
    participant SegmentState as SegmentWriteState

    Writer->>CriteriaBasedPostingsFormat: fieldsConsumer(state)
    CriteriaBasedPostingsFormat->>DelegateFormat: fieldsConsumer(state)
    DelegateFormat-->>FieldsConsumer: FieldsConsumer instance
    CriteriaBasedPostingsFormat->>FieldsConsumer: wrap delegate
    FieldsConsumer->>FieldsConsumer: setAttributes on segment<br/>(bucket, delegate codec)
    FieldsConsumer->>FieldsConsumer: set bucket on _id field<br/>if bucket present
    FieldsConsumer->>DelegateFormat: write(fields)
    DelegateFormat-->>FieldsConsumer: success
    FieldsConsumer->>SegmentState: recordAttribute(DELEGATE_CODEC_KEY)
sequenceDiagram
    participant Reader as IndexReader
    participant CriteriaBasedPostingsFormat as CriteriaBasedPostingsFormat
    participant SegmentState as SegmentReadState
    participant CodecRegistry as Codec Registry
    participant DelegateFormat as Delegate PostingsFormat

    Reader->>CriteriaBasedPostingsFormat: fieldsProducer(state)
    CriteriaBasedPostingsFormat->>SegmentState: readAttribute(DELEGATE_CODEC_KEY)
    SegmentState-->>CriteriaBasedPostingsFormat: codec name
    CriteriaBasedPostingsFormat->>CodecRegistry: lookup(codec name)
    CodecRegistry-->>CriteriaBasedPostingsFormat: PostingsFormat
    CriteriaBasedPostingsFormat->>DelegateFormat: fieldsProducer(state)
    DelegateFormat-->>CriteriaBasedPostingsFormat: FieldsProducer instance
    CriteriaBasedPostingsFormat-->>Reader: FieldsProducer

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

lucene

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
Check name | Status | Explanation | Resolution
Description check | ⚠️ Warning | The description lacks substantive detail about what the fix achieves, does not explain the architectural changes, and the Related Issues section duplicates the title without referencing a specific GitHub issue. | Expand the description to explain the delegate codec pattern, why it is needed, and what problem it solves. Update Related Issues to reference the actual GitHub issue number rather than repeating the title.
Docstring Coverage | ⚠️ Warning | Docstring coverage is 8.33%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title directly describes the main fix implemented across the changeset: restructuring CriteriaBasedCodec to work with a delegate codec pattern.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.40.5)
server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: RS146BIJAY <rishavsagar4b1@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In
`@server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java`:
- Around line 118-130: In merge(MergeState mergeState, NormsProducer norms) the
code assumes mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD)
is non-null which can cause an NPE; instead iterate over mergeState.fieldInfos
to find the first FieldInfos instance whose
fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD) is non-null, read its
getAttribute(BUCKET_NAME) with a null-check, and only then call
mergeState.segmentInfo.putAttribute(BUCKET_NAME, attribute) and
mergeState.mergeFieldInfos.fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).putAttribute(BUCKET_NAME,
attribute); if no non-null fieldInfo is found, skip setting the attribute.
Ensure you still call delegateFieldsConsumer.merge(mergeState, norms) as before.

In
`@server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java`:
- Around line 41-69: The test creates a ByteBuffersDirectory (Directory dir =
new ByteBuffersDirectory()) but never closes it; wrap the Directory in a
try-with-resources that encloses the IndexWriter and IndexReader usage in
testBasicFunctionality (and apply the same change to the other three test
methods) so the Directory is closed after the reader; ensure the IndexReader is
opened/closed inside the Directory try block (or nest the reader/writer
try-with-resources inside the Directory try-with-resources) to guarantee the
reader is closed before the Directory is closed.

In `@server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java`:
- Around line 3816-3853: In testCriteriaBasedGrouping the test reuses the same
document id ("1") for every iteration which causes older grouping buckets to be
removed/merged and makes the final bucket assertion flaky; update the loop in
testCriteriaBasedGrouping so each ParsedDocument created by testParsedDocument
has a unique _id (for example append the loop index i or the groupingCriteria to
the id) when calling engine.index(indexForDoc(doc)) so that live docs exist in
each bucket and the Set<String> attributes assertion reliably contains all three
grouping criteria.
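The id-reuse problem flagged above can be illustrated with a self-contained sketch (this is not the test itself; the `Map` merely stands in for the engine's live-docs view). With a constant `_id`, an id-keyed index keeps only the latest version of the document, so only the final grouping criterion retains a live doc; unique ids preserve all three buckets.

```java
import java.util.HashMap;
import java.util.Map;

// Demonstrates why reusing _id "1" makes the bucket assertion flaky:
// each put() with the same key replaces the previous document.
class UniqueIdSketch {
    static Map<String, String> index(boolean uniqueIds) {
        Map<String, String> liveDocs = new HashMap<>(); // _id -> bucket
        String[] criteria = { "grouping_criteria1", "grouping_criteria2", "grouping_criteria3" };
        for (int i = 1; i <= 100; i++) {
            // The suggested fix: derive the id from the loop index instead of the constant "1".
            String id = uniqueIds ? String.valueOf(i) : "1";
            liveDocs.put(id, criteria[i % 3]);
        }
        return liveDocs;
    }
}
```

With `uniqueIds = false` only one bucket survives; with `uniqueIds = true` all three grouping criteria remain represented, which is what the test's `Set<String>` assertion needs.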
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)

77-80: Fail fast when DELEGATE_CODEC_KEY is missing (asserts are off in prod).

Line 79 relies on assert, which won’t run in production; a missing attribute will instead fail later with a less actionable error. Consider an explicit null check with a clear exception.

♻️ Suggested refactor
-        assert state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY) != null;
-        return PostingsFormat.forName(state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY)).fieldsProducer(state);
+        String delegateName = state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY);
+        if (delegateName == null) {
+            throw new IllegalStateException(
+                "Missing " + DELEGATE_CODEC_KEY + " in segmentInfo for " + state.segmentInfo.name
+            );
+        }
+        return PostingsFormat.forName(delegateName).fieldsProducer(state);
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d145d6b and 7e949cb.

📒 Files selected for processing (8)
  • CHANGELOG.md
  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java
  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java
  • server/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec
  • server/src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat
  • server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java
  • server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java
  • test/framework/src/main/java/org/opensearch/index/engine/EngineTestCase.java
💤 Files with no reviewable changes (1)
  • server/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-02T22:44:14.799Z
Learnt from: prudhvigodithi
Repo: opensearch-project/OpenSearch PR: 20112
File: server/src/internalClusterTest/java/org/opensearch/search/slice/SearchSliceIT.java:73-81
Timestamp: 2025-12-02T22:44:14.799Z
Learning: In OpenSearch integration tests extending OpenSearchIntegTestCase, using `LuceneTestCase.SuppressCodecs("*")` triggers special handling that selects a random production codec from the CODECS array, while `SuppressCodecs("Asserting")` or other specific codec suppressions still allow Lucene's default codec randomization which may include the asserting codec. Use `SuppressCodecs("*")` when you need to completely avoid asserting codecs (e.g., for cross-thread StoredFieldsReader usage) while maintaining production codec test coverage.

Applied to files:

  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java
  • server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java
  • server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.

Applied to files:

  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java
  • server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java
🧬 Code graph analysis (3)
server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java (1)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)
  • CriteriaBasedPostingsFormat (32-145)
server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java (2)
server/src/main/java/org/opensearch/index/CriteriaBasedMergePolicy.java (1)
  • CriteriaBasedMergePolicy (30-82)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)
  • CriteriaBasedCodec (20-49)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)
  • CriteriaBasedCodec (20-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: gradle-check
  • GitHub Check: Mend Security Check
🔇 Additional comments (9)
CHANGELOG.md (1)

35-35: Changelog entry looks good.

Captures the fix in the correct section.

server/src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat (1)

3-3: Provider registration is correct.

This enables SPI discovery for the new postings format.

test/framework/src/main/java/org/opensearch/index/engine/EngineTestCase.java (1)

369-372: Good update to use the provided grouping criteria.

Keeps the helper aligned with the new tests.

server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java (5)

130-130: No review comment.


214-214: No review comment.


3855-3857: LGTM.
Helper keeps the merge policy setup tidy.


8741-8743: LGTM.
Context-specific documents align with context-aware settings in this test.


8786-8788: LGTM.
Context-specific documents align with context-aware settings in this test.

server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)

31-47: Verify behavior when the delegate postings format is not per-field.

Line 34 only wraps CriteriaBasedPostingsFormat when the delegate is a PerFieldPostingsFormat. If a plugin supplies a non‑per‑field postings format, bucket attributes won’t be written and criteria grouping is effectively disabled. Please confirm this invariant or consider wrapping the non‑per‑field format as well.
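The alternative the comment raises — wrapping non-per-field delegates as well — can be sketched with stand-in types. None of these classes are Lucene's; they are hypothetical minimal interfaces used only to contrast the two wrapping policies.

```java
// Stand-in for Lucene's PostingsFormat hierarchy (single abstract method,
// so it can be implemented with a lambda).
interface PostingsFmt { String describe(); }

class PerFieldFmt implements PostingsFmt { public String describe() { return "per-field"; } }
class FlatFmt implements PostingsFmt { public String describe() { return "flat"; } }

class CriteriaWrappingSketch {
    // Current behavior per the comment: only per-field delegates get wrapped,
    // so a non-per-field delegate silently skips bucket attribute writing.
    static PostingsFmt wrapOnlyPerField(PostingsFmt delegate) {
        return (delegate instanceof PerFieldFmt) ? wrap(delegate) : delegate;
    }

    // Suggested alternative: wrap every delegate so criteria grouping is
    // never silently disabled.
    static PostingsFmt wrapAlways(PostingsFmt delegate) {
        return wrap(delegate);
    }

    static PostingsFmt wrap(PostingsFmt delegate) {
        return () -> "criteria(" + delegate.describe() + ")";
    }
}
```

Whether unconditional wrapping is safe depends on the invariant the PR author intends, which is exactly what the comment asks to be verified.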

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +118 to +130
@Override
public void merge(MergeState mergeState, NormsProducer norms) throws IOException {
delegateFieldsConsumer.merge(mergeState, norms);
Set<String> mergeFieldNames = StreamSupport.stream(mergeState.mergeFieldInfos.spliterator(), false)
.map(FieldInfo::getName)
.collect(Collectors.toSet());
if (mergeFieldNames.contains(ATTRIBUTE_BINDING_TARGET_FIELD) && mergeState.fieldInfos.length > 0) {
String attribute = mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).getAttribute(BUCKET_NAME);
assert attribute != null : "Attribute should not be null during merging segment";
mergeState.segmentInfo.putAttribute(BUCKET_NAME, attribute);

mergeState.mergeFieldInfos.fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).putAttribute(BUCKET_NAME, attribute);
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "CriteriaBasedPostingsFormat.java" -type f

Repository: opensearch-project/OpenSearch

Length of output: 155


🏁 Script executed:

wc -l ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java

Repository: opensearch-project/OpenSearch

Length of output: 159


🏁 Script executed:

cat -n ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java

Repository: opensearch-project/OpenSearch

Length of output: 7396


🏁 Script executed:

# Search for other uses of mergeState.fieldInfos to understand patterns
rg "mergeState\.fieldInfos" --type java -B 2 -A 2

Repository: opensearch-project/OpenSearch

Length of output: 1061


🏁 Script executed:

# Search for how fieldInfo() method is used in the codebase
rg "\.fieldInfo\(" --type java -B 1 -A 1 | head -50

Repository: opensearch-project/OpenSearch

Length of output: 6032


🏁 Script executed:

# Search for tests related to CriteriaBasedPostingsFormat to understand expected behavior
find . -name "*CriteriaBasedPostingsFormat*Test*" -type f

Repository: opensearch-project/OpenSearch

Length of output: 160


🏁 Script executed:

cat -n ./server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java

Repository: opensearch-project/OpenSearch

Length of output: 6445


🏁 Script executed:

# Search for documentation or code about parent-writer or tombstone segments
rg -i "parent.writer|tombstone|parent writer" --type java | head -30

Repository: opensearch-project/OpenSearch

Length of output: 4410


🏁 Script executed:

# Search for ATTRIBUTE_BINDING_TARGET_FIELD definition and usage
rg "ATTRIBUTE_BINDING_TARGET_FIELD" --type java -B 1 -A 1

Repository: opensearch-project/OpenSearch

Length of output: 2795


🏁 Script executed:

# Look at CriteriaBasedCodec to understand context
cat -n ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java

Repository: opensearch-project/OpenSearch

Length of output: 2091


🏁 Script executed:

# Search for how mergeState is typically used in other codecs
find . -name "*.java" -path "*/codec/*" -type f | head -10

Repository: opensearch-project/OpenSearch

Length of output: 880


🏁 Script executed:

cat -n ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedDocValuesWriter.java | head -100

Repository: opensearch-project/OpenSearch

Length of output: 3146


🏁 Script executed:

sed -n '94,116p' ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java

Repository: opensearch-project/OpenSearch

Length of output: 1514


🏁 Script executed:

# Check if there are any merge tests or scenarios that might show this issue
rg "merge" ./server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java

Repository: opensearch-project/OpenSearch

Length of output: 55


🏁 Script executed:

# Search for how other FieldsConsumer implementations handle merge
rg "class.*FieldsConsumer" --type java -A 20 | grep -A 15 "merge" | head -40

Repository: opensearch-project/OpenSearch

Length of output: 55


🏁 Script executed:

# Let's verify the pattern: check if fieldInfo() can return null by looking at Lucene API expectations
# Search for null checks after fieldInfo() calls
rg "fieldInfo\([^)]+\)\s*==" --type java | head -20

Repository: opensearch-project/OpenSearch

Length of output: 243


NPE when the first merged segment lacks _id field.

Line 125 reads the bucket attribute from mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD) without null-checking the result. If the first segment is a parent-writer or tombstone segment without the _id field (as the comments in the write() method acknowledge is possible), fieldInfo() returns null, and calling .getAttribute() causes an NPE. This is not protected by the assertion on line 126 since assertions are disabled in production.

Scan the merge inputs for the first segment that actually contains the _id field rather than assuming the first segment has it.

🐛 Proposed fix
-            if (mergeFieldNames.contains(ATTRIBUTE_BINDING_TARGET_FIELD) && mergeState.fieldInfos.length > 0) {
-                String attribute = mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).getAttribute(BUCKET_NAME);
-                assert attribute != null : "Attribute should not be null during merging segment";
+            if (mergeFieldNames.contains(ATTRIBUTE_BINDING_TARGET_FIELD) && mergeState.fieldInfos.length > 0) {
+                String attribute = null;
+                for (int i = 0; i < mergeState.fieldInfos.length; i++) {
+                    FieldInfo idInfo = mergeState.fieldInfos[i].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD);
+                    if (idInfo != null) {
+                        attribute = idInfo.getAttribute(BUCKET_NAME);
+                        if (attribute != null) {
+                            break;
+                        }
+                    }
+                }
+                if (attribute == null) {
+                    throw new IllegalStateException(
+                        "Bucket attribute missing during merge for " + ATTRIBUTE_BINDING_TARGET_FIELD
+                    );
+                }
                 mergeState.segmentInfo.putAttribute(BUCKET_NAME, attribute);
 
                 mergeState.mergeFieldInfos.fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).putAttribute(BUCKET_NAME, attribute);
             }

Comment on lines +41 to +69
public void testBasicFunctionality() throws IOException {
Directory dir = new ByteBuffersDirectory();
IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
iwc.setCodec(criteriaBasedPostingFormat);

try (IndexWriter writer = new IndexWriter(dir, iwc)) {
for (int i = 0; i < 100; i++) {
Document doc = new Document();
doc.add(new StringField("_id", "doc" + i, Field.Store.YES));
doc.add(new TextField("field", "value" + i, Field.Store.YES));
writer.addDocument(doc);
}
}

try (IndexReader reader = DirectoryReader.open(dir)) {
assertEquals(100, reader.numDocs());

Terms terms = MultiTerms.getTerms(reader, "field");
assertNotNull(terms);
TermsEnum termsEnum = terms.iterator();

int count = 0;
BytesRef term;
while ((term = termsEnum.next()) != null) {
assertTrue(term.utf8ToString().startsWith("value"));
count++;
}
assertEquals(100, count);
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "CriteriaBasedPostingsFormatTests.java" | head -5

Repository: opensearch-project/OpenSearch

Length of output: 160


🏁 Script executed:

# List test files in the codec directory
fd -t f ".*Tests\.java$" server/src/test/java/org/opensearch/index/codec/

Repository: opensearch-project/OpenSearch

Length of output: 1725


🏁 Script executed:

# Get file size first
wc -l server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java

Repository: opensearch-project/OpenSearch

Length of output: 162


🏁 Script executed:

cat -n server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java

Repository: opensearch-project/OpenSearch

Length of output: 6445


🌐 Web query:

Lucene ByteBuffersDirectory AutoCloseable resource management

💡 Result:

Short answer: ByteBuffersDirectory implements Closeable/AutoCloseable and must be closed to free its heap ByteBuffers — use try-with-resources or call close() when you’re done. [1][2]

Key points

  • ByteBuffersDirectory implements Closeable/AutoCloseable and exposes close() which throws IOException, so it participates in Java's try-with-resources. [1]
  • Directory (and BaseDirectory) semantics expect callers to close the Directory when finished; do not close it while an IndexWriter/IndexReader/IndexSearcher is still using it (closing will make the Directory unusable). [2]

Minimal example (conceptual)

  • try (Directory dir = new ByteBuffersDirectory()) { /* use IndexWriter/Readers here */ } // dir.close() called automatically.

Sources
[1] ByteBuffersDirectory Javadoc (implements Closeable/AutoCloseable, has close()).
[2] Directory / BaseDirectory Javadoc (Directory is Closeable; close() semantics).


🌐 Web query:

Lucene DirectoryReader close directory resource leak

💡 Result:

Yes — if you open a DirectoryReader and do not close it (or properly decRef it), you will leak file handles and other resources. Fixes:

  • Always close the reader when you’re done (try-with-resources or finally). DirectoryReader implements Closeable. [1]
  • In long-running apps use SearcherManager (or your own ref-counting) to safely reuse/reopen readers without closing readers still in use. DirectoryReader.openIfChanged returns a new reader but does NOT close the old one — you must close it when safe. [1][2]
  • If you manually share readers, use incRef()/decRef() (or ensure only one owner closes) to avoid closing while others still use it. [1]

Minimal example (concept):
try (DirectoryReader r = DirectoryReader.open(dir)) {
IndexSearcher s = new IndexSearcher(r);
// use searcher
} // reader closed automatically

References:
[1] Lucene DirectoryReader javadoc (notes on close, openIfChanged, SearcherManager).
[2] StackOverflow discussion: need to close DirectoryReader / openIfChanged behavior.


Close Directory instances to avoid resource leaks in tests.

All four test methods create ByteBuffersDirectory instances without closing them. ByteBuffersDirectory implements Closeable and must be closed to free heap resources. Wrap the directory in try-with-resources, ensuring the IndexReader closes before the directory:

♻️ Suggested fix (apply to all four test methods)
-        Directory dir = new ByteBuffersDirectory();
-        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
-        iwc.setCodec(criteriaBasedPostingFormat);
-
-        try (IndexWriter writer = new IndexWriter(dir, iwc)) {
-            for (int i = 0; i < 100; i++) {
-                Document doc = new Document();
-                doc.add(new StringField("_id", "doc" + i, Field.Store.YES));
-                doc.add(new TextField("field", "value" + i, Field.Store.YES));
-                writer.addDocument(doc);
-            }
-        }
-
-        try (IndexReader reader = DirectoryReader.open(dir)) {
-            assertEquals(100, reader.numDocs());
-
-            Terms terms = MultiTerms.getTerms(reader, "field");
-            assertNotNull(terms);
-            TermsEnum termsEnum = terms.iterator();
-
-            int count = 0;
-            BytesRef term;
-            while ((term = termsEnum.next()) != null) {
-                assertTrue(term.utf8ToString().startsWith("value"));
-                count++;
-            }
-            assertEquals(100, count);
-        }
+        try (Directory dir = new ByteBuffersDirectory()) {
+            IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
+            iwc.setCodec(criteriaBasedPostingFormat);
+
+            try (IndexWriter writer = new IndexWriter(dir, iwc)) {
+                for (int i = 0; i < 100; i++) {
+                    Document doc = new Document();
+                    doc.add(new StringField("_id", "doc" + i, Field.Store.YES));
+                    doc.add(new TextField("field", "value" + i, Field.Store.YES));
+                    writer.addDocument(doc);
+                }
+            }
+
+            try (IndexReader reader = DirectoryReader.open(dir)) {
+                assertEquals(100, reader.numDocs());
+
+                Terms terms = MultiTerms.getTerms(reader, "field");
+                assertNotNull(terms);
+                TermsEnum termsEnum = terms.iterator();
+
+                int count = 0;
+                BytesRef term;
+                while ((term = termsEnum.next()) != null) {
+                    assertTrue(term.utf8ToString().startsWith("value"));
+                    count++;
+                }
+                assertEquals(100, count);
+            }
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    public void testBasicFunctionality() throws IOException {
-        Directory dir = new ByteBuffersDirectory();
-        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
-        iwc.setCodec(criteriaBasedPostingFormat);
-        try (IndexWriter writer = new IndexWriter(dir, iwc)) {
-            for (int i = 0; i < 100; i++) {
-                Document doc = new Document();
-                doc.add(new StringField("_id", "doc" + i, Field.Store.YES));
-                doc.add(new TextField("field", "value" + i, Field.Store.YES));
-                writer.addDocument(doc);
-            }
-        }
-        try (IndexReader reader = DirectoryReader.open(dir)) {
-            assertEquals(100, reader.numDocs());
-            Terms terms = MultiTerms.getTerms(reader, "field");
-            assertNotNull(terms);
-            TermsEnum termsEnum = terms.iterator();
-            int count = 0;
-            BytesRef term;
-            while ((term = termsEnum.next()) != null) {
-                assertTrue(term.utf8ToString().startsWith("value"));
-                count++;
-            }
-            assertEquals(100, count);
-        }
-    }
+    public void testBasicFunctionality() throws IOException {
+        try (Directory dir = new ByteBuffersDirectory()) {
+            IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
+            iwc.setCodec(criteriaBasedPostingFormat);
+            try (IndexWriter writer = new IndexWriter(dir, iwc)) {
+                for (int i = 0; i < 100; i++) {
+                    Document doc = new Document();
+                    doc.add(new StringField("_id", "doc" + i, Field.Store.YES));
+                    doc.add(new TextField("field", "value" + i, Field.Store.YES));
+                    writer.addDocument(doc);
+                }
+            }
+            try (IndexReader reader = DirectoryReader.open(dir)) {
+                assertEquals(100, reader.numDocs());
+                Terms terms = MultiTerms.getTerms(reader, "field");
+                assertNotNull(terms);
+                TermsEnum termsEnum = terms.iterator();
+                int count = 0;
+                BytesRef term;
+                while ((term = termsEnum.next()) != null) {
+                    assertTrue(term.utf8ToString().startsWith("value"));
+                    count++;
+                }
+                assertEquals(100, count);
+            }
+        }
+    }
🤖 Prompt for AI Agents
In
`@server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java`
around lines 41 - 69: the test creates a ByteBuffersDirectory (Directory dir =
new ByteBuffersDirectory()) but never closes it. Wrap the Directory in a
try-with-resources that encloses the IndexWriter and IndexReader usage in
testBasicFunctionality (and apply the same change to the other three test
methods) so the Directory is closed after the reader. Nest the reader/writer
try-with-resources inside the Directory try-with-resources to guarantee the
reader is closed before the Directory is closed.
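For reference, the close ordering the prompt relies on is a property of Java's try-with-resources: inner resources always close before the enclosing one. A plain-Java sketch (the Resource class below is a hypothetical stand-in for Lucene's Directory/IndexWriter/IndexReader, not their real API):

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {
    // Toy AutoCloseable that records the order in which resources are closed.
    static class Resource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);
        }
    }

    // Mirrors the suggested nesting: writer and reader blocks inside the directory block.
    static List<String> closeOrder() {
        List<String> log = new ArrayList<>();
        try (Resource dir = new Resource("directory", log)) {
            try (Resource writer = new Resource("writer", log)) {
                // index documents...
            }
            try (Resource reader = new Resource("reader", log)) {
                // verify documents...
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Inner resources close before the enclosing directory.
        System.out.println(closeOrder()); // [writer, reader, directory]
    }
}
```

With this nesting the directory is always the last resource to close, which is exactly the guarantee the suggested fix needs.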

Comment on lines +3816 to +3853
@LockFeatureFlag(CONTEXT_AWARE_MIGRATION_EXPERIMENTAL_FLAG)
public void testCriteriaBasedGrouping() throws Exception {
final IndexSettings indexSettings = IndexSettingsModule.newIndexSettings(
"test",
Settings.builder()
.put(defaultSettings.getSettings())
.put(IndexSettings.INDEX_CONTEXT_AWARE_ENABLED_SETTING.getKey(), true)
.build()
);
try (
Store store = createStore();
Engine engine = createEngine(indexSettings, store, createTempDir(), newCriteriaBasedMergePolicy(), null, null, null, null, null)
) {
List<Segment> segments = engine.segments(true);
assertThat(segments.isEmpty(), equalTo(true));
for (int i = 1; i <= 100; i++) {
String groupingCriteria;
if (i % 3 == 0) {
groupingCriteria = "grouping_criteria1";
} else if (i % 3 == 1) {
groupingCriteria = "grouping_criteria2";
} else {
groupingCriteria = "grouping_criteria3";
}

final ParsedDocument doc = testParsedDocument("1", null, testContextSpecificDocument(groupingCriteria), B_1, null);
engine.index(indexForDoc(doc));
if (i % 5 == 0) {
engine.refresh("test");
}
}

engine.refresh("test");
segments = engine.segments(true);
Set<String> attributes = segments.stream().map(segment -> segment.getAttributes().get(BUCKET_NAME)).collect(Collectors.toSet());
assertThat(attributes, Matchers.containsInAnyOrder("grouping_criteria1", "grouping_criteria2", "grouping_criteria3"));
}
}
Contributor

⚠️ Potential issue | 🟠 Major

Avoid reusing the same _id in the grouping test.
Line 3841 always indexes id "1", so only the last criteria remains live; earlier buckets can be dropped/merged away, making the bucket assertion flaky or incorrect. Use unique ids per iteration (or per bucket) to keep live docs in each bucket.

✅ Suggested fix
-                final ParsedDocument doc = testParsedDocument("1", null, testContextSpecificDocument(groupingCriteria), B_1, null);
+                final String docId = "doc_" + i + "_" + groupingCriteria;
+                final ParsedDocument doc = testParsedDocument(docId, null, testContextSpecificDocument(groupingCriteria), B_1, null);
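The failure mode described above comes down to upsert semantics: indexing with an existing _id replaces the earlier live document, so a reused id leaves at most one live doc. A plain-Java sketch of that effect, using a map as a stand-in for the live-docs view (no Lucene or OpenSearch types; the bucket naming is only illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IdReuseDemo {
    // Simulates "index by _id": the same id overwrites, unique ids accumulate.
    static Map<String, String> indexDocs(boolean uniqueIds) {
        Map<String, String> liveDocs = new LinkedHashMap<>();
        for (int i = 1; i <= 100; i++) {
            String bucket = "grouping_criteria" + (i % 3 + 1);
            String id = uniqueIds ? "doc_" + i : "1"; // reused id -> repeated overwrite
            liveDocs.put(id, bucket);
        }
        return liveDocs;
    }

    public static void main(String[] args) {
        // Reusing "1" leaves a single live doc, so only one bucket survives.
        System.out.println(indexDocs(false).size()); // 1
        // Unique ids keep all 100 docs live, spread across all three buckets.
        System.out.println(indexDocs(true).size()); // 100
    }
}
```

This is why the suggested fix derives the id from the loop index and criteria: each bucket then retains live documents, and the segment-attribute assertion stops being order-dependent.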
Suggested change
    @LockFeatureFlag(CONTEXT_AWARE_MIGRATION_EXPERIMENTAL_FLAG)
    public void testCriteriaBasedGrouping() throws Exception {
        final IndexSettings indexSettings = IndexSettingsModule.newIndexSettings(
            "test",
            Settings.builder()
                .put(defaultSettings.getSettings())
                .put(IndexSettings.INDEX_CONTEXT_AWARE_ENABLED_SETTING.getKey(), true)
                .build()
        );
        try (
            Store store = createStore();
            Engine engine = createEngine(indexSettings, store, createTempDir(), newCriteriaBasedMergePolicy(), null, null, null, null, null)
        ) {
            List<Segment> segments = engine.segments(true);
            assertThat(segments.isEmpty(), equalTo(true));
            for (int i = 1; i <= 100; i++) {
                String groupingCriteria;
                if (i % 3 == 0) {
                    groupingCriteria = "grouping_criteria1";
                } else if (i % 3 == 1) {
                    groupingCriteria = "grouping_criteria2";
                } else {
                    groupingCriteria = "grouping_criteria3";
                }
                final String docId = "doc_" + i + "_" + groupingCriteria;
                final ParsedDocument doc = testParsedDocument(docId, null, testContextSpecificDocument(groupingCriteria), B_1, null);
                engine.index(indexForDoc(doc));
                if (i % 5 == 0) {
                    engine.refresh("test");
                }
            }
            engine.refresh("test");
            segments = engine.segments(true);
            Set<String> attributes = segments.stream().map(segment -> segment.getAttributes().get(BUCKET_NAME)).collect(Collectors.toSet());
            assertThat(attributes, Matchers.containsInAnyOrder("grouping_criteria1", "grouping_criteria2", "grouping_criteria3"));
        }
    }
🤖 Prompt for AI Agents
In `@server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java`
around lines 3816 - 3853: testCriteriaBasedGrouping reuses the same document id
("1") for every iteration, which causes older grouping buckets to be
removed/merged and makes the final bucket assertion flaky. Update the loop so
each ParsedDocument created by testParsedDocument has a unique _id (for
example, append the loop index i or the groupingCriteria to the id) before
calling engine.index(indexForDoc(doc)), so that live docs exist in each bucket
and the Set<String> attributes assertion reliably contains all three grouping
criteria.

@github-actions
Contributor

❌ Gradle check result for 7e949cb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
