Fix CriteriaBasedCodec to work with delegate codec #20442
Conversation
📝 Walkthrough

This PR refactors the codec architecture by introducing CriteriaBasedPostingsFormat as a new SPI provider that handles the postings format with bucket-level attribute tracking. CriteriaBasedCodec is simplified to delegate postings format handling and is unregistered from the Codec SPI. Tests validate the new grouping functionality across multiple scenarios.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Writer as IndexWriter
    participant CriteriaBasedPostingsFormat as CriteriaBasedPostingsFormat
    participant FieldsConsumer as CriteriaBasedFieldsConsumer
    participant DelegateFormat as Delegate PostingsFormat
    participant SegmentState as SegmentWriteState
    Writer->>CriteriaBasedPostingsFormat: fieldsConsumer(state)
    CriteriaBasedPostingsFormat->>DelegateFormat: fieldsConsumer(state)
    DelegateFormat-->>FieldsConsumer: FieldsConsumer instance
    CriteriaBasedPostingsFormat->>FieldsConsumer: wrap delegate
    FieldsConsumer->>FieldsConsumer: setAttributes on segment<br/>(bucket, delegate codec)
    FieldsConsumer->>FieldsConsumer: set bucket on _id field<br/>if bucket present
    FieldsConsumer->>DelegateFormat: write(fields)
    DelegateFormat-->>FieldsConsumer: success
    FieldsConsumer->>SegmentState: recordAttribute(DELEGATE_CODEC_KEY)
```
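The write path above amounts to a thin wrapper around the delegate's FieldsConsumer. Below is a minimal sketch of that shape; the attribute key strings mirror the PR's constants (DELEGATE_CODEC_KEY, BUCKET_NAME), but the class body, constructor arguments, and how the bucket value reaches the consumer are assumptions for illustration, not the PR's exact code.

```java
import java.io.IOException;

import org.apache.lucene.codecs.FieldsConsumer;
import org.apache.lucene.codecs.NormsProducer;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.SegmentWriteState;

// Illustrative only: a FieldsConsumer that records routing attributes on the
// segment before handing the actual postings write to the delegate.
final class DelegatingFieldsConsumerSketch extends FieldsConsumer {
    private final FieldsConsumer delegate;   // from delegateFormat.fieldsConsumer(state)
    private final SegmentWriteState state;
    private final String delegateName;       // name the read path will resolve via SPI
    private final String bucket;             // null when no grouping criteria applies

    DelegatingFieldsConsumerSketch(FieldsConsumer delegate, SegmentWriteState state,
                                   String delegateName, String bucket) {
        this.delegate = delegate;
        this.state = state;
        this.delegateName = delegateName;
        this.bucket = bucket;
    }

    @Override
    public void write(Fields fields, NormsProducer norms) throws IOException {
        // Record which concrete format wrote this segment (DELEGATE_CODEC_KEY in the PR).
        state.segmentInfo.putAttribute("delegate_codec_key", delegateName);
        // Bind the bucket to the segment and to the _id field when one is present.
        FieldInfo idField = state.fieldInfos.fieldInfo("_id");
        if (bucket != null && idField != null) {
            state.segmentInfo.putAttribute("bucket", bucket);
            idField.putAttribute("bucket", bucket);
        }
        delegate.write(fields, norms);   // the delegate does the real work
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}
```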
```mermaid
sequenceDiagram
    participant Reader as IndexReader
    participant CriteriaBasedPostingsFormat as CriteriaBasedPostingsFormat
    participant SegmentState as SegmentReadState
    participant CodecRegistry as Codec Registry
    participant DelegateFormat as Delegate PostingsFormat
    Reader->>CriteriaBasedPostingsFormat: fieldsProducer(state)
    CriteriaBasedPostingsFormat->>SegmentState: readAttribute(DELEGATE_CODEC_KEY)
    SegmentState-->>CriteriaBasedPostingsFormat: codec name
    CriteriaBasedPostingsFormat->>CodecRegistry: lookup(codec name)
    CodecRegistry-->>CriteriaBasedPostingsFormat: PostingsFormat
    CriteriaBasedPostingsFormat->>DelegateFormat: fieldsProducer(state)
    DelegateFormat-->>CriteriaBasedPostingsFormat: FieldsProducer instance
    CriteriaBasedPostingsFormat-->>Reader: FieldsProducer
```
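The read path is simpler: the segment attribute written above is the only state needed to find the delegate again. A sketch under the same naming assumptions follows; it mirrors the resolution step quoted in the nitpick further below, but is not a verbatim copy of the PR.

```java
import java.io.IOException;

import org.apache.lucene.codecs.FieldsProducer;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.index.SegmentReadState;

// Illustrative only: resolve the delegate recorded at write time, then let it read.
final class DelegateResolvingReadSketch {
    static FieldsProducer fieldsProducer(SegmentReadState state, String delegateCodecKey) throws IOException {
        String delegateName = state.segmentInfo.getAttribute(delegateCodecKey);
        if (delegateName == null) {
            // Fail fast: without the attribute we cannot pick the right format.
            throw new IllegalStateException("Missing " + delegateCodecKey + " in segment " + state.segmentInfo.name);
        }
        return PostingsFormat.forName(delegateName).fieldsProducer(state);   // SPI lookup
    }
}
```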
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
⚠️ Warning: there were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 ast-grep (0.40.5): server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java
Signed-off-by: RS146BIJAY <rishavsagar4b1@gmail.com>
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In
`@server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java`:
- Around line 118-130: In merge(MergeState mergeState, NormsProducer norms) the
code assumes mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD)
is non-null which can cause an NPE; instead iterate over mergeState.fieldInfos
to find the first FieldInfos instance whose
fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD) is non-null, read its
getAttribute(BUCKET_NAME) with a null-check, and only then call
mergeState.segmentInfo.putAttribute(BUCKET_NAME, attribute) and
mergeState.mergeFieldInfos.fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).putAttribute(BUCKET_NAME,
attribute); if no non-null fieldInfo is found, skip setting the attribute.
Ensure you still call delegateFieldsConsumer.merge(mergeState, norms) as before.
In
`@server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java`:
- Around line 41-69: The test creates a ByteBuffersDirectory (Directory dir =
new ByteBuffersDirectory()) but never closes it; wrap the Directory in a
try-with-resources that encloses the IndexWriter and IndexReader usage in
testBasicFunctionality (and apply the same change to the other three test
methods) so the Directory is closed after the reader; ensure the IndexReader is
opened/closed inside the Directory try block (or nest the reader/writer
try-with-resources inside the Directory try-with-resources) to guarantee the
reader is closed before the Directory is closed.
In `@server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java`:
- Around line 3816-3853: In testCriteriaBasedGrouping the test reuses the same
document id ("1") for every iteration which causes older grouping buckets to be
removed/merged and makes the final bucket assertion flaky; update the loop in
testCriteriaBasedGrouping so each ParsedDocument created by testParsedDocument
has a unique _id (for example append the loop index i or the groupingCriteria to
the id) when calling engine.index(indexForDoc(doc)) so that live docs exist in
each bucket and the Set<String> attributes assertion reliably contains all three
grouping criteria.
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)
77-80: Fail fast when DELEGATE_CODEC_KEY is missing (asserts are off in prod). Line 79 relies on assert, which won't run in production; a missing attribute will instead fail later with a less actionable error. Consider an explicit null check with a clear exception.

♻️ Suggested refactor

```diff
-        assert state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY) != null;
-        return PostingsFormat.forName(state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY)).fieldsProducer(state);
+        String delegateName = state.segmentInfo.getAttribute(DELEGATE_CODEC_KEY);
+        if (delegateName == null) {
+            throw new IllegalStateException(
+                "Missing " + DELEGATE_CODEC_KEY + " in segmentInfo for " + state.segmentInfo.name
+            );
+        }
+        return PostingsFormat.forName(delegateName).fieldsProducer(state);
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- CHANGELOG.md
- server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java
- server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java
- server/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec
- server/src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat
- server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java
- server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java
- test/framework/src/main/java/org/opensearch/index/engine/EngineTestCase.java
💤 Files with no reviewable changes (1)
- server/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-02T22:44:14.799Z
Learnt from: prudhvigodithi
Repo: opensearch-project/OpenSearch PR: 20112
File: server/src/internalClusterTest/java/org/opensearch/search/slice/SearchSliceIT.java:73-81
Timestamp: 2025-12-02T22:44:14.799Z
Learning: In OpenSearch integration tests extending OpenSearchIntegTestCase, using `LuceneTestCase.SuppressCodecs("*")` triggers special handling that selects a random production codec from the CODECS array, while `SuppressCodecs("Asserting")` or other specific codec suppressions still allow Lucene's default codec randomization which may include the asserting codec. Use `SuppressCodecs("*")` when you need to completely avoid asserting codecs (e.g., for cross-thread StoredFieldsReader usage) while maintaining production codec test coverage.
Applied to files:
- server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java
- server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java
- server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java
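As an illustration of the learning above, the blanket suppression looks like this (the test class name is hypothetical):

```java
import org.apache.lucene.tests.util.LuceneTestCase;
import org.opensearch.test.OpenSearchIntegTestCase;

// Suppressing all codecs triggers OpenSearch's handling that picks a random
// production codec, so the Asserting codec can never be selected.
@LuceneTestCase.SuppressCodecs("*")
public class CrossThreadReaderIT extends OpenSearchIntegTestCase {
    // tests that share StoredFieldsReader instances across threads go here
}
```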
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.
Applied to files:
- server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java
- server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java
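A minimal sketch of the supplier pattern from this learning; the registry class and method names are hypothetical stand-ins for the CodecService internals it describes:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class CodecRegistrySketch<T> {
    private final Map<String, Supplier<T>> entries = new HashMap<>();

    // Store the supplier itself; calling supplier.get() here would bind the
    // default before plugins (e.g. EnginePlugin.getAdditionalCodecs) can replace it.
    void register(String name, Supplier<T> supplier) {
        entries.put(name, supplier);
    }

    // Resolve lazily, at use time, so later changes to the default are observed.
    T resolve(String name) {
        return entries.get(name).get();
    }
}
```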
🧬 Code graph analysis (3)

server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java (1)
- server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)
  - CriteriaBasedPostingsFormat (32-145)

server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java (2)
- server/src/main/java/org/opensearch/index/CriteriaBasedMergePolicy.java (1)
  - CriteriaBasedMergePolicy (30-82)
- server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)
  - CriteriaBasedCodec (20-49)

server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java (1)
- server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)
  - CriteriaBasedCodec (20-49)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: gradle-check
- GitHub Check: Mend Security Check
🔇 Additional comments (9)
CHANGELOG.md (1)
35-35: Changelog entry looks good. Captures the fix in the correct section.
server/src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat (1)
3-3: Provider registration is correct. This enables SPI discovery for the new postings format.
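For reference, a Java ServiceLoader provider-configuration file is just a list of fully-qualified implementation class names, one per line; given the class added in this PR, the new entry is presumably:

```
# META-INF/services/org.apache.lucene.codecs.PostingsFormat
org.opensearch.index.codec.CriteriaBasedPostingsFormat
```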
test/framework/src/main/java/org/opensearch/index/engine/EngineTestCase.java (1)
369-372: Good update to use the provided grouping criteria. Keeps the helper aligned with the new tests.
server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java (5)
130-130: No review comment.
214-214: No review comment.
3855-3857: LGTM.
Helper keeps the merge policy setup tidy.
8741-8743: LGTM.
Context-specific documents align with context-aware settings in this test.
8786-8788: LGTM.
Context-specific documents align with context-aware settings in this test.

server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java (1)
31-47: Verify behavior when the delegate postings format is not per-field. Line 34 only wraps with CriteriaBasedPostingsFormat when the delegate is a PerFieldPostingsFormat. If a plugin supplies a non-per-field postings format, bucket attributes won't be written and criteria grouping is effectively disabled. Please confirm this invariant or consider wrapping the non-per-field format as well.
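A sketch of the guard being questioned, assuming CriteriaBasedCodec chooses its postings format roughly like this; the helper and its wiring are illustrative, not the PR's exact code:

```java
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.perfield.PerFieldPostingsFormat;

final class PerFieldGuardSketch {
    // Mirrors the shape of the check at line 34: only a per-field delegate gets
    // the criteria-tracking wrapper, so a plain PostingsFormat bypasses tracking.
    static PostingsFormat choose(PostingsFormat delegate, PostingsFormat criteriaWrapper) {
        return (delegate instanceof PerFieldPostingsFormat)
            ? criteriaWrapper   // wrapper records bucket/delegate attributes
            : delegate;         // attributes never written; grouping disabled
    }
}
```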
Review comment on server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java:

```java
@Override
public void merge(MergeState mergeState, NormsProducer norms) throws IOException {
    delegateFieldsConsumer.merge(mergeState, norms);
    Set<String> mergeFieldNames = StreamSupport.stream(mergeState.mergeFieldInfos.spliterator(), false)
        .map(FieldInfo::getName)
        .collect(Collectors.toSet());
    if (mergeFieldNames.contains(ATTRIBUTE_BINDING_TARGET_FIELD) && mergeState.fieldInfos.length > 0) {
        String attribute = mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).getAttribute(BUCKET_NAME);
        assert attribute != null : "Attribute should not be null during merging segment";
        mergeState.segmentInfo.putAttribute(BUCKET_NAME, attribute);

        mergeState.mergeFieldInfos.fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).putAttribute(BUCKET_NAME, attribute);
    }
```
🧩 Analysis chain
🏁 Scripts executed (outputs collapsed; repository: opensearch-project/OpenSearch):

```shell
find . -name "CriteriaBasedPostingsFormat.java" -type f
wc -l ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java
cat -n ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java

# Search for other uses of mergeState.fieldInfos to understand patterns
rg "mergeState\.fieldInfos" --type java -B 2 -A 2

# Search for how fieldInfo() method is used in the codebase
rg "\.fieldInfo\(" --type java -B 1 -A 1 | head -50

# Search for tests related to CriteriaBasedPostingsFormat to understand expected behavior
find . -name "*CriteriaBasedPostingsFormat*Test*" -type f
cat -n ./server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java

# Search for documentation or code about parent-writer or tombstone segments
rg -i "parent.writer|tombstone|parent writer" --type java | head -30

# Search for ATTRIBUTE_BINDING_TARGET_FIELD definition and usage
rg "ATTRIBUTE_BINDING_TARGET_FIELD" --type java -B 1 -A 1

# Look at CriteriaBasedCodec to understand context
cat -n ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedCodec.java

# Search for how mergeState is typically used in other codecs
find . -name "*.java" -path "*/codec/*" -type f | head -10
cat -n ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedDocValuesWriter.java | head -100
sed -n '94,116p' ./server/src/main/java/org/opensearch/index/codec/CriteriaBasedPostingsFormat.java

# Check if there are any merge tests or scenarios that might show this issue
rg "merge" ./server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java

# Search for how other FieldsConsumer implementations handle merge
rg "class.*FieldsConsumer" --type java -A 20 | grep -A 15 "merge" | head -40

# Verify that fieldInfo() can return null: search for null checks after fieldInfo() calls
rg "fieldInfo\([^)]+\)\s*==" --type java | head -20
```
NPE when the first merged segment lacks the _id field.
Line 125 reads the bucket attribute from mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD) without null-checking the result. If the first segment is a parent-writer or tombstone segment without the _id field (as the comments in the write() method acknowledge is possible), fieldInfo() returns null, and calling .getAttribute() causes an NPE. This is not protected by the assertion on line 126 since assertions are disabled in production.
Scan the merge inputs for the first segment that actually contains the _id field rather than assuming the first segment has it.
🐛 Proposed fix
```diff
-        if (mergeFieldNames.contains(ATTRIBUTE_BINDING_TARGET_FIELD) && mergeState.fieldInfos.length > 0) {
-            String attribute = mergeState.fieldInfos[0].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).getAttribute(BUCKET_NAME);
-            assert attribute != null : "Attribute should not be null during merging segment";
+        if (mergeFieldNames.contains(ATTRIBUTE_BINDING_TARGET_FIELD) && mergeState.fieldInfos.length > 0) {
+            String attribute = null;
+            for (int i = 0; i < mergeState.fieldInfos.length; i++) {
+                FieldInfo idInfo = mergeState.fieldInfos[i].fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD);
+                if (idInfo != null) {
+                    attribute = idInfo.getAttribute(BUCKET_NAME);
+                    if (attribute != null) {
+                        break;
+                    }
+                }
+            }
+            if (attribute == null) {
+                throw new IllegalStateException(
+                    "Bucket attribute missing during merge for " + ATTRIBUTE_BINDING_TARGET_FIELD
+                );
+            }
             mergeState.segmentInfo.putAttribute(BUCKET_NAME, attribute);
             mergeState.mergeFieldInfos.fieldInfo(ATTRIBUTE_BINDING_TARGET_FIELD).putAttribute(BUCKET_NAME, attribute);
         }
```
Review comment on server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java:

```java
public void testBasicFunctionality() throws IOException {
    Directory dir = new ByteBuffersDirectory();
    IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
    iwc.setCodec(criteriaBasedPostingFormat);

    try (IndexWriter writer = new IndexWriter(dir, iwc)) {
        for (int i = 0; i < 100; i++) {
            Document doc = new Document();
            doc.add(new StringField("_id", "doc" + i, Field.Store.YES));
            doc.add(new TextField("field", "value" + i, Field.Store.YES));
            writer.addDocument(doc);
        }
    }

    try (IndexReader reader = DirectoryReader.open(dir)) {
        assertEquals(100, reader.numDocs());

        Terms terms = MultiTerms.getTerms(reader, "field");
        assertNotNull(terms);
        TermsEnum termsEnum = terms.iterator();

        int count = 0;
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            assertTrue(term.utf8ToString().startsWith("value"));
            count++;
        }
        assertEquals(100, count);
    }
```
🧩 Analysis chain
🏁 Scripts executed (outputs collapsed; repository: opensearch-project/OpenSearch):

```shell
find . -type f -name "CriteriaBasedPostingsFormatTests.java" | head -5

# List test files in the codec directory
fd -t f ".*Tests\.java$" server/src/test/java/org/opensearch/index/codec/

# Get file size first
wc -l server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java
cat -n server/src/test/java/org/opensearch/index/codec/CriteriaBasedPostingsFormatTests.java
```
🌐 Web query:
Lucene ByteBuffersDirectory AutoCloseable resource management
💡 Result:
Short answer: ByteBuffersDirectory implements Closeable/AutoCloseable and must be closed to free its heap ByteBuffers — use try-with-resources or call close() when you’re done. [1][2]
Key points
- ByteBuffersDirectory implements Closeable/AutoCloseable and exposes close() which throws IOException, so it participates in Java's try-with-resources. [1]
- Directory (and BaseDirectory) semantics expect callers to close the Directory when finished; do not close it while an IndexWriter/IndexReader/IndexSearcher is still using it (closing will make the Directory unusable). [2]
Minimal example (conceptual)
- try (Directory dir = new ByteBuffersDirectory()) { /* use IndexWriter/Readers here */ } // dir.close() called automatically.
Sources
[1] ByteBuffersDirectory Javadoc (implements Closeable/AutoCloseable, has close()).
[2] Directory / BaseDirectory Javadoc (Directory is Closeable; close() semantics).
🌐 Web query:
Lucene DirectoryReader close directory resource leak
💡 Result:
Yes — if you open a DirectoryReader and do not close it (or properly decRef it), you will leak file handles and other resources. Fixes:
- Always close the reader when you’re done (try-with-resources or finally). DirectoryReader implements Closeable. [1]
- In long-running apps use SearcherManager (or your own ref-counting) to safely reuse/reopen readers without closing readers still in use. DirectoryReader.openIfChanged returns a new reader but does NOT close the old one — you must close it when safe. [1][2]
- If you manually share readers, use incRef()/decRef() (or ensure only one owner closes) to avoid closing while others still use it. [1]
Minimal example (concept):
try (DirectoryReader r = DirectoryReader.open(dir)) {
IndexSearcher s = new IndexSearcher(r);
// use searcher
} // reader closed automatically
References:
[1] Lucene DirectoryReader javadoc (notes on close, openIfChanged, SearcherManager).
[2] StackOverflow discussion: need to close DirectoryReader / openIfChanged behavior.
Close Directory instances to avoid resource leaks in tests.
All four test methods create ByteBuffersDirectory instances without closing them. ByteBuffersDirectory implements Closeable and must be closed to free heap resources. Wrap the directory in try-with-resources, ensuring the IndexReader closes before the directory:
♻️ Suggested fix (apply to all four test methods)
```diff
-    Directory dir = new ByteBuffersDirectory();
-    IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
-    iwc.setCodec(criteriaBasedPostingFormat);
-
-    try (IndexWriter writer = new IndexWriter(dir, iwc)) {
-        for (int i = 0; i < 100; i++) {
-            Document doc = new Document();
-            doc.add(new StringField("_id", "doc" + i, Field.Store.YES));
-            doc.add(new TextField("field", "value" + i, Field.Store.YES));
-            writer.addDocument(doc);
-        }
-    }
-
-    try (IndexReader reader = DirectoryReader.open(dir)) {
-        assertEquals(100, reader.numDocs());
-
-        Terms terms = MultiTerms.getTerms(reader, "field");
-        assertNotNull(terms);
-        TermsEnum termsEnum = terms.iterator();
-
-        int count = 0;
-        BytesRef term;
-        while ((term = termsEnum.next()) != null) {
-            assertTrue(term.utf8ToString().startsWith("value"));
-            count++;
-        }
-        assertEquals(100, count);
-    }
+    try (Directory dir = new ByteBuffersDirectory()) {
+        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
+        iwc.setCodec(criteriaBasedPostingFormat);
+
+        try (IndexWriter writer = new IndexWriter(dir, iwc)) {
+            for (int i = 0; i < 100; i++) {
+                Document doc = new Document();
+                doc.add(new StringField("_id", "doc" + i, Field.Store.YES));
+                doc.add(new TextField("field", "value" + i, Field.Store.YES));
+                writer.addDocument(doc);
+            }
+        }
+
+        try (IndexReader reader = DirectoryReader.open(dir)) {
+            assertEquals(100, reader.numDocs());
+
+            Terms terms = MultiTerms.getTerms(reader, "field");
+            assertNotNull(terms);
+            TermsEnum termsEnum = terms.iterator();
+
+            int count = 0;
+            BytesRef term;
+            while ((term = termsEnum.next()) != null) {
+                assertTrue(term.utf8ToString().startsWith("value"));
+                count++;
+            }
+            assertEquals(100, count);
+        }
+    }
```
Review comment on server/src/test/java/org/opensearch/index/engine/InternalEngineTests.java:

```java
@LockFeatureFlag(CONTEXT_AWARE_MIGRATION_EXPERIMENTAL_FLAG)
public void testCriteriaBasedGrouping() throws Exception {
    final IndexSettings indexSettings = IndexSettingsModule.newIndexSettings(
        "test",
        Settings.builder()
            .put(defaultSettings.getSettings())
            .put(IndexSettings.INDEX_CONTEXT_AWARE_ENABLED_SETTING.getKey(), true)
            .build()
    );
    try (
        Store store = createStore();
        Engine engine = createEngine(indexSettings, store, createTempDir(), newCriteriaBasedMergePolicy(), null, null, null, null, null)
    ) {
        List<Segment> segments = engine.segments(true);
        assertThat(segments.isEmpty(), equalTo(true));
        for (int i = 1; i <= 100; i++) {
            String groupingCriteria;
            if (i % 3 == 0) {
                groupingCriteria = "grouping_criteria1";
            } else if (i % 3 == 1) {
                groupingCriteria = "grouping_criteria2";
            } else {
                groupingCriteria = "grouping_criteria3";
            }

            final ParsedDocument doc = testParsedDocument("1", null, testContextSpecificDocument(groupingCriteria), B_1, null);
            engine.index(indexForDoc(doc));
            if (i % 5 == 0) {
                engine.refresh("test");
            }
        }

        engine.refresh("test");
        segments = engine.segments(true);
        Set<String> attributes = segments.stream().map(segment -> segment.getAttributes().get(BUCKET_NAME)).collect(Collectors.toSet());
        assertThat(attributes, Matchers.containsInAnyOrder("grouping_criteria1", "grouping_criteria2", "grouping_criteria3"));
    }
}
```
Avoid reusing the same _id in the grouping test.
Line 3841 always indexes id "1", so only the most recently indexed document remains live; earlier buckets can be dropped or merged away, making the bucket assertion flaky or incorrect. Use unique ids per iteration (or per bucket) to keep live docs in each bucket.
✅ Suggested fix
```diff
-            final ParsedDocument doc = testParsedDocument("1", null, testContextSpecificDocument(groupingCriteria), B_1, null);
+            final String docId = "doc_" + i + "_" + groupingCriteria;
+            final ParsedDocument doc = testParsedDocument(docId, null, testContextSpecificDocument(groupingCriteria), B_1, null);
```
❌ Gradle check result for 7e949cb: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Description
Fix CriteriaBasedCodec to work with delegate codec
Related Issues
Fix CriteriaBasedCodec to work with delegate codec