fix(search): make bitmap scorer repeatable#21815
Conversation
PR Reviewer Guide 🔍(Review updated until commit b007896)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to b007896 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit a1de5ff
Suggestions up to commit e2a1964
Suggestions up to commit 25d07f7
Suggestions up to commit 7d96516
Suggestions up to commit 4ca5fb8
|
There was a problem hiding this comment.
Pull request overview
This PR fixes a correctness bug in bitmap index queries by making ScorerSupplier#get() repeatable: it rebuilds mutable intersection state on each get() call so repeated scorer creation doesn’t consume shared iterator/visitor state and lead to empty results in complex queries.
Changes:
- Rebuild
DocIdSetBuilderandMergePointVisitorinsideScorerSupplier#get()for both 32-bit and 64-bit bitmap index queries. - Add regression tests for both query types that call
ScorerSupplier#get()twice and compare matches.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
server/src/main/java/org/opensearch/search/query/BitmapIndexQuery.java |
Moves mutable intersection state into ScorerSupplier#get() to ensure repeatable scorer creation. |
server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java |
Same repeatability fix as 32-bit variant for 64-bit bitmap queries. |
server/src/test/java/org/opensearch/search/query/BitmapIndexQueryTests.java |
Adds a repeatability regression test and helper to extract matches from a Scorer. |
server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java |
Adds a repeatability regression test and helper to extract matches from a Scorer. |
Comments suppressed due to low confidence (1)
server/src/test/java/org/opensearch/search/query/BitmapIndexQueryTests.java:266
- The repeatability check only inspects
reader.leaves().get(0). With randomized IndexWriter settings it’s possible for leaf 0 to contain none of the matching docs, in which case bothget()calls return an empty list and the regression test would pass even if the scorer is broken. Consider aggregating matches across all leaves (or asserting the first scorer returns the expected non-empty matches) before comparing the second scorer’s results.
LeafReaderContext leaf = reader.leaves().get(0);
ScorerSupplier supplier = weight.scorerSupplier(leaf);
assertNotNull(supplier);
Scorer scorer1 = supplier.get(Long.MAX_VALUE);
Scorer scorer2 = supplier.get(Long.MAX_VALUE);
assertEquals(getMatchingValues(scorer1, leaf), getMatchingValues(scorer2, leaf));
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Persistent review updated to latest commit ac01cdf |
|
Persistent review updated to latest commit 91c3d85 |
| w.commit(); | ||
| reader.close(); | ||
| reader = DirectoryReader.open(w); | ||
| searcher = newSearcher(reader); |
|
|
||
| LeafReaderContext leaf = reader.leaves().get(0); | ||
| ScorerSupplier supplier = weight.scorerSupplier(leaf); | ||
| assertNotNull(supplier); | ||
|
|
||
| Scorer scorer1 = supplier.get(Long.MAX_VALUE); | ||
| Scorer scorer2 = supplier.get(Long.MAX_VALUE); | ||
|
|
||
| assertEquals(getMatchingValues(scorer1, leaf), getMatchingValues(scorer2, leaf)); |
Close the existing reader before reopening DirectoryReader in bitmap index query tests to avoid handle leaks in repeated refresh paths. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Use an exception-safe refresh helper in BitmapIndexQueryTests and apply it across reader reopen sites to avoid leak-prone reader reassignment patterns. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Initialize searcher in BitmapIndexQueryTests @before so testRewrite and related tests never rewrite against a null searcher. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Skip leaves where Weight#scorer returns null in BitmapIndexQueryTests helper to prevent NPEs in multi-segment/randomized test runs. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Ensure we don't lose reference to the old reader even if DirectoryReader.open fails, improving exception safety. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Close old reader/writer before creating new ones to prevent mixed reader/writer state. This ensures the reader and writer are always in sync and prevents AlreadyClosedException or resource leaks during test teardown. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Build bitmap intersections lazily once per leaf and reuse the built DocIdSet across repeated ScorerSupplier#get calls. Return null scorer when intersections are empty to avoid null-iterator failures. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…ability Ensure ScorerSupplier.get() can be called multiple times with identical results by rebuilding the DocIdSet fresh each time instead of caching it. This ensures the BitmapIterator and MergePointVisitor are created fresh for each call. Fixes testScorerSupplierGetIsRepeatable test failures in both 32-bit and 64-bit bitmap index queries. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Prevent stale buffered values from being used in subsequent iteration cycles. When advance() exhausts the bitmap iterator without finding a value >= target, explicitly reset hasBuffered to false to ensure next() doesn't return invalid data. Fixes potential repeatability issues in bitmap scorer repeatability tests. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Use supplier.cost() as the leadCost when invoking ScorerSupplier#get in bitmap repeatability tests. Passing Long.MAX_VALUE violates Lucene's asserting scorer supplier checks and causes assertion failures unrelated to scorer repeatability behavior. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…per get() call Lucene's AssertingWeight (used by newSearcher() in tests) wraps ScorerSupplier and asserts get() is called at most once per instance. Calling get() twice on the same supplier triggers AssertionError. Fix both BitmapIndexQueryTests and Bitmap64IndexQueryTests to obtain a fresh ScorerSupplier from the weight for each scorer retrieval, which is also the semantically correct way to test that the weight produces repeatable results across independent scorer requests. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
DocIdSetBuilder.build() never returns null — it returns DocIdSet.EMPTY when no documents are added. Remove the dead null check and inline the iterator() call to eliminate the uncovered branch. Add testScoreNoMatchingDocs to both BitmapIndexQueryTests and Bitmap64IndexQueryTests to cover the iterator-is-null path (reached when the field has point values but no documents match the bitmap). This brings patch coverage above the 73.33% threshold required by Codecov. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…tion In Lucene 10.4.0, DocIdSetBuilder.build() never returns DocIdSet.EMPTY; when no documents match it returns an IntArrayDocIdSet with length 0 whose iterator() is non-null (yields NO_MORE_DOCS immediately). The 'if (iterator == null)' guard added in the previous commit was therefore dead code and the assertNull(weight.scorer(leaf)) assertion in testScoreNoMatchingDocs was wrong — the scorer is a valid empty ConstantScoreScorer, not null. Simplify get() back to the single-line return and keep only the meaningful assertion (result list is empty) in testScoreNoMatchingDocs. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…4IndexQueryTests Add testRewrite to verify empty bitmap rewrites to MatchNoDocsQuery, mirroring the existing test in BitmapIndexQueryTests. Also initialize searcher in @before for symmetry with BitmapIndexQueryTests. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
5597c11 to
1f3fb32
Compare
|
Persistent review updated to latest commit 1f3fb32 |
1f3fb32 to
7850d8e
Compare
|
Persistent review updated to latest commit a1de5ff |
…s helper SortedNumericDocValues.advanceExact() returns false when the document has no doc values for the field; calling docValueCount()/nextValue() on a miss is an API violation. Guard the loop body with the return value, consistent with the Bitmap64IndexQueryTests helper. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
|
Persistent review updated to latest commit b007896 |
|
❌ Gradle check result for b007896: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Description
BitmapIndexQueryandBitmap64IndexQueryhad theirDocIdSetBuilderandMergePointVisitordeclared as fields of theScorerSupplieranonymous class, causing all calls toScorerSupplier.get()to share mutable state. This violates the Lucene contract that eachget()call must produce an independent scorer, and caused incorrect empty results in complex bool queries using bitmap terms lookup on multi-segment indices.Root causes fixed:
get()—DocIdSetBuilderandMergePointVisitorwere constructed once and reused across multipleget()calls. Moved both into the body ofget()so each invocation builds a fresh, independentDocIdSet.BytesRefaliasing inbitmapEncodedIterator.next()— the returnedBytesRefshared the same backing array as the internal buffer, so subsequent iteration silently overwrote previously returned values. Fixed by returningBytesRef.deepCopyOf(encoded).hasBufferedflag inBitmap64IndexQuery.bitmapEncodedIterator.advance()— when the iterator was exhausted without finding a value>= target,hasBufferedwas left in whatever state it had before, allowing a stale buffered value to surface on the nextnext()call. Fixed by explicitly settinghasBuffered = falsewhen the loop exits without a match.Testing
Added regression tests to both
BitmapIndexQueryTestsandBitmap64IndexQueryTests:testScorerSupplierGetIsRepeatable— obtains two independentScorerSupplierinstances per leaf, callsget()on each, and asserts both return identical match sets.testScoreNoMatchingDocs— verifies correct empty-result handling when the bitmap contains no values present in the index.testRewrite— verifies that an empty bitmap rewrites toMatchNoDocsQuery.Related Issues
Resolves #21781
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.