Skip to content

fix(search): make bitmap scorer repeatable#21815

Open
gingeekrishna wants to merge 23 commits into
opensearch-project:mainfrom
gingeekrishna:fix/21781-bitmap-terms-lookup
Open

fix(search): make bitmap scorer repeatable#21815
gingeekrishna wants to merge 23 commits into
opensearch-project:mainfrom
gingeekrishna:fix/21781-bitmap-terms-lookup

Conversation

@gingeekrishna
Copy link
Copy Markdown

@gingeekrishna gingeekrishna commented May 23, 2026

Description

BitmapIndexQuery and Bitmap64IndexQuery had their DocIdSetBuilder and MergePointVisitor declared as fields of the ScorerSupplier anonymous class, causing all calls to ScorerSupplier.get() to share mutable state. This violates the Lucene contract that each get() call must produce an independent scorer, and caused incorrect empty results in complex bool queries using bitmap terms lookup on multi-segment indices.

Root causes fixed:

  • Shared mutable state in get()DocIdSetBuilder and MergePointVisitor were constructed once and reused across multiple get() calls. Moved both into the body of get() so each invocation builds a fresh, independent DocIdSet.
  • BytesRef aliasing in bitmapEncodedIterator.next() — the returned BytesRef shared the same backing array as the internal buffer, so subsequent iteration silently overwrote previously returned values. Fixed by returning BytesRef.deepCopyOf(encoded).
  • Stale hasBuffered flag in Bitmap64IndexQuery.bitmapEncodedIterator.advance() — when the iterator was exhausted without finding a value >= target, hasBuffered was left in whatever state it had before, allowing a stale buffered value to surface on the next next() call. Fixed by explicitly setting hasBuffered = false when the loop exits without a match.

Testing

Added regression tests to both BitmapIndexQueryTests and Bitmap64IndexQueryTests:

  • testScorerSupplierGetIsRepeatable — obtains two independent ScorerSupplier instances per leaf, calls get() on each, and asserts both return identical match sets.
  • testScoreNoMatchingDocs — verifies correct empty-result handling when the bitmap contains no values present in the index.
  • testRewrite — verifies that an empty bitmap rewrites to MatchNoDocsQuery.

Related Issues

Resolves #21781

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable. Not applicable — no public API contract change.
  • Public documentation issue/PR created, if applicable. Not applicable — internal query execution correctness fix.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 23, 2026

PR Reviewer Guide 🔍

(Review updated until commit b007896)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 23, 2026

PR Code Suggestions ✨

Latest suggestions up to b007896

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Reset state at method start

The advance method should initialize hasBuffered to false at the start to ensure
correct state when called multiple times. Without this, if advance is called after a
previous successful match, hasBuffered may remain true even when the iterator is
exhausted, leading to incorrect behavior.

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java [89-100]

 public void advance(byte[] target) {
+    hasBuffered = false;
     while (it.hasNext()) {
         long v = it.next();
         LongPoint.encodeDimension(v, currentBytes, 0);
 
         if (Arrays.compareUnsigned(currentBytes, target) >= 0) {
             hasBuffered = true;
             return;
         }
     }
-    hasBuffered = false; // Iterator exhausted, no match found
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a potential state management issue. Setting hasBuffered = false at the start of advance ensures the method is idempotent and prevents stale state from previous invocations. This is a critical correctness fix that prevents incorrect behavior when advance is called multiple times.

Medium

Previous suggestions

Suggestions up to commit a1de5ff
CategorySuggestion                                                                                                                                    Impact
Possible issue
Copy matched value to encoded buffer

The advance method should also copy the matching value to encoded.bytes when a match
is found, similar to how next() uses System.arraycopy. Without this, the buffered
value in currentBytes won't be properly transferred when next() is called after
advance().

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java [89-100]

 public void advance(byte[] target) {
     while (it.hasNext()) {
         long v = it.next();
         LongPoint.encodeDimension(v, currentBytes, 0);
 
         if (Arrays.compareUnsigned(currentBytes, target) >= 0) {
             hasBuffered = true;
+            System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES);
             return;
         }
     }
     hasBuffered = false; // Iterator exhausted, no match found
 }
Suggestion importance[1-10]: 9

__

Why: This is a critical bug fix. The advance method finds a matching value and sets hasBuffered = true, but doesn't copy currentBytes to encoded.bytes. When next() is subsequently called, it uses System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES) to transfer the buffered value, but currentBytes may have been overwritten by subsequent operations. The suggestion correctly identifies that the copy should happen in advance() when the match is found, ensuring data integrity.

High
Suggestions up to commit e2a1964
CategorySuggestion                                                                                                                                    Impact
Possible issue
Copy matched value to encoded buffer

The advance method should also copy the matched value to encoded.bytes when a match
is found, similar to how next() uses System.arraycopy. Without this, the buffered
value in currentBytes won't be properly transferred when next() is called after
advance().

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java [89-100]

 public void advance(byte[] target) {
     while (it.hasNext()) {
         long v = it.next();
         LongPoint.encodeDimension(v, currentBytes, 0);
 
         if (Arrays.compareUnsigned(currentBytes, target) >= 0) {
             hasBuffered = true;
+            System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES);
             return;
         }
     }
     hasBuffered = false; // Iterator exhausted, no match found
 }
Suggestion importance[1-10]: 9

__

Why: This is a critical bug fix. The advance method finds a matching value and sets hasBuffered = true, but doesn't copy the value from currentBytes to encoded.bytes. When next() is subsequently called, it uses System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES) to transfer the buffered value, but currentBytes may have been overwritten by then, leading to incorrect results.

High
Suggestions up to commit 25d07f7
CategorySuggestion                                                                                                                                    Impact
Possible issue
Copy matched value to encoded buffer

The advance method should also copy the matched value to encoded.bytes when a match
is found, similar to how next() uses System.arraycopy. Without this, the buffered
value in currentBytes won't be properly transferred when next() is called after
advance().

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java [89-100]

 public void advance(byte[] target) {
     while (it.hasNext()) {
         long v = it.next();
         LongPoint.encodeDimension(v, currentBytes, 0);
 
         if (Arrays.compareUnsigned(currentBytes, target) >= 0) {
             hasBuffered = true;
+            System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES);
             return;
         }
     }
     hasBuffered = false; // Iterator exhausted, no match found
 }
Suggestion importance[1-10]: 9

__

Why: This is a critical bug fix. The advance method sets hasBuffered = true but doesn't copy currentBytes to encoded.bytes. When next() is subsequently called, it uses System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES) expecting the buffered value to be in currentBytes, but currentBytes may have been overwritten. The suggestion correctly identifies that the copy should happen in advance when a match is found, ensuring data integrity.

High
Suggestions up to commit 7d96516
CategorySuggestion                                                                                                                                    Impact
Possible issue
Copy matched value to buffer

The advance() method should copy currentBytes to encoded.bytes when a match is
found, ensuring the buffered value is properly stored for the next next() call.
Without this copy, next() may return stale data from encoded.bytes.

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java [95-98]

 if (Arrays.compareUnsigned(currentBytes, target) >= 0) {
+    System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES);
     hasBuffered = true;
     return;
 }
Suggestion importance[1-10]: 9

__

Why: This is a critical bug fix. The advance() method sets hasBuffered = true but doesn't copy currentBytes to encoded.bytes. When next() is called, it copies from encoded.bytes which contains stale data, not the buffered value. This would cause incorrect query results.

High
Suggestions up to commit 4ca5fb8
CategorySuggestion                                                                                                                                    Impact
General
Avoid redundant array copy operation

The BytesRef.deepCopyOf(encoded) creates unnecessary copies on every call. Since
encoded is reused, callers may retain references to the returned BytesRef, making
deep copies essential. However, copying currentBytes to encoded.bytes before deep
copying is redundant—copy directly from currentBytes.

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java [75-87]

 public BytesRef next() {
     if (hasBuffered) {
         hasBuffered = false;
-        System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES);
-        return BytesRef.deepCopyOf(encoded);
+        return new BytesRef(currentBytes, 0, Long.BYTES);
     }
 
     if (!it.hasNext()) return null;
 
     long v = it.next();
     LongPoint.encodeDimension(v, encoded.bytes, 0);
     return BytesRef.deepCopyOf(encoded);
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that copying currentBytes to encoded.bytes before creating a deep copy is redundant. Creating a BytesRef directly from currentBytes eliminates the unnecessary System.arraycopy operation, improving performance. However, the impact is moderate as this optimization affects only the buffered case.

Medium

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a correctness bug in bitmap index queries by making ScorerSupplier#get() repeatable: it rebuilds mutable intersection state on each get() call so repeated scorer creation doesn’t consume shared iterator/visitor state and lead to empty results in complex queries.

Changes:

  • Rebuild DocIdSetBuilder and MergePointVisitor inside ScorerSupplier#get() for both 32-bit and 64-bit bitmap index queries.
  • Add regression tests for both query types that call ScorerSupplier#get() twice and compare matches.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
server/src/main/java/org/opensearch/search/query/BitmapIndexQuery.java Moves mutable intersection state into ScorerSupplier#get() to ensure repeatable scorer creation.
server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java Same repeatability fix as 32-bit variant for 64-bit bitmap queries.
server/src/test/java/org/opensearch/search/query/BitmapIndexQueryTests.java Adds a repeatability regression test and helper to extract matches from a Scorer.
server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java Adds a repeatability regression test and helper to extract matches from a Scorer.
Comments suppressed due to low confidence (1)

server/src/test/java/org/opensearch/search/query/BitmapIndexQueryTests.java:266

  • The repeatability check only inspects reader.leaves().get(0). With randomized IndexWriter settings it’s possible for leaf 0 to contain none of the matching docs, in which case both get() calls return an empty list and the regression test would pass even if the scorer is broken. Consider aggregating matches across all leaves (or asserting the first scorer returns the expected non-empty matches) before comparing the second scorer’s results.
        LeafReaderContext leaf = reader.leaves().get(0);
        ScorerSupplier supplier = weight.scorerSupplier(leaf);
        assertNotNull(supplier);

        Scorer scorer1 = supplier.get(Long.MAX_VALUE);
        Scorer scorer2 = supplier.get(Long.MAX_VALUE);

        assertEquals(getMatchingValues(scorer1, leaf), getMatchingValues(scorer2, leaf));

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread server/src/test/java/org/opensearch/search/query/BitmapIndexQueryTests.java Outdated
Comment thread server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java Outdated
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit ac01cdf

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread server/src/test/java/org/opensearch/search/query/BitmapIndexQueryTests.java Outdated
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 91c3d85

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment on lines 83 to 86
w.commit();
reader.close();
reader = DirectoryReader.open(w);
searcher = newSearcher(reader);
Comment on lines +261 to +269

LeafReaderContext leaf = reader.leaves().get(0);
ScorerSupplier supplier = weight.scorerSupplier(leaf);
assertNotNull(supplier);

Scorer scorer1 = supplier.get(Long.MAX_VALUE);
Scorer scorer2 = supplier.get(Long.MAX_VALUE);

assertEquals(getMatchingValues(scorer1, leaf), getMatchingValues(scorer2, leaf));
Comment thread server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java Outdated
gingeekrishna and others added 18 commits May 25, 2026 22:36
Close the existing reader before reopening DirectoryReader in bitmap index query tests to avoid handle leaks in repeated refresh paths.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Use an exception-safe refresh helper in BitmapIndexQueryTests and apply it across reader reopen sites to avoid leak-prone reader reassignment patterns.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Initialize searcher in BitmapIndexQueryTests @before so testRewrite and related tests never rewrite against a null searcher.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Skip leaves where Weight#scorer returns null in BitmapIndexQueryTests helper to prevent NPEs in multi-segment/randomized test runs.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Ensure we don't lose reference to the old reader even if DirectoryReader.open fails, improving exception safety.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Close old reader/writer before creating new ones to prevent mixed reader/writer state. This ensures the reader and writer are always in sync and prevents AlreadyClosedException or resource leaks during test teardown.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Build bitmap intersections lazily once per leaf and reuse the built DocIdSet across repeated ScorerSupplier#get calls. Return null scorer when intersections are empty to avoid null-iterator failures.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…ability

Ensure ScorerSupplier.get() can be called multiple times with identical results
by rebuilding the DocIdSet fresh each time instead of caching it. This ensures
the BitmapIterator and MergePointVisitor are created fresh for each call.

Fixes testScorerSupplierGetIsRepeatable test failures in both 32-bit and 64-bit
bitmap index queries.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Prevent stale buffered values from being used in subsequent iteration cycles.
When advance() exhausts the bitmap iterator without finding a value >= target,
explicitly reset hasBuffered to false to ensure next() doesn't return invalid data.

Fixes potential repeatability issues in bitmap scorer repeatability tests.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Use supplier.cost() as the leadCost when invoking ScorerSupplier#get in
bitmap repeatability tests. Passing Long.MAX_VALUE violates Lucene's
asserting scorer supplier checks and causes assertion failures unrelated
to scorer repeatability behavior.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…per get() call

Lucene's AssertingWeight (used by newSearcher() in tests) wraps
ScorerSupplier and asserts get() is called at most once per instance.
Calling get() twice on the same supplier triggers AssertionError.

Fix both BitmapIndexQueryTests and Bitmap64IndexQueryTests to obtain
a fresh ScorerSupplier from the weight for each scorer retrieval,
which is also the semantically correct way to test that the weight
produces repeatable results across independent scorer requests.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
DocIdSetBuilder.build() never returns null — it returns DocIdSet.EMPTY
when no documents are added. Remove the dead null check and inline
the iterator() call to eliminate the uncovered branch.

Add testScoreNoMatchingDocs to both BitmapIndexQueryTests and
Bitmap64IndexQueryTests to cover the iterator-is-null path (reached
when the field has point values but no documents match the bitmap).
This brings patch coverage above the 73.33% threshold required by
Codecov.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…tion

In Lucene 10.4.0, DocIdSetBuilder.build() never returns DocIdSet.EMPTY;
when no documents match it returns an IntArrayDocIdSet with length 0
whose iterator() is non-null (yields NO_MORE_DOCS immediately).

The 'if (iterator == null)' guard added in the previous commit was
therefore dead code and the assertNull(weight.scorer(leaf)) assertion
in testScoreNoMatchingDocs was wrong — the scorer is a valid empty
ConstantScoreScorer, not null.

Simplify get() back to the single-line return and keep only the
meaningful assertion (result list is empty) in testScoreNoMatchingDocs.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…4IndexQueryTests

Add testRewrite to verify empty bitmap rewrites to MatchNoDocsQuery,
mirroring the existing test in BitmapIndexQueryTests. Also initialize
searcher in @before for symmetry with BitmapIndexQueryTests.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
@gingeekrishna gingeekrishna force-pushed the fix/21781-bitmap-terms-lookup branch from 5597c11 to 1f3fb32 Compare May 25, 2026 17:06
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 1f3fb32

@gingeekrishna gingeekrishna force-pushed the fix/21781-bitmap-terms-lookup branch from 1f3fb32 to 7850d8e Compare May 25, 2026 17:09
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a1de5ff

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for a1de5ff: SUCCESS

…s helper

SortedNumericDocValues.advanceExact() returns false when the document
has no doc values for the field; calling docValueCount()/nextValue()
on a miss is an API violation. Guard the loop body with the return value,
consistent with the Bitmap64IndexQueryTests helper.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit b007896

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for b007896: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working missing-component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The combination of bitmap terms lookup + additional filter + text match produces wrong results, and force merge fixes it.

2 participants