CNDB-15423: Make TopKProcessor lazily load queryVector by michaeljmarshall · Pull Request #2007 · datastax/cassandra

michaeljmarshall · 2025-09-19T19:58:12Z

What is the issue

Fixes: https://github.com/riptano/cndb/issues/15423
CNDB test: https://github.com/riptano/cndb/pull/15427

What does this PR fix and why was it fixed

This PR fixes a subtle bug exposed by the FeaturesVersionSupportCATest.testANN test, as well as other similarly named FeaturesVersionSupport*Test tests. The central issue is whether to create the queryVector object. The current logic does so only when useSyntheticScore() is false. However, if a peer has it set to false and this node has it set to true, we end up with an easily avoided NPE. I propose instead that we lazily load the vector when it is needed. This removes an unnecessary switch as well as an unnecessary ordering requirement on upgrades.

java.lang.NullPointerException
	at io.github.jbellis.jvector.vector.VectorUtil.squareL2Distance(VectorUtil.java:85)
	at io.github.jbellis.jvector.vector.VectorSimilarityFunction$1.compare(VectorSimilarityFunction.java:40)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.getScoreForRow(TopKProcessor.java:333)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.processScoredPartition(TopKProcessor.java:276)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.reorder(TopKProcessor.java:145)
	at org.apache.cassandra.index.sai.plan.StorageAttachedIndexQueryPlan.lambda$postProcessor$2(StorageAttachedIndexQueryPlan.java:233)
	at org.apache.cassandra.db.ReadCommand.postReconciliationProcessing(ReadCommand.java:566)
	at org.apache.cassandra.service.reads.range.RangeCommands.partitions(RangeCommands.java:60)
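
For readers unfamiliar with the pattern, here is a minimal sketch of lazy loading as described above. It is not the actual TopKProcessor change: the class LazyVectorScorer, its fields, and the Supplier-based loader are all hypothetical names chosen for illustration. The point is only that the vector is materialized on first use rather than conditionally at construction time, so no local flag can leave it null.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a processor that may need a query vector to score rows. */
final class LazyVectorScorer {
    private final Supplier<float[]> vectorLoader; // assumed: decodes the vector from the query
    private float[] queryVector;                  // stays null until first needed
    private int loadCount;                        // illustration only: counts how often we load

    LazyVectorScorer(Supplier<float[]> vectorLoader) {
        this.vectorLoader = vectorLoader;
    }

    /** Loads the vector on first use, independent of any local feature flag. */
    private float[] queryVector() {
        if (queryVector == null) {
            queryVector = vectorLoader.get();
            loadCount++;
        }
        return queryVector;
    }

    /** Squared L2 distance between a row vector and the query vector. */
    float squaredL2Distance(float[] rowVector) {
        float[] q = queryVector();
        float sum = 0f;
        for (int i = 0; i < q.length; i++) {
            float d = q[i] - rowVector[i];
            sum += d * d;
        }
        return sum;
    }

    int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        LazyVectorScorer scorer = new LazyVectorScorer(() -> new float[] { 1f, 2f });
        System.out.println(scorer.loadCount()); // nothing has been loaded yet
        System.out.println(scorer.squaredL2Distance(new float[] { 1f, 4f }));
        System.out.println(scorer.squaredL2Distance(new float[] { 0f, 2f }));
        System.out.println(scorer.loadCount()); // the loader ran exactly once
    }
}
```

This sketch is not thread-safe; whether the real code needs synchronization depends on how the processor is shared, which the description does not say.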

github-actions · 2025-09-19T19:58:28Z

sonarqubecloud · 2025-09-19T20:44:24Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

cassci-bot · 2025-09-19T20:52:40Z

✔️ Build ds-cassandra-pr-gate/PR-2007 approved by Butler

eolivelli

LGTM.
Great catch.
I don't think we need additional unit tests in this case.
This change is actually fixing broken tests.

k-rus

It looks good to me. My understanding is that the condition on the synthetic score was added by @adelapena, so I'm mentioning him in case he wants to see the fix in retrospect.

…2007)

Fixes: riptano/cndb#15423
CNDB test: riptano/cndb#15427

This PR fixes a subtle bug exposed by the `FeaturesVersionSupportCATest.testANN` test, as well as other similarly named `FeaturesVersionSupport*Test` tests. The central issue is whether to create the `queryVector` object. The current logic does so only when `useSyntheticScore()` is false. However, if a peer has it set to false and this node has it set to true, we end up with an easily avoided NPE. I propose instead that we lazily load the vector when it is needed. This removes an unnecessary switch as well as an unnecessary ordering requirement on upgrades.

```
java.lang.NullPointerException
	at io.github.jbellis.jvector.vector.VectorUtil.squareL2Distance(VectorUtil.java:85)
	at io.github.jbellis.jvector.vector.VectorSimilarityFunction$1.compare(VectorSimilarityFunction.java:40)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.getScoreForRow(TopKProcessor.java:333)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.processScoredPartition(TopKProcessor.java:276)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.reorder(TopKProcessor.java:145)
	at org.apache.cassandra.index.sai.plan.StorageAttachedIndexQueryPlan.lambda$postProcessor$2(StorageAttachedIndexQueryPlan.java:233)
	at org.apache.cassandra.db.ReadCommand.postReconciliationProcessing(ReadCommand.java:566)
	at org.apache.cassandra.service.reads.range.RangeCommands.partitions(RangeCommands.java:60)
```

(Rebase of commit be64177)

CNDB-15423: Make TopKProcessor lazily load queryVector

5f6c8b1

michaeljmarshall requested review from eolivelli and k-rus September 19, 2025 19:58

michaeljmarshall self-assigned this Sep 19, 2025

michaeljmarshall mentioned this pull request Sep 19, 2025

CNDB-14802: Improve async byte allocation estimates in SAI SegmentBuilder #2002

Open

eolivelli approved these changes Sep 20, 2025

k-rus approved these changes Sep 20, 2025

michaeljmarshall merged commit aae5c62 into main Sep 22, 2025
494 checks passed

michaeljmarshall deleted the cndb-15423 branch September 22, 2025 16:12

michaeljmarshall merged 1 commit into main from cndb-15423

4 participants
