Skip to content

CNDB-15423: Make TopKProcessor lazily load queryVector#2007

Merged
michaeljmarshall merged 1 commit intomainfrom
cndb-15423
Sep 22, 2025
Merged

CNDB-15423: Make TopKProcessor lazily load queryVector#2007
michaeljmarshall merged 1 commit intomainfrom
cndb-15423

Conversation

@michaeljmarshall
Copy link
Member

@michaeljmarshall michaeljmarshall commented Sep 19, 2025

What is the issue

Fixes: https://github.com/riptano/cndb/issues/15423
CNDB test: https://github.com/riptano/cndb/pull/15427

What does this PR fix and why was it fixed

This PR fixes a subtle bug exposed by the FeaturesVersionSupportCATest.testANN test, as well as other similarly named FeaturesVersionSupport*Test tests. The central issue is whether to create the queryVector object. The currently logic does so only when useSyntheticScore() is false. However, if a peer has it set to false and this node has it on true, we'll end up with an easily avoided NPE. I propose instead that we lazily load the vector when it is needed. This removes an unnecessary switch as well as removing an unnecessary ordering requirement on upgrades.

java.lang.NullPointerException
	at io.github.jbellis.jvector.vector.VectorUtil.squareL2Distance(VectorUtil.java:85)
	at io.github.jbellis.jvector.vector.VectorSimilarityFunction$1.compare(VectorSimilarityFunction.java:40)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.getScoreForRow(TopKProcessor.java:333)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.processScoredPartition(TopKProcessor.java:276)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.reorder(TopKProcessor.java:145)
	at org.apache.cassandra.index.sai.plan.StorageAttachedIndexQueryPlan.lambda$postProcessor$2(StorageAttachedIndexQueryPlan.java:233)
	at org.apache.cassandra.db.ReadCommand.postReconciliationProcessing(ReadCommand.java:566)
	at org.apache.cassandra.service.reads.range.RangeCommands.partitions(RangeCommands.java:60)

@github-actions
Copy link

github-actions bot commented Sep 19, 2025

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

@sonarqubecloud
Copy link

@cassci-bot
Copy link

✔️ Build ds-cassandra-pr-gate/PR-2007 approved by Butler


Approved by Butler
See build details here

Copy link

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm
Great catch.
I don't think we need additional unit tests in this case.
This change is actually fixing broken tests

Copy link
Member

@k-rus k-rus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it looks good. My understanding is that the condition on the synthetic score was added by @adelapena, thus I mention him if he wants to see the fix in retrospective.

@michaeljmarshall michaeljmarshall merged commit aae5c62 into main Sep 22, 2025
494 checks passed
@michaeljmarshall michaeljmarshall deleted the cndb-15423 branch September 22, 2025 16:12
michaelsembwever pushed a commit that referenced this pull request Sep 25, 2025
…2007)

Fixes: riptano/cndb#15423
CNDB test: riptano/cndb#15427

This PR fixes a subtle bug exposed by the
`FeaturesVersionSupportCATest.testANN` test, as well as other similarly
named `FeaturesVersionSupport*Test` tests. The central issue is whether
to create the `queryVector` object. The currently logic does so only
when `useSyntheticScore()` is false. However, if a peer has it set to
false and this node has it on true, we'll end up with an easily avoided
NPE. I propose instead that we lazily load the vector when it is needed.
This removes an unnecessary switch as well as removing an unnecessary
ordering requirement on upgrades.

```
java.lang.NullPointerException
	at io.github.jbellis.jvector.vector.VectorUtil.squareL2Distance(VectorUtil.java:85)
	at io.github.jbellis.jvector.vector.VectorSimilarityFunction$1.compare(VectorSimilarityFunction.java:40)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.getScoreForRow(TopKProcessor.java:333)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.processScoredPartition(TopKProcessor.java:276)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.reorder(TopKProcessor.java:145)
	at org.apache.cassandra.index.sai.plan.StorageAttachedIndexQueryPlan.lambda$postProcessor$2(StorageAttachedIndexQueryPlan.java:233)
	at org.apache.cassandra.db.ReadCommand.postReconciliationProcessing(ReadCommand.java:566)
	at org.apache.cassandra.service.reads.range.RangeCommands.partitions(RangeCommands.java:60)
```
michaelsembwever pushed a commit that referenced this pull request Sep 26, 2025
…2007)

Fixes: riptano/cndb#15423
CNDB test: riptano/cndb#15427

This PR fixes a subtle bug exposed by the
`FeaturesVersionSupportCATest.testANN` test, as well as other similarly
named `FeaturesVersionSupport*Test` tests. The central issue is whether
to create the `queryVector` object. The currently logic does so only
when `useSyntheticScore()` is false. However, if a peer has it set to
false and this node has it on true, we'll end up with an easily avoided
NPE. I propose instead that we lazily load the vector when it is needed.
This removes an unnecessary switch as well as removing an unnecessary
ordering requirement on upgrades.

```
java.lang.NullPointerException
	at io.github.jbellis.jvector.vector.VectorUtil.squareL2Distance(VectorUtil.java:85)
	at io.github.jbellis.jvector.vector.VectorSimilarityFunction$1.compare(VectorSimilarityFunction.java:40)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.getScoreForRow(TopKProcessor.java:333)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.processScoredPartition(TopKProcessor.java:276)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.reorder(TopKProcessor.java:145)
	at org.apache.cassandra.index.sai.plan.StorageAttachedIndexQueryPlan.lambda$postProcessor$2(StorageAttachedIndexQueryPlan.java:233)
	at org.apache.cassandra.db.ReadCommand.postReconciliationProcessing(ReadCommand.java:566)
	at org.apache.cassandra.service.reads.range.RangeCommands.partitions(RangeCommands.java:60)
```
michaelsembwever pushed a commit that referenced this pull request Feb 6, 2026
…2007)

Fixes: riptano/cndb#15423
CNDB test: riptano/cndb#15427

This PR fixes a subtle bug exposed by the
`FeaturesVersionSupportCATest.testANN` test, as well as other similarly
named `FeaturesVersionSupport*Test` tests. The central issue is whether
to create the `queryVector` object. The currently logic does so only
when `useSyntheticScore()` is false. However, if a peer has it set to
false and this node has it on true, we'll end up with an easily avoided
NPE. I propose instead that we lazily load the vector when it is needed.
This removes an unnecessary switch as well as removing an unnecessary
ordering requirement on upgrades.

```
java.lang.NullPointerException
	at io.github.jbellis.jvector.vector.VectorUtil.squareL2Distance(VectorUtil.java:85)
	at io.github.jbellis.jvector.vector.VectorSimilarityFunction$1.compare(VectorSimilarityFunction.java:40)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.getScoreForRow(TopKProcessor.java:333)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.processScoredPartition(TopKProcessor.java:276)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.reorder(TopKProcessor.java:145)
	at org.apache.cassandra.index.sai.plan.StorageAttachedIndexQueryPlan.lambda$postProcessor$2(StorageAttachedIndexQueryPlan.java:233)
	at org.apache.cassandra.db.ReadCommand.postReconciliationProcessing(ReadCommand.java:566)
	at org.apache.cassandra.service.reads.range.RangeCommands.partitions(RangeCommands.java:60)
```
michaelsembwever pushed a commit that referenced this pull request Feb 10, 2026
…2007)

Fixes: riptano/cndb#15423
CNDB test: riptano/cndb#15427

This PR fixes a subtle bug exposed by the
`FeaturesVersionSupportCATest.testANN` test, as well as other similarly
named `FeaturesVersionSupport*Test` tests. The central issue is whether
to create the `queryVector` object. The currently logic does so only
when `useSyntheticScore()` is false. However, if a peer has it set to
false and this node has it on true, we'll end up with an easily avoided
NPE. I propose instead that we lazily load the vector when it is needed.
This removes an unnecessary switch as well as removing an unnecessary
ordering requirement on upgrades.

```
java.lang.NullPointerException
	at io.github.jbellis.jvector.vector.VectorUtil.squareL2Distance(VectorUtil.java:85)
	at io.github.jbellis.jvector.vector.VectorSimilarityFunction$1.compare(VectorSimilarityFunction.java:40)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.getScoreForRow(TopKProcessor.java:333)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.processScoredPartition(TopKProcessor.java:276)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.reorder(TopKProcessor.java:145)
	at org.apache.cassandra.index.sai.plan.StorageAttachedIndexQueryPlan.lambda$postProcessor$2(StorageAttachedIndexQueryPlan.java:233)
	at org.apache.cassandra.db.ReadCommand.postReconciliationProcessing(ReadCommand.java:566)
	at org.apache.cassandra.service.reads.range.RangeCommands.partitions(RangeCommands.java:60)
```

 (Rebase of commit be64177)
michaelsembwever pushed a commit that referenced this pull request Feb 11, 2026
…2007)

Fixes: riptano/cndb#15423
CNDB test: riptano/cndb#15427

This PR fixes a subtle bug exposed by the
`FeaturesVersionSupportCATest.testANN` test, as well as other similarly
named `FeaturesVersionSupport*Test` tests. The central issue is whether
to create the `queryVector` object. The currently logic does so only
when `useSyntheticScore()` is false. However, if a peer has it set to
false and this node has it on true, we'll end up with an easily avoided
NPE. I propose instead that we lazily load the vector when it is needed.
This removes an unnecessary switch as well as removing an unnecessary
ordering requirement on upgrades.

```
java.lang.NullPointerException
	at io.github.jbellis.jvector.vector.VectorUtil.squareL2Distance(VectorUtil.java:85)
	at io.github.jbellis.jvector.vector.VectorSimilarityFunction$1.compare(VectorSimilarityFunction.java:40)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.getScoreForRow(TopKProcessor.java:333)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.processScoredPartition(TopKProcessor.java:276)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.reorder(TopKProcessor.java:145)
	at org.apache.cassandra.index.sai.plan.StorageAttachedIndexQueryPlan.lambda$postProcessor$2(StorageAttachedIndexQueryPlan.java:233)
	at org.apache.cassandra.db.ReadCommand.postReconciliationProcessing(ReadCommand.java:566)
	at org.apache.cassandra.service.reads.range.RangeCommands.partitions(RangeCommands.java:60)
```

 (Rebase of commit be64177)
michaelsembwever pushed a commit that referenced this pull request Feb 12, 2026
…2007)

Fixes: riptano/cndb#15423
CNDB test: riptano/cndb#15427

This PR fixes a subtle bug exposed by the
`FeaturesVersionSupportCATest.testANN` test, as well as other similarly
named `FeaturesVersionSupport*Test` tests. The central issue is whether
to create the `queryVector` object. The currently logic does so only
when `useSyntheticScore()` is false. However, if a peer has it set to
false and this node has it on true, we'll end up with an easily avoided
NPE. I propose instead that we lazily load the vector when it is needed.
This removes an unnecessary switch as well as removing an unnecessary
ordering requirement on upgrades.

```
java.lang.NullPointerException
	at io.github.jbellis.jvector.vector.VectorUtil.squareL2Distance(VectorUtil.java:85)
	at io.github.jbellis.jvector.vector.VectorSimilarityFunction$1.compare(VectorSimilarityFunction.java:40)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.getScoreForRow(TopKProcessor.java:333)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.processScoredPartition(TopKProcessor.java:276)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.reorder(TopKProcessor.java:145)
	at org.apache.cassandra.index.sai.plan.StorageAttachedIndexQueryPlan.lambda$postProcessor$2(StorageAttachedIndexQueryPlan.java:233)
	at org.apache.cassandra.db.ReadCommand.postReconciliationProcessing(ReadCommand.java:566)
	at org.apache.cassandra.service.reads.range.RangeCommands.partitions(RangeCommands.java:60)
```

 (Rebase of commit be64177)
michaelsembwever pushed a commit that referenced this pull request Feb 14, 2026
…2007)

Fixes: riptano/cndb#15423
CNDB test: riptano/cndb#15427

This PR fixes a subtle bug exposed by the
`FeaturesVersionSupportCATest.testANN` test, as well as other similarly
named `FeaturesVersionSupport*Test` tests. The central issue is whether
to create the `queryVector` object. The currently logic does so only
when `useSyntheticScore()` is false. However, if a peer has it set to
false and this node has it on true, we'll end up with an easily avoided
NPE. I propose instead that we lazily load the vector when it is needed.
This removes an unnecessary switch as well as removing an unnecessary
ordering requirement on upgrades.

```
java.lang.NullPointerException
	at io.github.jbellis.jvector.vector.VectorUtil.squareL2Distance(VectorUtil.java:85)
	at io.github.jbellis.jvector.vector.VectorSimilarityFunction$1.compare(VectorSimilarityFunction.java:40)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.getScoreForRow(TopKProcessor.java:333)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.processScoredPartition(TopKProcessor.java:276)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.reorder(TopKProcessor.java:145)
	at org.apache.cassandra.index.sai.plan.StorageAttachedIndexQueryPlan.lambda$postProcessor$2(StorageAttachedIndexQueryPlan.java:233)
	at org.apache.cassandra.db.ReadCommand.postReconciliationProcessing(ReadCommand.java:566)
	at org.apache.cassandra.service.reads.range.RangeCommands.partitions(RangeCommands.java:60)
```

 (Rebase of commit be64177)
michaelsembwever pushed a commit that referenced this pull request Feb 16, 2026
…2007)

Fixes: riptano/cndb#15423
CNDB test: riptano/cndb#15427

This PR fixes a subtle bug exposed by the
`FeaturesVersionSupportCATest.testANN` test, as well as other similarly
named `FeaturesVersionSupport*Test` tests. The central issue is whether
to create the `queryVector` object. The currently logic does so only
when `useSyntheticScore()` is false. However, if a peer has it set to
false and this node has it on true, we'll end up with an easily avoided
NPE. I propose instead that we lazily load the vector when it is needed.
This removes an unnecessary switch as well as removing an unnecessary
ordering requirement on upgrades.

```
java.lang.NullPointerException
	at io.github.jbellis.jvector.vector.VectorUtil.squareL2Distance(VectorUtil.java:85)
	at io.github.jbellis.jvector.vector.VectorSimilarityFunction$1.compare(VectorSimilarityFunction.java:40)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.getScoreForRow(TopKProcessor.java:333)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.processScoredPartition(TopKProcessor.java:276)
	at org.apache.cassandra.index.sai.plan.TopKProcessor.reorder(TopKProcessor.java:145)
	at org.apache.cassandra.index.sai.plan.StorageAttachedIndexQueryPlan.lambda$postProcessor$2(StorageAttachedIndexQueryPlan.java:233)
	at org.apache.cassandra.db.ReadCommand.postReconciliationProcessing(ReadCommand.java:566)
	at org.apache.cassandra.service.reads.range.RangeCommands.partitions(RangeCommands.java:60)
```

 (Rebase of commit be64177)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants