CNDB-16051: Include pq vectors in CompactionGraph bytesUsed calculation #2144
michaeljmarshall merged 4 commits into main
Conversation
```java
// Set force PQ training size to ensure we hit the refine code path and apply it to half the vectors.
// TODO this test fails as of this commit due to recall issues. Will investigate further.
CompactionGraph.PQ_TRAINING_SIZE = baseVectors.size() / 2;
```
The problem here seems likely to be with GraphIndexBuilder.rescore in the CompactionGraph. The failure is unrelated to this PR. I am looking into possible fixes. Generally speaking, I think we can entirely remove the rescore logic, which duplicates work unnecessarily. We could instead accumulate N vectors, refine a PQ, encode the vectors, and then insert them into a graph builder properly built using these PQ values. I have a local version of this that passes, but is a bit hacky.
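The replacement flow described above (accumulate N vectors, refine a PQ, encode them, then insert into a properly configured graph builder) can be sketched roughly as follows. The types and names here (`Quantizer`, `buildCodes`, `PQ_TRAINING_SIZE`) are illustrative stand-ins, not the actual jvector or CompactionGraph API:

```java
import java.util.*;

// Schematic of the proposed batch-then-build flow. All types here are
// hypothetical stand-ins; the real code would use jvector's graph builder
// and product quantization classes.
public class BatchThenBuild {
    static final int PQ_TRAINING_SIZE = 4; // illustrative threshold

    // Stand-in "PQ": records its training-set size and "encodes" a vector
    // by narrowing each component to a byte. A real PQ trains codebooks.
    public static class Quantizer {
        public final int trainedOn;
        public Quantizer(List<float[]> trainingSet) { this.trainedOn = trainingSet.size(); }
        public byte[] encode(float[] v) {
            byte[] code = new byte[v.length];
            for (int i = 0; i < v.length; i++) code[i] = (byte) v[i];
            return code;
        }
    }

    // 1. Accumulate the vectors, 2. refine/train the PQ once on a prefix,
    // 3. encode every vector with the trained PQ. Step 4 (not shown) would
    // insert the encoded vectors into the graph builder, avoiding a
    // separate rescore pass over the graph.
    public static List<byte[]> buildCodes(List<float[]> vectors) {
        List<float[]> buffer = new ArrayList<>(vectors);
        Quantizer pq = new Quantizer(buffer.subList(0, Math.min(PQ_TRAINING_SIZE, buffer.size())));
        List<byte[]> codes = new ArrayList<>();
        for (float[] v : buffer) codes.add(pq.encode(v));
        return codes;
    }

    public static void main(String[] args) {
        List<float[]> vecs = new ArrayList<>();
        for (int i = 0; i < 8; i++) vecs.add(new float[] {i, i + 1});
        System.out.println(buildCodes(vecs).size()); // 8
    }
}
```

The point of the restructuring is that the PQ is trained before any graph insertions happen, so no node ever needs to be rescored after the fact.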
Can you explain how this added test relates to the memory-consumption tracking modification in the code under test? I can see we're likely invoking vector compression here, but we're not testing the memory consumption counters. Or am I missing something?
By reducing PQ_TRAINING_SIZE, we hit code that was previously uncovered, and then we asserted on the recall for the graph built by that code; the test didn't pass. The failure is unrelated to my change.
pkolaczk left a comment:
The code change looks good.
The failing test: let's move it to a separate issue.
The code change is simple enough that we can live with it not being covered for a bit longer (it wasn't covered before anyway).
```java
for (int topK : List.of(1, 100))
{
    var recall = testRecall(topK, queryVectors, groundTruth);
    assertTrue("Post-compaction recall is " + recall, recall > postCompactionRecall);
}
```
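For reference, the recall metric asserted on here is typically recall@k: the fraction of the true top-k neighbors that appear in the search results. A minimal sketch (the `recallAtK` helper below is hypothetical, not the test's actual `testRecall`):

```java
import java.util.Set;

public class RecallExample {
    // recall@k = |retrieved ∩ groundTruth| / |groundTruth|,
    // i.e. the fraction of the true nearest neighbors that the index found.
    public static double recallAtK(Set<Integer> retrieved, Set<Integer> groundTruth) {
        long hits = retrieved.stream().filter(groundTruth::contains).count();
        return (double) hits / groundTruth.size();
    }

    public static void main(String[] args) {
        // 2 of the 4 true neighbors were retrieved -> recall@4 = 0.5
        System.out.println(recallAtK(Set.of(1, 2, 3, 4), Set.of(1, 2, 5, 6))); // 0.5
    }
}
```

A recall regression after a PQ change usually points at the compressed distances ranking candidates differently, which is why the training-size tweak surfaced the issue.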
If this is failing, maybe relax this condition for a while instead of removing the whole test.
This way you still invoke the memory-tracking code modification added in this PR.
BTW: is there a way to assert on the memory counting somehow?
> If this is failing, maybe relax this condition for a while, instead of removing the whole test.

I removed the assertion entirely since asserting recall > 0 isn't valuable.
> BTW: is there a way to assert for the memory counting somehow?

We do have ways of doing that in the project, but I'm not sure it is worth adding just yet. The current accounting is not exact (see #2002). The main point of this PR is to make sure we track one of the largest contributors to heap memory consumption.

As a follow-up, I created datastax/jvector#614 to see if jvector can return the bytes added as a result of the invocation.
❌ Build ds-cassandra-pr-gate/PR-2144 rejected by Butler: 5 regressions found (5 new test failures, 2 known test failures).

Test failures are all compression-related timeouts. Code coverage is sufficiently high, merging.



What is the issue
Fixes: https://github.com/riptano/cndb/issues/16051
CNDB PR: https://github.com/riptano/cndb/pull/16675
What does this PR fix and why was it fixed
The SAI segment builder logic tracks the bytes used by a segment builder so that segments are flushed before they cause an OOM. This change adds the PQ and BQ vector byte utilization to that count on each insertion.
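A rough sketch of what per-insertion byte accounting looks like. The class and method names below (`BytesUsedTracker`, `add`) are illustrative only, not the actual SAI segment builder or CompactionGraph API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of per-insertion byte accounting. The real segment
// builder compares an accumulated byte count against a flush threshold;
// this PR's change is, in spirit, adding the compressed-vector term.
public class BytesUsedTracker {
    private final AtomicLong bytesUsed = new AtomicLong();

    // On each vector insertion, account for the compressed (PQ/BQ) code
    // in addition to the graph-node cost, so the flush decision reflects
    // the actual heap footprint of the in-progress segment.
    public long add(long graphNodeBytes, byte[] compressedVector) {
        return bytesUsed.addAndGet(graphNodeBytes + compressedVector.length);
    }

    public long bytesUsed() { return bytesUsed.get(); }

    public static void main(String[] args) {
        BytesUsedTracker t = new BytesUsedTracker();
        t.add(100, new byte[32]); // graph node + 32-byte PQ code
        t.add(100, new byte[32]);
        System.out.println(t.bytesUsed()); // 264
    }
}
```

Without the compressed-vector term, the tracked total undercounts by one code per inserted vector, which for large segments is one of the biggest heap contributors.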