Description
Spinoff from this comment.
This (`compress=true`) is a useful option when quantizing KNN vectors to 4 bits: it packs pairs of dimensions into a single byte, so the "hot working set" of your KNN/HNSW vectors at search time is half the already reduced (from `float32` -> `byte`) size. When `compress` is `false`, it's wasteful, using only four bits of every byte.
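To illustrate the idea (a minimal sketch, not Lucene's actual implementation; the nibble ordering here is an assumption): two 4-bit quantized dimensions, each in `[0, 15]`, share one byte.

```java
import java.util.Arrays;

public class NibblePacking {
    // Pack pairs of 4-bit values: values[2*i] and values[2*i + 1]
    // end up in the low and high nibbles of packed[i].
    static byte[] pack(byte[] values) {
        byte[] packed = new byte[(values.length + 1) / 2];
        for (int i = 0; i < values.length; i++) {
            int shift = (i & 1) == 0 ? 0 : 4; // low nibble first (an assumption)
            packed[i / 2] |= (byte) ((values[i] & 0xF) << shift);
        }
        return packed;
    }

    // Decode: expand each byte back into two 4-bit values. This extra
    // shift-and-mask per dimension is the decode cost paid at search time.
    static byte[] unpack(byte[] packed, int count) {
        byte[] values = new byte[count];
        for (int i = 0; i < count; i++) {
            int shift = (i & 1) == 0 ? 0 : 4;
            values[i] = (byte) ((packed[i / 2] >> shift) & 0xF);
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] dims = {3, 12, 7, 9};
        byte[] packed = pack(dims);
        System.out.println(packed.length); // 2: half the storage
        System.out.println(Arrays.equals(dims, unpack(packed, dims.length))); // true
    }
}
```

With `compress=false` each 4-bit value would instead occupy a whole byte, doubling the bytes read per vector comparison but skipping the unpack step.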
But it comes with some penalty to decode the "packed" (`compress=true`) form during KNN search, which is why we give this choice to the user. However, I think there was at least one optimization to that path since, so maybe the performance penalty isn't so bad now? In which case maybe we can just always hardwire `compress=true` when quantizing with `bits=4`?
(`compress=true` doesn't apply to 7-bit quantization.)