Can we remove compress option for quantized KNN vector indexing? #13768

Open
@mikemccand

Description

Spinoff from this comment.

This (compress=true) is a useful option when quantizing KNN vectors to 4 bits: it packs pairs of dimensions into a single byte, so the "hot working set" of your KNN/HNSW vectors at search time is half the already reduced (from float32 -> byte) size. When compress is false, it's wasteful: only four of the eight bits in every byte are used.
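
To make the packing concrete, here is a minimal sketch of storing two 4-bit values per byte. The function names and the nibble layout (adjacent dimensions share a byte, even index in the high nibble) are assumptions for illustration only; the on-disk layout Lucene actually uses may pair dimensions differently.

```python
def pack_nibbles(values: list[int]) -> bytes:
    """Pack 4-bit values (0..15) two per byte.

    Hypothetical layout: the even-indexed value goes in the high nibble,
    the following odd-indexed value in the low nibble.
    """
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("each value must fit in 4 bits (0..15)")
    # Pad odd-length input with a zero so every byte holds two values.
    padded = values + [0] * (len(values) % 2)
    return bytes((padded[i] << 4) | padded[i + 1] for i in range(0, len(padded), 2))


def unpack_nibbles(packed: bytes, count: int) -> list[int]:
    """Inverse of pack_nibbles; count is the original number of values."""
    out = []
    for b in packed:
        out.append(b >> 4)    # high nibble
        out.append(b & 0x0F)  # low nibble
    return out[:count]
```

With this layout a 16-dimension vector quantized to 4 bits occupies 8 bytes instead of 16, which is the halving described above.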

But it comes with some penalty to decode the "packed" (compress=true) form during KNN search, which is why we give this choice to the user.
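
As a rough illustration of where that decode cost comes from, the sketch below compares a dot product over one-value-per-byte vectors with one computed directly on packed bytes. Both functions are hypothetical and assume a layout with two adjacent dimensions per byte (high nibble first); this is not Lucene's actual scoring code, which relies on vectorized implementations.

```python
def dot_unpacked(a: bytes, b: bytes) -> int:
    """Dot product when each byte holds a single 4-bit value."""
    return sum(x * y for x, y in zip(a, b))


def dot_packed(a: bytes, b: bytes) -> int:
    """Dot product when each byte holds two 4-bit values.

    Every byte needs an extra shift and mask before multiplying,
    which is the per-comparison decode work the packed form adds.
    """
    total = 0
    for x, y in zip(a, b):
        total += (x >> 4) * (y >> 4)      # high nibbles
        total += (x & 0x0F) * (y & 0x0F)  # low nibbles
    return total
```

The packed version touches half as many bytes but does extra bit manipulation per byte, so whether it ends up slower depends on how well that extra work is optimized.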

But then I think there was at least one optimization to that path, so maybe the performance penalty isn't so bad now? In which case, maybe we can just always hardwire compress=true when quantized bits=4?

(compress=true doesn't apply to 7 bit quantization)
