Add a field type for high-dimensional bit vectors. #48322

Closed
@jtibshirani

Description

The dense_vector type helps users work with vector 'embeddings' of unstructured data like text and images. This issue proposes to add a new 'bit vector' type and 'hamming distance' script function as part of supporting this use case.

Dense vector fields allow for storing float vectors. For images, it also seems common to use bit vectors.

There has also been recent work on converting traditional text embeddings to bit vectors, for example 'Learning Compressed Sentence Representations for On-Device Text Processing'.

Compared to using a dense_vector to represent the binary vectors, a dedicated 'bit vector' type would require less space and could support faster distance computations. Looking forward, it may also be possible to support retrieval based on bit vector distance through a specialized strategy (distinct from what we've considered for float vectors in #42326).
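To make the space and speed argument concrete, here is a minimal Python sketch of packing a binary vector into bytes and computing hamming distance with XOR plus a bit count. It is only an illustration of the idea, not a proposed implementation or an existing Elasticsearch API: the names `pack_bits` and `hamming_distance` are hypothetical, and the size comparison assumes a float vector costs 4 bytes per dimension.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, most significant bit first.

    Hypothetical helper for illustration; trailing bits of the last byte
    are left as zero when len(bits) is not a multiple of 8.
    """
    if any(b not in (0, 1) for b in bits):
        raise ValueError("bits must contain only 0 or 1")
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


def hamming_distance(a, b):
    """Count the bit positions at which two packed vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # XOR leaves a 1 exactly where the two vectors disagree.
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(diff).count("1")


u = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0]
v = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1]

packed_u, packed_v = pack_bits(u), pack_bits(v)
distance = hamming_distance(packed_u, packed_v)

# Assumption: 4 bytes per dimension when the same vector is stored as floats.
float_bytes = 4 * len(u)
packed_bytes = len(packed_u)
```

For this 16-dimensional example the packed form takes 2 bytes against 64 bytes as floats, and the distance reduces to one XOR and one bit count over the packed value rather than a per-dimension loop.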
