Introduce vector field, vector query and rescoring based on them #31615

Closed
@mayya-sharipova

Introduce a new field of type vector, on which vector calculations can be done during the rescoring phase.

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_feature": {
          "type": "vector"
        }
      }
    }
  }
}

Indexing

Allow only a single value per document
Allow indexing of both dense and sparse vectors?

Dense form:

PUT my_index/_doc/1
{
  "my_feature":   [11.5, 10.4, 23.0]
}

Sparse form (represented as a map from dimension names to the values of the corresponding dimensions):

PUT my_index/_doc/1
{
  "my_feature": {"1": 11.5, "5": 10.5,  "101": 23.0}
}
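
The two accepted input forms and the single-value rule can be illustrated with a small validation sketch. This is only an illustration in Python, not the proposed implementation: the function name `parse_vector_value` is hypothetical, and it assumes that sparse dimension names are integer strings, as in the example above.

```python
from numbers import Real
from typing import Dict, List, Tuple, Union


def _is_number(x: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, Real) and not isinstance(x, bool)


def parse_vector_value(value: object) -> Tuple[str, Union[List[float], Dict[int, float]]]:
    """Classify a field value as a dense or sparse vector (hypothetical helper).

    A nested list such as [[1.0], [2.0]] is rejected, which enforces
    "only a single value per document".
    """
    if isinstance(value, list):
        if not value or not all(_is_number(x) for x in value):
            raise ValueError("dense vector must be a non-empty list of numbers")
        return "dense", [float(x) for x in value]
    if isinstance(value, dict):
        if not value or not all(_is_number(x) for x in value.values()):
            raise ValueError("sparse vector must be a non-empty map of numbers")
        # Assumption: dimension names are integer strings; int() raises
        # ValueError for anything else.
        return "sparse", {int(dim): float(x) for dim, x in value.items()}
    raise ValueError("vector must be a list (dense) or a map (sparse)")
```

For example, `parse_vector_value({"1": 11.5, "5": 10.5, "101": 23.0})` is classified as sparse, while `parse_vector_value([11.5, 10.4, 23.0])` is classified as dense.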

Query and Rescoring

Introduce a special type of vector query:

"vector" : {
   "field" : "my_feature",
   "query_vector": {"1": 3, "5": 10.5, "101": 12}
}

This query can only be used in the rescoring context.
This query produces a score for every document in the rescoring context in the following way:

  1. If a document has no vector value for field, a score of 0 is returned.
  2. If a document has a vector value for field (doc_vector), the cosine similarity between doc_vector and query_vector is calculated:
    dotProduct(doc_vector, query_vector) / (sqrt(dotProduct(doc_vector, doc_vector)) * sqrt(dotProduct(query_vector, query_vector)))

POST /_search
{
   "query" : {"<user-query>"},
   "rescore" : {
      "window_size" : 50,
      "query" : {
         "rescore_query" : {
            "vector" : {
               "field" : "my_feature",
               "query_vector": {"1": 3, "5": 10.5,  "101": 12}
            }
         }
      }
   }
}
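
The scoring rule and the rescore request above can be sketched roughly as follows. This is a simplified Python illustration, not the proposed implementation: `vector_score` and `rescore` are hypothetical names, vectors are modelled as plain dicts, a zero-magnitude vector is assumed to score 0, and the combined score assumes the default `total` score mode of the rescore API (first-pass score plus rescore score, both weights 1). The real rescorer applies the window per shard, which is ignored here.

```python
import math
from typing import Callable, List, Mapping, Optional, Tuple

SparseVector = Mapping[str, float]


def dot_product(a: SparseVector, b: SparseVector) -> float:
    # Only dimensions present in both vectors contribute to the sum.
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[dim] for dim, value in a.items() if dim in b)


def vector_score(doc_vector: Optional[SparseVector], query_vector: SparseVector) -> float:
    """Cosine similarity, or 0 if the document has no vector value."""
    if not doc_vector:
        return 0.0
    norms = math.sqrt(dot_product(doc_vector, doc_vector)) * math.sqrt(
        dot_product(query_vector, query_vector)
    )
    if norms == 0.0:
        return 0.0  # assumption: zero-magnitude vectors score 0
    return dot_product(doc_vector, query_vector) / norms


def rescore(
    hits: List[Tuple[str, float]],
    rescore_fn: Callable[[str], float],
    window_size: int = 50,
) -> List[Tuple[str, float]]:
    """Rescore the top `window_size` hits; `hits` is sorted by descending score."""
    window, rest = hits[:window_size], hits[window_size:]
    # Default "total" score mode: add the rescore score to the first-pass score.
    rescored = [(doc_id, score + rescore_fn(doc_id)) for doc_id, score in window]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    # Hits outside the window keep their first-pass order and scores.
    return rescored + rest
```

A caller would pass something like `lambda doc_id: vector_score(doc_vectors.get(doc_id), query_vector)` as `rescore_fn`, where `doc_vectors` is a hypothetical lookup from document ID to stored vector.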

Internal encoding

  1. Encoding of vectors:
    Internally, are both dense and sparse vectors encoded as a sorted hash (a map with sorted keys)?
    A dense array would then be transformed as follows:
    [4, 12] -> {0: 4, 1: 12}
    Keys are sorted, so we can iterate over them in order instead of computing hashes.

  2. What should be values in vectors?

    • floats?
    • smaller than floats? (loses some precision, but reduces index size)

  3. Vectors are encoded as binary values.
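
One way to picture the encoding ideas above is the following Python sketch. It is only one possible layout, chosen for illustration, since the questions above are still open: it assumes integer dimension keys, 32-bit float values, and a hypothetical binary format of consecutive (int32 dimension, float32 value) entries. All function names are made up for this example. Because the keys are sorted, the dot product can be computed with a linear merge over both vectors, without any hash lookups.

```python
import struct
from typing import Dict, List, Sequence, Tuple, Union

Pairs = List[Tuple[int, float]]

# Hypothetical entry layout: big-endian int32 dimension + float32 value.
ENTRY_FORMAT = ">if"


def to_sorted_pairs(vector: Union[Sequence[float], Dict[str, float]]) -> Pairs:
    """Normalize a dense list or sparse map to (dimension, value) pairs sorted by dimension."""
    if isinstance(vector, dict):
        return sorted((int(dim), float(value)) for dim, value in vector.items())
    # Dense form: the position in the array is the dimension, e.g. [4, 12] -> {0: 4, 1: 12}.
    return [(dim, float(value)) for dim, value in enumerate(vector)]


def encode(pairs: Pairs) -> bytes:
    return b"".join(struct.pack(ENTRY_FORMAT, dim, value) for dim, value in pairs)


def decode(data: bytes) -> Pairs:
    return [(dim, value) for dim, value in struct.iter_unpack(ENTRY_FORMAT, data)]


def dot_product(a: Pairs, b: Pairs) -> float:
    """Merge-style iteration over two key-sorted vectors."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            total += a[i][1] * b[j][1]
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return total
```

With float32 values each entry takes 8 bytes in this layout. A smaller value type would shrink the index further at the cost of precision, which is the trade-off raised in point 2.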
