Support cosine similarity in kNN search #79500

Merged
merged 2 commits into from
Oct 21, 2021

jtibshirani
Contributor

This PR adds support for `cosine` similarity:

```
"mappings": {
  "properties": {
    "my_vector": {
      "type": "dense_vector",
      "dims": 128,
      "index": true,
      "similarity": "cosine"
    }
  }
}
```

Unlike dot_product, which requires vectors to be of unit length, this
similarity can handle vectors with any magnitude.
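
To illustrate the difference between the two similarities, here is a minimal Python sketch. It is not Elasticsearch's implementation, and the helper names `dot`, `norm`, `cosine`, and `normalize` are made up for this example. It shows that cosine similarity is unchanged when a vector is rescaled, and that it equals the plain dot product once both vectors are normalized to unit length, which is why a dot product only stands in for cosine similarity on unit-length vectors. Rejecting zero-magnitude vectors is a choice made in this sketch.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean magnitude of a vector."""
    return math.sqrt(dot(a, a))


def normalize(a):
    """Scale a vector to unit length (sketch choice: reject zero vectors)."""
    n = norm(a)
    if n == 0.0:
        raise ValueError("cannot normalize a zero-magnitude vector")
    return [x / n for x in a]


def cosine(a, b):
    """Cosine similarity: the dot product divided by both magnitudes."""
    denom = norm(a) * norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero-magnitude vector")
    return dot(a, b) / denom


a = [3.0, 4.0]
b = [4.0, 3.0]
scaled_a = [10.0 * x for x in a]  # same direction as a, ten times the magnitude

# Rescaling a vector changes the raw dot product but not the cosine similarity.
raw_changes = not math.isclose(dot(a, b), dot(scaled_a, b))
cosine_stable = math.isclose(cosine(a, b), cosine(scaled_a, b))

# On unit-length vectors, the dot product and cosine similarity agree.
agree_on_unit = math.isclose(dot(normalize(a), normalize(b)), cosine(a, b))
```

In this sketch `raw_changes`, `cosine_stable`, and `agree_on_unit` all evaluate to `True`.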

This PR also adds validation around dot_product to help catch mistakes. When
indexing vectors, we double-check that each vector has unit length. We also
check that kNN query vectors have unit length.
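
The unit-length check described above could look roughly like the following Python sketch. The function names `is_unit_length` and `check_unit_length`, the error message, and the tolerance of `1e-4` on the squared magnitude are assumptions for illustration, not values taken from this PR.

```python
def is_unit_length(vector, tolerance=1e-4):
    """Return True if the squared magnitude is within `tolerance` of 1.0.

    Comparing the squared magnitude avoids a square root; the tolerance
    absorbs floating-point rounding. Both are assumptions of this sketch.
    """
    squared_magnitude = sum(x * x for x in vector)
    return abs(squared_magnitude - 1.0) <= tolerance


def check_unit_length(vector, tolerance=1e-4):
    """Raise ValueError if the vector is not (approximately) unit length."""
    if not is_unit_length(vector, tolerance):
        raise ValueError(
            "dot_product similarity expects unit-length vectors; "
            "normalize the vector or use cosine similarity instead"
        )


check_unit_length([0.6, 0.8])        # passes: 0.36 + 0.64 is 1.0
rejected = not is_unit_length([3.0, 4.0])  # squared magnitude is 25.0
```

The same check would apply both to vectors at index time and to kNN query vectors, since either one being off unit length would skew the scores.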

@jtibshirani added the :Search/Search (Search-related issues that do not fall into other categories) and v8.0.0 labels on Oct 19, 2021
@jtibshirani mentioned this pull request on Oct 19, 2021 (17 tasks)
@jtibshirani
Contributor Author

I ran benchmarks and confirmed the new validation for dot_product does not significantly affect performance.

@jtibshirani marked this pull request as ready for review on October 19, 2021, 17:53
@elasticmachine added the Team:Search (Meta label for search team) label on Oct 19, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@mayya-sharipova self-requested a review on October 20, 2021, 01:38
Contributor

@mayya-sharipova left a comment

@jtibshirani Thanks, LGTM!

@jtibshirani merged commit 7f01138 into elastic:master on Oct 21, 2021
@jtibshirani deleted the cosine-similarity branch on October 21, 2021, 21:43
lockewritesdocs pushed a commit to lockewritesdocs/elasticsearch that referenced this pull request Oct 28, 2021
@jtibshirani added the :Search Relevance/Vectors (Vector search) label and removed the :Search/Search (Search-related issues that do not fall into other categories) label on Jul 21, 2022
4 participants