DOC-738 | Vector index reference docs #700
base: main
The closer the cosine similarity value is to 1, the more similar the two vectors
are. The closer it is to 0, the more different they are. The value can also
be up to -1, indicating that the vectors are not similar and point in opposite
directions. You need to sort in descending order so that the most similar
We probably have to revise this and potentially only say that higher means more similar?
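To make the point about ordering concrete, here is a minimal sketch in plain Python. It does not use the database's own similarity function; `cosine_similarity` and the sample vectors are made up for illustration. It only shows that values range from 1 (same direction) through 0 (orthogonal) to -1 (opposite direction), and that sorting in descending order puts the most similar vector first.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical sample data, chosen so the three cases are easy to see.
query = [1.0, 0.0]
candidates = {
    "same_direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

# Sort by similarity, highest first, so the best match leads the list.
ranked = sorted(
    ((cosine_similarity(query, vec), name) for name, vec in candidates.items()),
    reverse=True,
)

for score, name in ranked:
    print(f"{name}: {score:+.1f}")
```

Under this reading, "higher means more similar" holds across the whole range, which may be the simpler statement for the docs.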
Return the similarity value and the documents of up to `5` close neighbors,
considering `20` neighboring centroids:
I am not sure if this is wrong, but the official terminology is a Voronoi cell, and a Voronoi cell is defined via its centroid. Therefore, we should say that the documents from the 20 closest Voronoi cells, each defined by its centroid, are considered. I am fine with leaving it as it is; I just wanted to note this.
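For readers of this thread, the relationship between centroids and Voronoi cells can be sketched as follows. This is a toy inverted-file (IVF) search in plain Python, not the actual index implementation. The names `build_cells`, `search`, `n_probe`, and `limit` are hypothetical, and the centroids are fixed by hand instead of being trained.

```python
import math

def l2(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_cells(points, centroids):
    """Assign every point to the Voronoi cell of its nearest centroid."""
    cells = {i: [] for i in range(len(centroids))}
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: l2(p, centroids[i]))
        cells[nearest].append(p)
    return cells

def search(query, centroids, cells, n_probe, limit):
    """Scan only the n_probe cells whose centroids are closest to the query."""
    probed = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))[:n_probe]
    candidates = [p for i in probed for p in cells[i]]
    return sorted(candidates, key=lambda p: l2(query, p))[:limit]

# Hand-picked toy data (assumption: real centroids come from training).
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(1.0, 1.0), (4.9, 0.0), (5.1, 0.0), (9.0, 1.0), (0.0, 9.0)]
cells = build_cells(points, centroids)

query = (4.95, 0.0)
print(search(query, centroids, cells, n_probe=1, limit=2))
print(search(query, centroids, cells, n_probe=2, limit=2))
```

With one probed cell, the neighbor at `(5.1, 0.0)` is missed because it lies just across the cell boundary; probing two cells finds it. That is the trade-off the "considering `20` neighboring centroids" wording describes.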
The number of centroids in the index. What value to choose
depends on the data distribution and chosen metric. According to
[The Faiss library paper](https://arxiv.org/abs/2401.08281), it should be
around `N / 15` where `N` is the number of documents in the collection,
Suggested change:
- around `N / 15` where `N` is the number of documents in the collection,
+ around `sqrt(N) / 15` where `N` is the number of documents in the collection,
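The two formulas in this suggestion give very different values, so it may help to see them side by side before settling on the wording. The sketch below only evaluates both expressions as written in this thread for a couple of sample collection sizes. It does not claim which one matches the paper, and the function names are hypothetical.

```python
import math

def n_lists_linear(n_docs):
    """Current wording in the diff: N / 15, rounded, at least 1."""
    return max(1, round(n_docs / 15))

def n_lists_sqrt(n_docs):
    """Suggested wording: sqrt(N) / 15, rounded, at least 1."""
    return max(1, round(math.sqrt(n_docs) / 15))

# Sample collection sizes, chosen only for illustration.
for n_docs in (10_000, 1_000_000):
    print(n_docs, n_lists_linear(n_docs), n_lists_sqrt(n_docs))
```

For one million documents the results differ by about three orders of magnitude, so the formula is worth checking against the cited paper before merging.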
type: integer
default: 25
factory:
description: |
Maybe just add that, if not specified, the default IVF is used, with FlatL2 for the l2 metric or Flat for the cosine metric. That is implicit. Also, only the IVF indexes are supported, in combination with other options, meaning you cannot create just an HNSW_5 index.
- **nLists** (number): The number of centroids in the index. What value to choose
depends on the data distribution and chosen metric. According to
[The Faiss library paper](https://arxiv.org/abs/2401.08281), it should be
around `N / 15` where `N` is the number of documents in the collection,
Suggested change:
- around `N / 15` where `N` is the number of documents in the collection,
+ around `sqrt(N) / 15` where `N` is the number of documents in the collection,