DOC-738 | Vector index reference docs #700

Open
wants to merge 5 commits into base: main

Simran-B
Contributor

Description

Upstream PRs

  • 3.10:
  • 3.11:
  • 3.12:
  • 3.13:

@Simran-B Simran-B self-assigned this May 15, 2025
Deploy Preview Available Via
https://deploy-preview-700--docs-hugo.netlify.app

@cla-bot cla-bot bot added the cla-signed label May 15, 2025
@ansoboleva ansoboleva added this to the 3.12.5 milestone May 21, 2025
@Simran-B Simran-B requested a review from jbajic June 5, 2025 15:43
Comment on lines +68 to +71
The closer the cosine similarity value is to 1, the more similar the two vectors
are. The closer it is to 0, the more different they are. The value can also
be up to -1, indicating that the vectors are not similar and point in opposite
directions. You need to sort in descending order so that the most similar
Contributor Author

We probably have to revise this and potentially only say that higher means more similar?
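
To make the value range under discussion concrete, here is a minimal Python sketch (not part of the PR) that computes cosine similarity for three hand-picked vector pairs and ranks them in descending order. The helper name `cosine_similarity` and the sample vectors are assumptions chosen for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical query and candidates, picked to hit the three landmark values.
query = [1.0, 0.0]
candidates = {
    "same_direction": [2.0, 0.0],   # similarity 1
    "orthogonal": [0.0, 3.0],       # similarity 0
    "opposite": [-1.0, 0.0],        # similarity -1
}

scores = {name: cosine_similarity(query, vec) for name, vec in candidates.items()}

# Higher means more similar, so sort in descending order.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With these inputs, the ranking puts the vector pointing in the same direction first and the opposite one last, which supports wording along the lines of "higher means more similar".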

@Simran-B Simran-B marked this pull request as ready for review June 5, 2025 15:44

Return the similarity value and the documents of up to `5` close neighbors,
considering `20` neighboring centroids:

I am not sure if this is wrong, but the official terminology is a Voronoi cell, and a Voronoi cell is defined via its centroid. Therefore, we would be considering the documents from the 20 closest Voronoi cells, each of which is defined by its centroid. I am fine with leaving it as it is; I just wanted to note this.
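
The relationship between centroids and Voronoi cells can be sketched in a few lines of Python. This is a toy inverted-file (IVF) search under stated assumptions, not the index implementation: the function names, the `n_probe` parameter name, the L2 metric, and the sample data are all made up for illustration.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_cells(centroids, vectors):
    """Assign each vector to the Voronoi cell of its nearest centroid."""
    cells = {i: [] for i in range(len(centroids))}
    for key, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: l2(vec, centroids[i]))
        cells[nearest].append(key)
    return cells

def search(query, centroids, cells, vectors, n_probe, limit):
    """Scan only the n_probe cells whose centroids are closest to the query."""
    by_distance = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    probed = by_distance[:n_probe]
    candidates = [key for i in probed for key in cells[i]]
    return sorted(candidates, key=lambda k: l2(query, vectors[k]))[:limit]

# Hypothetical data: three centroids, four documents.
centroids = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
vectors = {"a": [1.0, 1.0], "b": [4.0, 0.0], "c": [6.0, 0.0], "d": [0.0, 9.0]}
cells = build_cells(centroids, vectors)

query = [5.5, 0.0]
print(search(query, centroids, cells, vectors, n_probe=1, limit=2))  # ['c']
print(search(query, centroids, cells, vectors, n_probe=2, limit=2))  # ['c', 'b']
```

Probing a single cell misses `b`, the second-closest document, because it sits just across a cell boundary. Probing more cells trades speed for recall, which is why the result is described as "up to" a given number of neighbors.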

The number of centroids in the index. What value to choose
depends on the data distribution and chosen metric. According to
[The Faiss library paper](https://arxiv.org/abs/2401.08281), it should be
around `N / 15` where `N` is the number of documents in the collection,

Suggested change
- around `N / 15` where `N` is the number of documents in the collection,
+ around `sqrt(N) / 15` where `N` is the number of documents in the collection,
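
For a sense of scale, the following Python sketch evaluates both candidate formulas from this suggestion for two hypothetical collection sizes. It only does the arithmetic; it does not verify either formula against the cited paper, and the function names are made up for illustration.

```python
import math

def n_lists_linear(n_docs):
    """Original wording in the diff: N / 15."""
    return n_docs / 15

def n_lists_sqrt(n_docs):
    """Suggested wording: sqrt(N) / 15."""
    return math.sqrt(n_docs) / 15

# Hypothetical collection sizes.
for n_docs in (10_000, 1_000_000):
    print(n_docs, round(n_lists_linear(n_docs)), round(n_lists_sqrt(n_docs)))
# 10000 667 7
# 1000000 66667 67
```

The two readings differ by roughly three orders of magnitude at one million documents, so the exact formula is worth confirming against the paper before merging.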

type: integer
default: 25
factory:
description: |

Maybe just add that, if not specified, the default IVF index is used, with FlatL2 for the l2 metric or Flat for the cosine metric. That is implicit. Also, only IVF indexes are supported (in combination with other components), meaning you cannot create, for example, just an HNSW_5 index.

- **nLists** (number): The number of centroids in the index. What value to choose
depends on the data distribution and chosen metric. According to
[The Faiss library paper](https://arxiv.org/abs/2401.08281), it should be
around `N / 15` where `N` is the number of documents in the collection,

Suggested change
- around `N / 15` where `N` is the number of documents in the collection,
+ around `sqrt(N) / 15` where `N` is the number of documents in the collection,

3 participants