Skip to content

Update dense_vector docs with kNN indexing options #80306

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 4, 2021

Conversation

jtibshirani
Copy link
Contributor

This commit updates the dense_vector docs to include information on the new
index, similarity, and index_options parameters. It also tries to clarify
the difference between similarity and index_options with the existing
parameters that have the same name.

Relates to #78473.

@jtibshirani jtibshirani added >docs General docs changes :Search/Search Search-related issues that do not fall into other categories v8.0.0 v8.1.0 labels Nov 3, 2021
@elasticmachine elasticmachine added Team:Search Meta label for search team Team:Docs Meta label for docs team labels Nov 3, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)


[discrete]
[[dense-vector-params]]
==== Parameters for dense vector fields
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggled with the formatting a bit here -- any corrections/ suggestions are appreciated. The other field types use the [horizontal] styling, but I couldn't get this to work with the sublists and code snippets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should update the other field types to remove the [horizontal] attribute and use formatting similar to the parameter definitions in our API docs.

I left a few comments to:

  • Add collapsible sections for nested value/properties.
  • Add required and data type for each parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I went through and accepted all of these (batched into a new commit).

@jtibshirani jtibshirani mentioned this pull request Nov 3, 2021
17 tasks
@elastic elastic deleted a comment from jtibshirani Nov 3, 2021
only be accessed in scripts through the dedicated <<vector-functions,vector functions>>.
You can use `dense_vector` fields in
<<query-dsl-script-score-query,`script_score`>> queries to score documents.
They can also be indexed to support efficient k-nearest neighbor search. Dense
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k-nearest neighbor search
Note from @jtibshirani: Link this to the knn search API docs when they're up (in a follow-up PR).

Sorry for accidentally deleting this. I was deleting some of my own comments from a pending review and accidentally clicked the wrong one. 🤦

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

efficient k-nearest neighbor search

The link will help, but this feels a little cryptic to me. Should we mention the kNN search API directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's unclear right now. I'll improve this in the follow-up PR where I document the kNN search API.

@jrodewig jrodewig self-requested a review November 4, 2021 01:03
Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtibshirani Thanks, great addition!

The type of kNN algorithm to use. Currently only `hnsw` is supported.

`m`:::
Controls how many of the nearest neighbor candidates are connected to each new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be better say, the maximum number of neighbour connections each node can have in the HNSW graph?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I reworded this to make it clearer. But I didn't say "maximum" because this is the actual number of connections that the algorithm tries to create (not an upper bound).

Copy link
Contributor

@mayya-sharipova mayya-sharipova Nov 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My interpretation was that it is an upper bound, in the sense that we don't allow to have more neighbours than m, and will prune the worst to have the number of neighbours always under m.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, I interpreted "maximum" to mean that it's possible many nodes have fewer than m neighbors. But the algorithm attempts to have each node have exactly m neighbors.

In any case, hopefully the current wording is clear :) It matches the description from hnswlib: "M - the number of bi-directional links created for every new element during construction".

Copy link
Contributor

@jrodewig jrodewig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jtibshirani. I mostly left nits and formatting fixes.

My only larger feedback was around how we talk about script scoring vs. the kNN search API. I wonder if we can be more direct in contrasting these methods. However, I don't think that's blocking. I can address it more in the high-level guide if wanted.


[discrete]
[[dense-vector-params]]
==== Parameters for dense vector fields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should update the other field types to remove the [horizontal] attribute and use formatting similar to the parameter definitions in our API docs.

I left a few comments to:

  • Add collapsible sections for nested value/properties.
  • Add required and data type for each parameter.

jtibshirani and others added 3 commits November 4, 2021 10:23
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
@jtibshirani
Copy link
Contributor Author

@mayya-sharipova @jrodewig thank you both for the reviews! It looks much more readable now.

@jtibshirani jtibshirani merged commit 075d08e into elastic:master Nov 4, 2021
@jtibshirani jtibshirani deleted the knn-docs branch November 4, 2021 18:44
jtibshirani added a commit that referenced this pull request Nov 4, 2021
This commit updates the `dense_vector` docs to include information on the new
`index`, `similarity`, and `index_options` parameters. It also tries to clarify
the difference between `similarity` and `index_options` with the existing
parameters that have the same name.

Relates to #78473.
@jtibshirani jtibshirani added :Search Relevance/Vectors Vector search and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>docs General docs changes :Search Relevance/Vectors Vector search Team:Docs Meta label for docs team Team:Search Meta label for search team v8.0.0-rc1 v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants