-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Update dense_vector
docs with kNN indexing options
#80306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-docs (Team:Docs) |
Pinging @elastic/es-search (Team:Search) |
|
||
[discrete] | ||
[[dense-vector-params]] | ||
==== Parameters for dense vector fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggled with the formatting a bit here -- any corrections/ suggestions are appreciated. The other field types use the [horizontal]
styling, but I couldn't get this to work with the sublists and code snippets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should update the other field types to remove the [horizontal]
attribute and use formatting similar to the parameter definitions in our API docs.
I left a few comments to:
- Add collapsible sections for nested value/properties.
- Add required and data type for each parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I went through and accepted all of these (batched into a new commit).
only be accessed in scripts through the dedicated <<vector-functions,vector functions>>. | ||
You can use `dense_vector` fields in | ||
<<query-dsl-script-score-query,`script_score`>> queries to score documents. | ||
They can also be indexed to support efficient k-nearest neighbor search. Dense |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k-nearest neighbor search
Note from @jtibshirani: Link this to the knn search API docs when they're up (in a follow-up PR).
Sorry for accidentally deleting this. I was deleting some of my own comments from a pending review and accidentally clicked the wrong one. 🤦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
efficient k-nearest neighbor search
The link will help, but this feels a little cryptic to me. Should we mention the kNN search API directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's unclear right now. I'll improve this in the follow-up PR where I document the kNN search API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtibshirani Thanks, great addition!
The type of kNN algorithm to use. Currently only `hnsw` is supported. | ||
|
||
`m`::: | ||
Controls how many of the nearest neighbor candidates are connected to each new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be better say, the maximum number of neighbour connections each node can have in the HNSW graph?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I reworded this to make it clearer. But I didn't say "maximum" because this is the actual number of connections that the algorithm tries to create (not an upper bound).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My interpretation was that it is an upper bound, in the sense that we don't allow to have more neighbours than m
, and will prune the worst to have the number of neighbours always under m
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, I interpreted "maximum" to mean that it's possible many nodes have fewer than m
neighbors. But the algorithm attempts to have each node have exactly m
neighbors.
In any case, hopefully the current wording is clear :) It matches the description from hnswlib: "M - the number of bi-directional links created for every new element during construction".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jtibshirani. I mostly left nits and formatting fixes.
My only larger feedback was around how we talk about script scoring vs. the kNN search API. I wonder if we can be more direct in contrasting these methods. However, I don't think that's blocking. I can address it more in the high-level guide if wanted.
|
||
[discrete] | ||
[[dense-vector-params]] | ||
==== Parameters for dense vector fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should update the other field types to remove the [horizontal]
attribute and use formatting similar to the parameter definitions in our API docs.
I left a few comments to:
- Add collapsible sections for nested value/properties.
- Add required and data type for each parameter.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
@mayya-sharipova @jrodewig thank you both for the reviews! It looks much more readable now. |
This commit updates the `dense_vector` docs to include information on the new `index`, `similarity`, and `index_options` parameters. It also tries to clarify the difference between `similarity` and `index_options` with the existing parameters that have the same name. Relates to #78473.
This commit updates the
dense_vector
docs to include information on the newindex
,similarity
, andindex_options
parameters. It also tries to clarifythe difference between
similarity
andindex_options
with the existingparameters that have the same name.
Relates to #78473.