-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Add option to include or exclude vectors from _source retrieval #128735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This PR introduces a new include_vectors option to the _source retrieval context. When set to false, vectors are excluded from the returned _source. This is especially efficient when used with synthetic source, as it avoids loading vector fields entirely. By default, vectors remain included unless explicitly excluded.
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
Hi @jimczi, I've created a changelog YAML for you. |
…nto include_vectors_source_option
@jimczi Instead of introducing a new param, why can't we use the existing API for source filtering: POST my_index/_search
{
"_source": {
"excludes": [ "my_vector_field" ]
}...
} |
@mayya-sharipova the idea is that you have one parameter to exclude all vector fields, by their type, instead of providing each unique vector field name. I also suppose this unlocks future consideration of having an index level default for But, your comment @mayya-sharipova makes me wonder if we should do something like:
Instead of having something called @jimczi would we add some index level setting that applies this default source filtering at query time? Is that the ultimate goal here? |
I limited the feature to vector fields because the main goal is to apply source filtering early, during The downside of early filtering is that any filtered fields become unavailable to sub-fetch phases such as highlighting or field retrieval. However, this isn't a concern for vector fields, since relying on their
That’s one potential direction. For now, my main goal is to provide a fast and simple way for search requests to exclude vectors from the response. Defining how to make this behavior the default for certain indices is still an open question, but this PR keeps that door open, which I think is a good thing. |
.values() | ||
.stream() | ||
.filter( | ||
f -> f instanceof DenseVectorFieldMapper.DenseVectorFieldType || f instanceof SparseVectorFieldMapper.SparseVectorFieldType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems to me that this should be a general "mapping.type" exclusion. I realize this complicates things with plugin mapper fields (rank_vector...), but rank_vectors should also be included here...
index: test | ||
body: | ||
_source: | ||
include_vectors: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work with other exclusions/inclusions?
Like, the user says "include_vectors: false", but specifically asks for a vector field inclusion?
Or vice versa?
What about fields other than vectors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I guess we can address this in documentation, saying that "include_vectors: false" has precedence over other conflicting options.
.values() | ||
.stream() | ||
.filter( | ||
f -> f instanceof DenseVectorFieldMapper.DenseVectorFieldType || f instanceof SparseVectorFieldMapper.SparseVectorFieldType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think FieldType
should add an interface called isVector
or something, and we can utilize that (satisfying the interface for rank_vector) and it allows us to simplify this and apply teh appropriate filtering.
return null; | ||
} | ||
var lookup = context.getSearchExecutionContext().getMappingLookup(); | ||
List<String> inferencePatterns = lookup.inferenceFields().isEmpty() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth to include all vectorFields()
as into the lookup similar to inferenceFields
body: | ||
_source: | ||
include_vectors: false | ||
sort: ["name"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we check that vectors still accessible through fields
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimczi Thanks, the source loading completely makes sense to me.
I was also convinced since we don't need it for other types, having it specifically for vectors makes sense.
This PR introduces a new
include_vectors
option to the_source
retrieval context. When set to false, vectors are excluded from the returned _source. This is especially efficient when used with synthetic source, as it avoids loading vector fields entirely.By default, vectors remain included unless explicitly excluded:
NOTE: docs are missing and will be added in a follow up.