Add option to include or exclude vectors from _source retrieval #128735

Open
wants to merge 6 commits into
base: main
from

Conversation

jimczi
Contributor

@jimczi jimczi commented Jun 2, 2025

This PR introduces a new include_vectors option to the _source retrieval context. When set to false, vectors are excluded from the returned _source. This is especially efficient when used with synthetic source, as it avoids loading vector fields entirely.

By default, vectors remain included unless explicitly excluded:

POST my_index/_search
{
  "_source": {
    "include_vectors": false
  }
}

NOTE: docs are missing and will be added in a follow up.

@elasticsearchmachine added the Team:Search Relevance label Jun 2, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Hi @jimczi, I've created a changelog YAML for you.

@mayya-sharipova
Contributor

mayya-sharipova commented Jun 2, 2025

@jimczi Instead of introducing a new param, why can't we use the existing API for source filtering:

POST my_index/_search
{
  "_source": {
    "excludes": [ "my_vector_field" ]
  }...
}

@benwtrent
Member

@mayya-sharipova the idea is that you have one parameter to exclude all vector fields, by their type, instead of providing each unique vector field name.

I also suppose this unlocks a future index-level setting that controls _source inclusion by default, applied at query time to all queries.

But, your comment @mayya-sharipova makes me wonder if we should do something like:

{
  "_source": {
    "mapping_type_excludes": [ "dense_vector", "sparse_vector" ]
  }...
}

Instead of having something called include_vectors.

@jimczi would we add some index level setting that applies this default source filtering at query time? Is that the ultimate goal here?
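
To make the "exclude by type instead of by name" idea above concrete, here is a minimal sketch. It assumes a plain map from field name to mapping type; `TypeExcludesSketch` and `fieldsToExclude` are hypothetical names for illustration and are not part of this PR or of Elasticsearch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class TypeExcludesSketch {
    // Hypothetical helper: returns the names of all fields whose mapping type
    // is in excludedTypes, so callers do not have to list each field by name.
    static List<String> fieldsToExclude(Map<String, String> fieldTypes, Set<String> excludedTypes) {
        return fieldTypes.entrySet().stream()
            .filter(e -> excludedTypes.contains(e.getValue()))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed example mapping: field name -> mapping type.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("title", "text");
        mapping.put("my_vector_field", "dense_vector");
        mapping.put("tokens", "sparse_vector");

        System.out.println(fieldsToExclude(mapping, Set.of("dense_vector", "sparse_vector")));
    }
}
```

The result could then feed an ordinary `excludes` list, which is roughly what a type-based option would save the caller from computing by hand.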

@jimczi
Contributor Author

jimczi commented Jun 2, 2025

I limited the feature to vector fields because the main goal is to apply source filtering early, during _source loading, unlike the current approach where includes/excludes act as a post-processing step.

The downside of early filtering is that any filtered fields become unavailable to sub-fetch phases such as highlighting or field retrieval. However, this isn't a concern for vector fields, since relying on their _source content in these phases is discouraged; we typically load them from doc values instead.

would we add some index level setting that applies this default source filtering at query time? Is that the ultimate goal here?

That’s one potential direction. For now, my main goal is to provide a fast and simple way for search requests to exclude vectors from the response. Defining how to make this behavior the default for certain indices is still an open question, but this PR keeps that door open, which I think is a good thing.
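
The difference between filtering during _source loading and filtering as a post-processing step can be illustrated with a toy model. This is only a sketch under stated assumptions: the "stored document" is a plain map, the counter stands in for the cost of reading a field, and none of the names below (`SourceFilteringSketch`, `loadField`, `lateFilter`, `earlyFilter`) come from the Elasticsearch code base.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

class SourceFilteringSketch {
    // Hypothetical per-field loader; the counter stands in for the cost of reading a value.
    static Object loadField(Map<String, Object> stored, String name, AtomicInteger loads) {
        loads.incrementAndGet();
        return stored.get(name);
    }

    // Late filtering: every field is loaded, then excluded fields are stripped.
    static Map<String, Object> lateFilter(Map<String, Object> stored, Set<String> excluded, AtomicInteger loads) {
        Map<String, Object> source = new LinkedHashMap<>();
        for (String name : stored.keySet()) {
            source.put(name, loadField(stored, name, loads));
        }
        source.keySet().removeAll(excluded);
        return source;
    }

    // Early filtering: excluded fields are skipped before they are ever loaded.
    static Map<String, Object> earlyFilter(Map<String, Object> stored, Set<String> excluded, AtomicInteger loads) {
        Map<String, Object> source = new LinkedHashMap<>();
        for (String name : stored.keySet()) {
            if (excluded.contains(name)) {
                continue;
            }
            source.put(name, loadField(stored, name, loads));
        }
        return source;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("title", "hello");
        stored.put("embedding", List.of(0.1f, 0.2f, 0.3f));
        Set<String> vectors = Set.of("embedding");

        AtomicInteger lateLoads = new AtomicInteger();
        AtomicInteger earlyLoads = new AtomicInteger();
        System.out.println(lateFilter(stored, vectors, lateLoads) + " loads=" + lateLoads.get());
        System.out.println(earlyFilter(stored, vectors, earlyLoads) + " loads=" + earlyLoads.get());
    }
}
```

Both paths return the same filtered document, but the early path never touches the vector field. The trade-off mentioned above also shows up here: in the early path, nothing downstream can see the skipped field.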

.values()
.stream()
.filter(
f -> f instanceof DenseVectorFieldMapper.DenseVectorFieldType || f instanceof SparseVectorFieldMapper.SparseVectorFieldType
Member

it seems to me that this should be a general "mapping.type" exclusion. I realize this complicates things with plugin mapper fields (rank_vector...), but rank_vectors should also be included here...

index: test
body:
  _source:
    include_vectors: true
Member

How does this work with other exclusions/inclusions?

Like, the user says "include_vectors: false", but specifically asks for a vector field inclusion?

Or vice versa?

What about fields other than vectors?

Contributor

That's a good point. I guess we can address this in documentation, saying that "include_vectors: false" has precedence over other conflicting options.
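
One way the suggested precedence could look, purely as an illustration: the thread only proposes that "include_vectors: false" wins over conflicting options, and nothing here confirms how the PR actually resolves them. `PrecedenceSketch` and `resolveIncludes` are hypothetical names.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class PrecedenceSketch {
    // Assumed rule (a suggestion from the review, not confirmed behavior):
    // when includeVectors is false, vector fields are dropped even if the
    // request explicitly lists them under includes.
    static List<String> resolveIncludes(List<String> requestedIncludes, boolean includeVectors, Set<String> vectorFields) {
        if (includeVectors) {
            return requestedIncludes;
        }
        return requestedIncludes.stream()
            .filter(field -> !vectorFields.contains(field))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> includes = List.of("title", "my_vector_field");
        Set<String> vectorFields = Set.of("my_vector_field");

        System.out.println(resolveIncludes(includes, false, vectorFields));
        System.out.println(resolveIncludes(includes, true, vectorFields));
    }
}
```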

.values()
.stream()
.filter(
f -> f instanceof DenseVectorFieldMapper.DenseVectorFieldType || f instanceof SparseVectorFieldMapper.SparseVectorFieldType
Member

I think FieldType should add an interface called isVector or something, and we can utilize that (satisfying the interface for rank_vector). It allows us to simplify this and apply the appropriate filtering.
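
A rough sketch of what such a marker could look like, assuming a default method on the field type rather than instanceof checks. The types below are simplified stand-ins, not the Elasticsearch classes, and `isVectorField` is a hypothetical name.

```java
import java.util.List;
import java.util.stream.Collectors;

class VectorMarkerSketch {
    // Simplified stand-in for a mapped field type; not the Elasticsearch class.
    interface FieldType {
        String name();

        // Hypothetical marker: non-vector types inherit the default.
        default boolean isVectorField() {
            return false;
        }
    }

    record TextType(String name) implements FieldType {}

    record DenseVectorType(String name) implements FieldType {
        @Override
        public boolean isVectorField() {
            return true;
        }
    }

    // A plugin-provided type can opt in the same way, with no change to the filter.
    record RankVectorsType(String name) implements FieldType {
        @Override
        public boolean isVectorField() {
            return true;
        }
    }

    // The filter no longer needs to enumerate concrete classes.
    static List<String> vectorFieldNames(List<FieldType> types) {
        return types.stream()
            .filter(FieldType::isVectorField)
            .map(FieldType::name)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FieldType> types = List.of(
            new TextType("title"),
            new DenseVectorType("embedding"),
            new RankVectorsType("late_interaction")
        );
        System.out.println(vectorFieldNames(types));
    }
}
```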

return null;
}
var lookup = context.getSearchExecutionContext().getMappingLookup();
List<String> inferencePatterns = lookup.inferenceFields().isEmpty()
Contributor

Is it worth adding a vectorFields() accessor to the lookup, similar to inferenceFields()?

body:
  _source:
    include_vectors: false
  sort: ["name"]
Contributor

Should we check that vectors are still accessible through fields?

Contributor

@mayya-sharipova mayya-sharipova left a comment

@jimczi Thanks, the source loading completely makes sense to me.

I'm also convinced: since we don't need this for other field types, having it specifically for vectors makes sense.

Labels
>feature, :Search Relevance/Vectors, Team:Search Relevance, v9.1.0
4 participants