
Refine indexing pressure accounting in semantic bulk inference filter #129320


Open · wants to merge 7 commits into main

Conversation

jimczi
Contributor

@jimczi jimczi commented Jun 12, 2025

In #125517, we estimated that inference results would double the document _source size since they are pooled by the bulk action.
This PR reduces the memory needed to perform the update by reusing the original source array when possible. This way we can only account for the extra inference fields and reduce the overall indexing pressure.

Additionally, this PR introduces a new counter in InferenceStats to track the number of rejections caused by indexing pressure from inference results.

Previously, we conservatively estimated that inference results would double the document _source size.
This could lead to unnecessary circuit breaker exceptions, even when the node had sufficient memory to handle the operation.
This PR replaces the rough estimation with the actual size of the _source after the update. Since inference calls use a batch size of 1 MB, we rely on the real circuit breaker to ensure that results fit in memory before applying indexing pressure accounting.
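To make the change concrete, here is a minimal sketch (with invented names, not the actual Elasticsearch API) contrasting the two accounting strategies described above: the old conservative doubling versus charging only the bytes the inference fields actually add.

```java
// Hypothetical sketch of the two indexing-pressure accounting strategies.
class PressureAccountingSketch {

    // Old approach: assume the updated _source will double in size.
    static long conservativeEstimate(long originalSourceBytes) {
        return 2 * originalSourceBytes;
    }

    // New approach: charge only the bytes actually added by inference fields.
    static long deltaEstimate(long originalSourceBytes, long modifiedSourceBytes) {
        return Math.max(0L, modifiedSourceBytes - originalSourceBytes);
    }

    public static void main(String[] args) {
        long original = 10_000L;  // original _source size in bytes
        long modified = 12_500L;  // _source after embeddings are inserted
        System.out.println(conservativeEstimate(original));   // 20000
        System.out.println(deltaEstimate(original, modified)); // 2500
    }
}
```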
@jimczi jimczi requested a review from Mikep86 June 12, 2025 09:40
@jimczi jimczi added >enhancement :SearchOrg/Relevance Label for the Search (solution/org) Relevance team v8.19.0 v9.1.0 labels Jun 12, 2025
@elasticsearchmachine elasticsearchmachine added Team:SearchOrg Meta label for the Search Org (Enterprise Search) Team:Search - Relevance The Search organization Search Relevance team labels Jun 12, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@elasticsearchmachine
Collaborator

Hi @jimczi, I've created a changelog YAML for you.

Contributor

@Mikep86 Mikep86 left a comment


Before merging this PR, I would like to examine the assumption that doubling document source size for indexing pressure purposes is incorrect. This was done during development of #125517 because the original source is pooled and the memory for it is released only after bulk request handling is complete. We accounted for this by adding indexing memory pressure for the additional copy of source generated to insert embeddings into. What new information do we have now that allows us to change this approach?

try {
-    coordinatingIndexingPressure.increment(0, modifiedSourceSize - originalSource.ramBytesUsed());
+    coordinatingIndexingPressure.increment(1, modifiedSourceSize - originalSource.ramBytesUsed());
Contributor


Using this approach, we are accounting only for the size of the embeddings (+ field data structures around them) in source. We are not accounting for the extra copy of source that we generate to add the embeddings to. Per our discussions when working on #125517, the original source (i.e. the source without embeddings) is pooled, so overwriting it in the index request does not free that memory.
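A hedged illustration of the reviewer's point, with made-up numbers: while the bulk request is in flight, the pooled original source stays alive alongside the new copy, so the extra memory held is the entire modified copy, and delta-only accounting undercounts by roughly the original source size.

```java
// Hypothetical sketch: why delta-only accounting undercounts when the
// original source is pooled (names and numbers are illustrative).
class PooledSourceFootprint {

    // Overwriting the index request's source reference does NOT free the
    // pooled original; the whole new copy is extra memory until the bulk
    // request completes.
    static long extraBytesHeld(long modifiedCopyBytes) {
        return modifiedCopyBytes;
    }

    public static void main(String[] args) {
        long original = 10_000L;
        long modified = 12_500L;
        long extraHeld = extraBytesHeld(modified);      // 12500: full new copy
        long deltaCharged = modified - original;        // 2500: what delta-only charges
        System.out.println(extraHeld - deltaCharged);   // 10000: unaccounted bytes
    }
}
```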

Contributor Author


Thanks for the reminder, I confused this with our earlier discussion about considering the input text for inference and mistakenly thought this was part of our solution. I’ll revert this change and just add the counter and the new message so we can better track the issue.

Contributor


Maybe we should add a comment about it; it's certainly a non-obvious detail.

Contributor Author


most def, on it.

Contributor Author


I pushed c85718d to copy the source in place when possible. This way we don't reserve the bytes during the inference calls since we don't use them yet and the request might take some time to finish.
WDYT?
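The "copy in place when possible" idea can be sketched roughly as follows (simplified, with invented names; the real change is in c85718d): append the inference fields into the original buffer when its capacity allows, so no second full copy is held, and fall back to allocating only when it is not.

```java
import java.util.Arrays;

// Hypothetical sketch of reusing the original source buffer in place.
class InPlaceSourceUpdate {

    static byte[] appendInPlace(byte[] buffer, int usedLen, byte[] extra) {
        if (usedLen + extra.length <= buffer.length) {
            // Enough spare capacity: reuse the original pooled buffer,
            // so no extra bytes need to be reserved during inference calls.
            System.arraycopy(extra, 0, buffer, usedLen, extra.length);
            return buffer;
        }
        // Otherwise allocate a larger array; only now is a second copy held.
        byte[] grown = Arrays.copyOf(buffer, usedLen + extra.length);
        System.arraycopy(extra, 0, grown, usedLen, extra.length);
        return grown;
    }

    public static void main(String[] args) {
        byte[] pooled = new byte[16]; // pooled buffer with spare capacity
        byte[] result = appendInPlace(pooled, 4, new byte[] { 1, 2, 3 });
        System.out.println(result == pooled); // true: original buffer reused
    }
}
```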

Labels
>enhancement :SearchOrg/Relevance Label for the Search (solution/org) Relevance team Team:Search - Relevance The Search organization Search Relevance team Team:SearchOrg Meta label for the Search Org (Enterprise Search) v8.19.0 v9.1.0
3 participants