Potential issues with rejections and starvation in the Inference Processor #103665

Open
@jimczi

Description

Today the inference processor handles all the documents in a bulk request in parallel because of its async implementation.
With a default queue size of 1024 in the trained model API, it is fairly easy to hit the limit with a single bulk request or with multiple bulk requests in parallel. The write queue size in ES defaults to 10k, so most of our language clients set a default bulk size between 500 and 1000 documents, which is fine when only one bulk request is in flight at a time. For small documents, however, it is fairly common to cap the bulk request by its size in bytes, which in turn can produce bulk requests with more than 1024 documents.
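
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes one inference request per document and that all requests are submitted at once, as described above. The byte limit and document size in the usage note are made-up illustration values, and the function names are hypothetical.

```python
# Default queue size of the trained model API, as cited in this issue.
TRAINED_MODEL_QUEUE_SIZE = 1024


def docs_in_bulk(bulk_limit_bytes: int, avg_doc_bytes: int) -> int:
    """Approximate document count of a bulk request that is capped by bytes."""
    return bulk_limit_bytes // avg_doc_bytes


def overflows_queue(
    docs_per_bulk: int,
    concurrent_bulks: int = 1,
    queue_size: int = TRAINED_MODEL_QUEUE_SIZE,
) -> bool:
    """True if the bulks would enqueue more inference requests than fit.

    Simplification: one inference request per document, all sent at once.
    """
    return docs_per_bulk * concurrent_bulks > queue_size
```

For example, a hypothetical 5 MiB byte cap with 1 KiB documents gives `docs_in_bulk(5 * 1024 * 1024, 1024) == 5120`, well above 1024. A count-based bulk of 800 documents fits on its own but overflows as soon as two such bulks run in parallel.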
I am opening this issue to discuss what we should recommend to our users, since sending all the documents of a bulk request to inference at once is not very efficient and leads to rejections and/or timeouts.
I am also not sure how timeout/cancellation is handled, but in that case it seems that the requests are not removed from the queue, so we keep performing inference on requests that the inference processor is no longer awaiting.
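
As an illustration of the behaviour I would expect on timeout/cancellation, here is a minimal sketch in Python. This is not how Elasticsearch implements its queue. The `submit` and `drain` helpers are hypothetical, and the only library feature relied on is `concurrent.futures.Future` cancellation. The idea is that the worker checks whether the caller gave up before spending any inference time on the request.

```python
import queue
from concurrent.futures import Future


def submit(work_queue: queue.Queue, doc):
    """Enqueue a document and return a future the caller can wait on or cancel."""
    future = Future()
    work_queue.put((future, doc))
    return future


def drain(work_queue: queue.Queue, infer) -> int:
    """Run inference on queued requests, skipping those already cancelled.

    Returns the number of requests that were actually processed.
    """
    processed = 0
    while True:
        try:
            future, doc = work_queue.get_nowait()
        except queue.Empty:
            return processed
        # Returns False when the caller cancelled the still-pending future,
        # e.g. after a timeout, so no inference is wasted on that request.
        if not future.set_running_or_notify_cancel():
            continue
        future.set_result(infer(doc))
        processed += 1
```

A caller that times out would call `future.cancel()`, and the worker would then skip that entry instead of running inference nobody is waiting for.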

One possible mitigation could be to raise the default queue size to match the write queue size (10k) and/or to document that the bulk size should be reduced when using an inference processor. However, the timeout doesn't take the size of the bulk request into account, so even with a bigger queue, big bulk requests remain more likely to hit a timeout.
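
On the client side, reducing the bulk size could look like the sketch below: cap each bulk by document count as well as by bytes, so that small documents cannot push a byte-capped bulk past the inference queue size. This is only a sketch. `chunk_docs` is a hypothetical helper, the default limits are assumptions picked for illustration, and the serialized size is approximated with `json.dumps`.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def chunk_docs(
    docs: Iterable[Dict[str, Any]],
    max_docs: int = 500,
    max_bytes: int = 5 * 1024 * 1024,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield batches that respect both a document-count cap and a byte cap."""
    batch: List[Dict[str, Any]] = []
    batch_bytes = 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        # Flush before adding a document that would exceed either limit.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch
```

With `max_docs` kept below the trained model queue size, a single bulk can no longer overflow the queue by itself. Parallel bulks still add up, so this only addresses part of the problem.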
Another possibility could be to throttle the requests in the inference processor to give in-flight requests some room to finish before sending another batch. The enrich processor batches requests in the background, so we might want to build something similar.
The ingest pipeline is already a form of queue, so sending all requests at once is not strictly necessary.
As things stand, sizing bulk requests is quite difficult for users who are not aware of these limitations.
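
The throttling idea can be sketched as follows, in Python for brevity rather than in the processor's actual code. This is an assumption-laden illustration: `infer_throttled`, the `infer` coroutine, and the `max_in_flight` value are all hypothetical. A semaphore bounds how many inference requests are in flight, and the remaining documents simply wait their turn instead of being rejected.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def infer_throttled(
    docs: Sequence[Any],
    infer: Callable[[Any], Awaitable[Any]],
    max_in_flight: int = 4,
) -> List[Any]:
    """Run `infer` over all docs with at most `max_in_flight` concurrent calls.

    Results are returned in the same order as `docs`.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def run_one(doc: Any) -> Any:
        # Documents beyond the limit wait here until a slot frees up.
        async with semaphore:
            return await infer(doc)

    return await asyncio.gather(*(run_one(doc) for doc in docs))
```

The trade-off is that a large bulk takes longer to complete, so the timeout concern above still applies, but the model queue only ever sees `max_in_flight` requests per bulk.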
