
QP request batcher #13691

Open · skonto opened this issue Feb 13, 2023 · 2 comments
Labels
- area/API: API objects and controllers
- kind/feature: Well-understood/specified features, ready for coding.
- triage/accepted: Issues which should be fixed (post-triage)


skonto (Contributor) commented Feb 13, 2023

/area API

Describe the feature

Requirements

  • Users should be able to specify criteria such as latency and batch size per service, so that the QP can decide when to submit the batch to the user container. This would require annotating the service accordingly (see the sketch after this list).
  • Each request in the batch is handled independently and transparently. To achieve this, a protocol should be defined between the QP and the user container, and the user container should comply with it in order to receive data in batch mode.
  • This does not cover the case where requests need to be sent as a batch across the whole Knative data plane.
  • The feature will be an extension, not enabled by default.
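
For illustration only, a minimal Go sketch of how a QP-side batcher could apply the size and latency criteria. All names are hypothetical, and the annotation-derived settings are assumptions; nothing like this exists in Knative today:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// batcher buffers incoming requests and flushes them to the user container
// when either maxSize requests have accumulated or maxLatency has elapsed
// since the first request in the open batch arrived.
type batcher struct {
	maxSize    int           // e.g. from a hypothetical batch-size annotation
	maxLatency time.Duration // e.g. from a hypothetical max-latency annotation
	in         chan *http.Request
	flush      func([]*http.Request) // submits the batch to the user container
}

func (b *batcher) run() {
	var batch []*http.Request
	var timer <-chan time.Time
	for {
		select {
		case r := <-b.in:
			if len(batch) == 0 {
				timer = time.After(b.maxLatency) // start the latency window
			}
			batch = append(batch, r)
			if len(batch) >= b.maxSize {
				b.flush(batch) // size criterion met
				batch, timer = nil, nil
			}
		case <-timer:
			b.flush(batch) // latency budget exhausted; send what we have
			batch, timer = nil, nil
		}
	}
}

func main() {
	b := &batcher{
		maxSize:    4,
		maxLatency: 50 * time.Millisecond,
		in:         make(chan *http.Request),
		flush: func(batch []*http.Request) {
			fmt.Printf("flushing %d requests\n", len(batch))
		},
	}
	go b.run()
	// Feed three requests; the batch flushes via the latency timer.
	for i := 0; i < 3; i++ {
		r, _ := http.NewRequest("POST", "http://example.com/infer", nil)
		b.in <- r
	}
	time.Sleep(100 * time.Millisecond) // wait for the latency-triggered flush
}
```

A nil timer channel simply blocks its select case, so the latency window only runs while a batch is open.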

Use cases

  • There are scenarios where HTTP requests need to be delivered as a batch instead of one by one. A common scenario is model serving, where performance improves if requests are collected into a batch at the backend so that an operation is applied per data vector rather than per data instance. An implementation for Knative Serving that uses an intermediate container can be found in KServe here.
  • As discussed here, users coming from other systems such as AWS SQS might expect a batch configuration option to consume more than one request, at least at the user container/backend side.
skonto added the kind/feature label on Feb 13, 2023
knative-prow bot added the area/API label on Feb 13, 2023
ReToCode added the triage/accepted label on Mar 8, 2023
dprotaso (Member) commented

I'm not sure if it's possible to generalize a batching API - it seems very API/application specific.

> Users should be able to specify criteria such as latency and batch size per service, so that the QP can decide when to submit the batch to the user container.

Why not let the user handle this in the user-container, and set the container concurrency to the batch size?

> Each request in the batch is handled independently and transparently. To achieve this, a protocol should be defined between the QP and the user container, and the user container should comply with it in order to receive data in batch mode.

Do you have an example of a protocol?

skonto (Contributor, Author) commented Oct 13, 2023

@dprotaso sorry for the delayed response:

> I'm not sure if it's possible to generalize a batching API - it seems very API/application specific.

Many API providers do provide such endpoints for batch processing, though. This could be an extension that provides some default settings like path, batch config, format, metrics config, etc.
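
To make that concrete, a rough sketch of what such extension settings could look like. Every field name here is purely hypothetical:

```go
package batching

import "time"

// BatchConfig sketches the kind of defaults such an extension could expose.
// All fields are hypothetical; nothing like this exists in Knative today.
type BatchConfig struct {
	Path           string        // endpoint on the user container, e.g. "/batch"
	MaxSize        int           // flush once this many requests are buffered
	MaxLatency     time.Duration // flush once the oldest buffered request is this old
	Format         string        // wire format, e.g. "json-array"
	MetricsEnabled bool          // emit per-batch metrics from the QP
}
```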

> Why not let the user handle this in the user-container, and set the container concurrency to the batch size?

That is possible. I am trying to remove the implementation overhead by providing a batch primitive.

> Do you have an example of a protocol?

Here is an example: envoyproxy/envoy#6452.
Maybe we could have this concept of proxy extensions/filters too, to standardize some functionality.
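
For illustration, a minimal sketch of one possible wire format the QP and user container could agree on: the QP POSTs a JSON array of correlated payloads and expects a same-order JSON array back. The endpoint, envelope, and field names are all assumptions, not an existing protocol:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// batchItem is a hypothetical envelope: an ID to correlate responses with
// requests, plus the original request payload.
type batchItem struct {
	ID   string          `json:"id"`
	Body json.RawMessage `json:"body"`
}

// batchHandler is the user-container side of the sketched protocol: decode
// the batch, process the whole vector at once (the model-serving win), then
// return one response per item, preserving order so each request in the
// batch is still handled independently from the client's point of view.
func batchHandler(w http.ResponseWriter, r *http.Request) {
	var items []batchItem
	if err := json.NewDecoder(r.Body).Decode(&items); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	out := make([]batchItem, len(items))
	for i, it := range items {
		out[i] = batchItem{ID: it.ID, Body: it.Body} // echo; real code would run inference here
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(out); err != nil {
		log.Printf("encoding batch response: %v", err)
	}
}

func main() {
	http.HandleFunc("/batch", batchHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```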
