
QP request batcher #13691

Open · skonto opened this issue Feb 13, 2023 · 2 comments
Labels
- area/API: API objects and controllers
- kind/feature: Well-understood/specified features, ready for coding.
- triage/accepted: Issues which should be fixed (post-triage)


skonto (Contributor) commented Feb 13, 2023

/area API

Describe the feature

Requirements

  • Users should be able to specify criteria such as latency and batch size per service, so that the QP can decide when to submit the batch to the user container. This would require annotating the service accordingly (see the sketch after this list).
  • Each request in the batch is handled independently and transparently. To achieve this, a protocol should be defined between the QP and the user container, and the user container should comply with it in order to receive data in batch mode.
  • This does not cover the case where requests need to be sent as a batch across the whole Knative data plane.
  • The feature will be an extension, not enabled by default.
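
For illustration only, a minimal Go sketch of how a QP-side batcher could apply the size and latency criteria. All names are hypothetical, and the annotation-derived settings are assumptions; nothing like this exists in Knative today:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// batcher buffers incoming requests and flushes them to the user container
// when either maxSize requests have accumulated or maxLatency has elapsed
// since the first request in the open batch arrived.
type batcher struct {
	maxSize    int           // e.g. from a hypothetical batch-size annotation
	maxLatency time.Duration // e.g. from a hypothetical max-latency annotation
	in         chan *http.Request
	flush      func([]*http.Request) // submits the batch to the user container
}

func (b *batcher) run() {
	var batch []*http.Request
	var timer <-chan time.Time
	for {
		select {
		case r := <-b.in:
			if len(batch) == 0 {
				timer = time.After(b.maxLatency) // start the latency window
			}
			batch = append(batch, r)
			if len(batch) >= b.maxSize {
				b.flush(batch) // size criterion met
				batch, timer = nil, nil
			}
		case <-timer:
			b.flush(batch) // latency budget exhausted; send what we have
			batch, timer = nil, nil
		}
	}
}

func main() {
	b := &batcher{
		maxSize:    4,
		maxLatency: 50 * time.Millisecond,
		in:         make(chan *http.Request),
		flush: func(batch []*http.Request) {
			fmt.Printf("flushing %d requests\n", len(batch))
		},
	}
	go b.run()
	// Feed three requests; the batch flushes via the latency timer.
	for i := 0; i < 3; i++ {
		r, _ := http.NewRequest("POST", "http://example.com/infer", nil)
		b.in <- r
	}
	time.Sleep(100 * time.Millisecond) // wait for the latency-triggered flush
}
```

A nil timer channel simply blocks its select case, so the latency window only runs while a batch is open.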

Use cases

  • There are scenarios where HTTP requests need to be delivered as a batch instead of one by one. A common scenario is model serving, where performance improves if requests are collected into a batch at the backend so that an operation is applied per data vector rather than per data instance. An implementation for Knative Serving that uses an intermediate container can be found in KServe here.
  • As discussed here, users coming from other systems such as AWS SQS might expect a batch configuration option to consume more than one request, at least at the user container/backend side.
skonto added the kind/feature label on Feb 13, 2023
knative-prow bot added the area/API label on Feb 13, 2023
ReToCode added the triage/accepted label on Mar 8, 2023
dprotaso (Member) commented

I'm not sure if it's possible to generalize a batching API - it seems very API/application specific.

> Users should be able to specify criteria such as latency and batch size per service, so that the QP can decide when to submit the batch to the user container.

Why not let the user handle this in the user-container, and set the container concurrency to the batch size?

> Each request in the batch is handled independently and transparently. To achieve this, a protocol should be defined between the QP and the user container, and the user container should comply with it in order to receive data in batch mode.

Do you have an example of a protocol?

skonto (Contributor, Author) commented Oct 13, 2023

@dprotaso sorry for the delayed response:

> I'm not sure if it's possible to generalize a batching API - it seems very API/application specific.

Many API providers do provide such endpoints for batch processing, though. This could be an extension that provides some default settings like path, batch config, format, metrics config, etc.
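
To make that concrete, a rough sketch of what such extension settings could look like. Every field name here is purely hypothetical:

```go
package batching

import "time"

// BatchConfig sketches the kind of defaults such an extension could expose.
// All fields are hypothetical; nothing like this exists in Knative today.
type BatchConfig struct {
	Path           string        // endpoint on the user container, e.g. "/batch"
	MaxSize        int           // flush once this many requests are buffered
	MaxLatency     time.Duration // flush once the oldest buffered request is this old
	Format         string        // wire format, e.g. "json-array"
	MetricsEnabled bool          // emit per-batch metrics from the QP
}
```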

> Why not let the user handle this in the user-container, and set the container concurrency to the batch size?

That is possible. I am trying to remove the implementation overhead by providing a batch primitive.

> Do you have an example of a protocol?

Here is an example: envoyproxy/envoy#6452.
Maybe we could have this concept of proxy extensions/filters too, to standardize some functionality.
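
For illustration, a minimal sketch of one possible wire format the QP and user container could agree on: the QP POSTs a JSON array of correlated payloads and expects a same-order JSON array back. The endpoint, envelope, and field names are all assumptions, not an existing protocol:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// batchItem is a hypothetical envelope: an ID to correlate responses with
// requests, plus the original request payload.
type batchItem struct {
	ID   string          `json:"id"`
	Body json.RawMessage `json:"body"`
}

// batchHandler is the user-container side of the sketched protocol: decode
// the batch, process the whole vector at once (the model-serving win), then
// return one response per item, preserving order so each request in the
// batch is still handled independently from the client's point of view.
func batchHandler(w http.ResponseWriter, r *http.Request) {
	var items []batchItem
	if err := json.NewDecoder(r.Body).Decode(&items); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	out := make([]batchItem, len(items))
	for i, it := range items {
		out[i] = batchItem{ID: it.ID, Body: it.Body} // echo; real code would run inference here
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(out); err != nil {
		log.Printf("encoding batch response: %v", err)
	}
}

func main() {
	http.HandleFunc("/batch", batchHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```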
