Enable Replica Auto Scaling down to zero #445
Comments
Definitely interested in this. It would make Cortex a no-brainer for ML projects that don't yet have enough users to need one consistently running GPU. Then we'd benefit from the rest of the scaling/infra when the users do come.
To be clear, it's the unused running GPU instance we're concerned about. If something has to keep running, that's fine, e.g. a nano/micro that serves as the orchestrator and must always be on, if that makes anything easier. (Forgive my Cortex newbishness if that's kinda how it already works.)
@lefnire Thanks for reaching out; yes, that makes sense, and is exactly how we'd implement it! We haven't decided yet on our priority for implementing this feature. One thing that can render it less useful (or at least "awkward") is how long it takes to spin up a GPU instance and install its dependencies; we'd have to hold on to the request for 5+ minutes before forwarding it along. A more intuitive approach might be to support an asynchronous API instead, where you make the API request and it responds immediately with an execution ID, and you can then make an additional request to another API to query the status/results for that execution ID (we have #1610 to track this).

In the meantime, in case it's helpful, it is possible to create/delete APIs programmatically via the Cortex CLI or Python client. So if you know you are expecting traffic, or it happens on a regular schedule, you could create and delete APIs accordingly.

Also, we do currently support batch jobs, which are a bit like the asynchronous approach I described, except that autoscaling behaves differently: for batch jobs, you submit a job and indicate how many containers you want to run it on, and once the job is done, the containers spin down. So it does "scale to 0", but it is not designed to handle real-time traffic where each individual request is fairly lightweight and can come at any time from any source.
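In case a concrete example helps, here is a minimal sketch of the create/delete-on-a-schedule idea, driving the Cortex CLI from Python. The API name, config path, and traffic window below are assumptions for illustration, and command names/flags may differ slightly between Cortex versions (the Python client could be used equivalently):

```python
"""Hypothetical scheduler: deploy the API shortly before an expected traffic
window and delete it afterwards, so no GPU instance sits idle in between."""
import subprocess
import time

API_NAME = "my-gpu-api"                 # assumed API name from cortex.yaml
CONFIG_FILE = "cortex.yaml"             # assumed deployment config path
TRAFFIC_WINDOW_SECONDS = 2 * 60 * 60    # e.g. a known 2-hour traffic window


def create_api() -> None:
    # Deploy the API defined in cortex.yaml (spins up the GPU instance).
    subprocess.run(["cortex", "deploy", CONFIG_FILE], check=True)


def delete_api() -> None:
    # Tear the API down again so the GPU instance can go away.
    subprocess.run(["cortex", "delete", API_NAME], check=True)


if __name__ == "__main__":
    create_api()
    try:
        # Keep the API up only for the window where traffic is expected.
        time.sleep(TRAFFIC_WINDOW_SECONDS)
    finally:
        delete_api()
```

The same pattern works from a cron job or any scheduler you already run; the point is just that "scale to 0" can be approximated today by creating and deleting the API around known traffic.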
That is indeed helpful. With how valuable Cortex is, any leg-work on our part is worth it for this use case. I'll look into the batch jobs; I'm currently using AWS Batch anyway. Thanks for the suggestions! I'll stay subscribed to this issue in case scale-to-0 ever comes.
It would be really useful, especially for smaller applications, to be able to scale GPUs down to 0 when there is no traffic.
Possible approach: set `deployment.spec.replicas` to 0.
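For reference, a minimal sketch of that approach using the Kubernetes Python client, assuming direct access to the cluster; the deployment and namespace names are hypothetical, since Cortex manages its own resource names. This only covers the scale-down side; the hard part discussed above (holding or queueing requests while the GPU instance spins back up) is not addressed here:

```python
"""Sketch: patch the API's Kubernetes Deployment to 0 replicas while idle,
and back to 1 before forwarding traffic."""
from kubernetes import client, config

NAMESPACE = "default"        # assumed namespace of the API deployment
DEPLOYMENT = "my-gpu-api"    # assumed deployment name


def scale(replicas: int) -> None:
    config.load_kube_config()  # or load_incluster_config() inside the cluster
    apps = client.AppsV1Api()
    # Patch only deployment.spec.replicas, leaving the rest of the spec intact.
    apps.patch_namespaced_deployment_scale(
        name=DEPLOYMENT,
        namespace=NAMESPACE,
        body={"spec": {"replicas": replicas}},
    )


if __name__ == "__main__":
    scale(0)  # scale to zero while idle; scale(1) when traffic is expected
```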