Enable Replica Auto Scaling down to zero #445

Closed
@nickwalton

Description

It would be really useful, especially for smaller applications, to be able to scale GPUs down to 0 when there is no traffic.

Possible approach

  • To trigger scaling 1 -> 0, check CloudWatch metrics for no requests for a certain amount of time (user-configurable?).
  • Scale 1 -> 0 by setting deployment.spec.replicas to 0.
  • When scaling 1 -> 0, also update the Istio Virtual Service to route requests for that API to a new deployment running in the Cortex node (or use the existing operator).
  • 0 -> 1 scaling is triggered when a request comes in to that service
  • Scale 0 -> 1 by setting deployment.spec.replicas to 1.
  • Either the service holds onto the request until the pod is ready, forwards it, and replies with the response, or responds immediately with a message saying e.g. "0 -> 1 scaling has been triggered, please try again in a few minutes"
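
The scaling decision described in the list above could be sketched roughly as follows. This is only an illustration of the proposal, not Cortex code: `ApiState`, `desired_replicas`, `replicas_patch`, and the `idle_timeout_s` parameter are hypothetical names, and the idle time is assumed to have been derived already from the CloudWatch request metrics.

```python
from dataclasses import dataclass


@dataclass
class ApiState:
    """Snapshot of one API (all field names are hypothetical)."""
    replicas: int                      # current deployment.spec.replicas
    seconds_since_last_request: float  # assumed to be derived from request metrics
    pending_requests: int              # requests waiting at the stand-in service


def desired_replicas(state: ApiState, idle_timeout_s: float) -> int:
    """Return the replica count the deployment should be set to."""
    if state.replicas == 0:
        # 0 -> 1: any request arriving at the stand-in service wakes the API.
        return 1 if state.pending_requests > 0 else 0
    if state.replicas == 1 and state.pending_requests == 0:
        # 1 -> 0: no traffic for at least the configured idle window.
        if state.seconds_since_last_request >= idle_timeout_s:
            return 0
    # Anything above one replica is left to the regular autoscaler.
    return state.replicas


def replicas_patch(replicas: int) -> dict:
    """Patch body that sets deployment.spec.replicas."""
    return {"spec": {"replicas": replicas}}
```

For example, with a 10-minute idle window, `desired_replicas(ApiState(1, 900.0, 0), 600.0)` returns 0, and a single pending request at zero replicas brings the result back to 1. The patch body would then have to be applied through the Kubernetes API, and the Istio Virtual Service switched at the same time, which this sketch leaves out.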

Metadata

Labels

enhancement (New feature or request), research (Determine technical constraints)
