
Request-based horizontal pod autoscaling #573

Closed
@deliahu

Description

Currently, the user must tune an API's CPU request for horizontal pod autoscaling to behave as expected. An approach based on concurrent requests per container may be better (similar to what Knative uses).

This would also make autoscaling behave more as expected for GPU workloads.
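
For illustration, here is a rough sketch (in Go) of the Knative-style calculation this would imply, where the desired replica count is driven by in-flight requests rather than by the pod's CPU request. The names are hypothetical and the smoothing/panic-window behavior Knative layers on top is omitted; this is only the core ratio:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas computes a Knative-style replica count from observed
// in-flight requests, independent of the pod's CPU request.
func desiredReplicas(observedConcurrency, targetConcurrencyPerReplica float64, current int) int {
	if targetConcurrencyPerReplica <= 0 {
		return current // misconfigured target: keep the current scale
	}
	want := int(math.Ceil(observedConcurrency / targetConcurrencyPerReplica))
	if want < 1 {
		want = 1 // never scale to zero in this sketch
	}
	return want
}

func main() {
	// 37 concurrent requests across the API with a target of 8 per replica -> 5 replicas
	fmt.Println(desiredReplicas(37, 8, 2))
}
```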

It may make sense to have both request-based and CPU/GPU-based autoscaling active at the same time, i.e. the API would scale up when either threshold is met, and wouldn't scale back down until both metrics have backed off.
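
A minimal sketch of how that combined decision could work, assuming each metric is expressed as a ratio of its observed value to its target (the 10% tolerance band loosely mirrors the Kubernetes HPA's default tolerance; the exact numbers and function names are illustrative, not Cortex's actual behavior):

```go
package main

import "fmt"

// decision combines the request-based signal with the CPU/GPU signal:
// scale up as soon as either metric exceeds its target, but only scale
// back down once both metrics have backed off.
func decision(concurrencyRatio, cpuRatio float64) string {
	const upper, lower = 1.1, 0.9 // observed/target ratios, with a tolerance band

	switch {
	case concurrencyRatio > upper || cpuRatio > upper:
		return "scale up"
	case concurrencyRatio < lower && cpuRatio < lower:
		return "scale down"
	default:
		return "hold"
	}
}

func main() {
	fmt.Println(decision(1.4, 0.5)) // request concurrency is hot, CPU idle -> scale up
	fmt.Println(decision(0.8, 1.0)) // CPU has not backed off yet           -> hold
	fmt.Println(decision(0.7, 0.6)) // both metrics backed off              -> scale down
}
```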

Labels: enhancement (New feature or request)
