[autoscaler][kubernetes] autoscaling hotfix #14024

DmitriGekhtman · 2021-02-10T02:38:43Z

Why are these changes needed?

Right now, the Kubernetes `fill_out_available_node_type_resources` logic fills in `"GPU": 0` for non-GPU nodes. This interacts badly with the resource demand scheduler's GPU-conservation logic and prevents autoscaling on Kubernetes.
This PR fixes `KubernetesNodeProvider`'s resource-filling logic so that it no longer fills in fields whose value is 0.
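A minimal sketch of the interaction, assuming the GPU-conservation check treats a node type as a GPU node whenever a `"GPU"` key is present, regardless of its value. The names `fill_resources`, `is_gpu_node_by_key`, and `should_skip_for_gpu_conservation` are hypothetical and are not Ray's actual functions.

```python
from typing import Dict, List

ResourceDict = Dict[str, float]


def is_gpu_node_by_key(node_resources: ResourceDict) -> bool:
    # Hypothetical check (assumption): any node type with a "GPU" key counts
    # as a GPU node, even when the value is 0.
    return "GPU" in node_resources


def should_skip_for_gpu_conservation(node_resources: ResourceDict,
                                     demands: List[ResourceDict]) -> bool:
    # Skip GPU node types when no pending demand actually asks for a GPU.
    any_gpu_demand = any(d.get("GPU", 0) > 0 for d in demands)
    return is_gpu_node_by_key(node_resources) and not any_gpu_demand


def fill_resources(cpu: float, gpu: float, memory: float) -> ResourceDict:
    # Hypothetical stand-in for the resource-filling step after the fix:
    # emit only nonzero fields, so CPU-only pods carry no "GPU" key.
    candidate = {"CPU": cpu, "GPU": gpu, "memory": memory}
    return {name: value for name, value in candidate.items() if value != 0}


if __name__ == "__main__":
    cpu_only_demand = [{"CPU": 1}]
    before_fix = {"CPU": 4, "GPU": 0, "memory": 8e9}  # "GPU": 0 filled in
    after_fix = fill_resources(cpu=4, gpu=0, memory=8e9)
    print(should_skip_for_gpu_conservation(before_fix, cpu_only_demand))  # True
    print(should_skip_for_gpu_conservation(after_fix, cpu_only_demand))  # False
```

Under that assumption, a CPU-only node type that carries `"GPU": 0` is excluded for CPU-only demand, so no node type is eligible and the cluster never scales up. Dropping zero-valued fields sidesteps this without changing the conservation check itself.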

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Did a quick manual check that this fixes the problem.
Will add more test logic to the K8s operator unit test (not currently in CI) later.

hotfix (e5229d7)

DmitriGekhtman requested a review from ericl February 10, 2021 02:38

DmitriGekhtman assigned ericl Feb 10, 2021

DmitriGekhtman added this to the Serverless Autoscaling milestone Feb 10, 2021

ericl approved these changes Feb 10, 2021

ericl merged commit 8ca0a32 into ray-project:master Feb 10, 2021

DmitriGekhtman deleted the k8s-autoscaler-hotfix branch February 10, 2021 21:04

fishbone pushed a commit to fishbone/ray that referenced this pull request Feb 16, 2021

HotFix k8s autoscaling (ray-project#14024) (74e87a0)

fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021

Revert "HotFix k8s autoscaling (ray-project#14024)" (92cecd8)

This reverts commit 74e87a0.

DmitriGekhtman mentioned this pull request Feb 25, 2021

[autoscaler] v1.2.0 Autoscaler crashes when attempting to calculate resource usage #14346 (Closed)

DmitriGekhtman mentioned this pull request Mar 9, 2021

[tune] Parallel tune calls not autoscaling ray cluster from head node #14544 (Closed)

DmitriGekhtman mentioned this pull request Jun 29, 2021

[autoscaler] GPU=0 resource tweak #16761 (Merged)
