[autoscaler] make 0 default min/max workers for head node #17757

Merged
merged 15 commits into from
Aug 25, 2021

Conversation

sasha-s
Contributor

@sasha-s sasha-s commented Aug 11, 2021

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sasha-s sasha-s requested a review from DmitriGekhtman August 11, 2021 20:38
@sasha-s sasha-s linked an issue Aug 11, 2021 that may be closed by this pull request
2 tasks
@DmitriGekhtman DmitriGekhtman changed the title make 0 default min/max workers for head node [autoscaler] make 0 default min/max workers for head node Aug 12, 2021
@DmitriGekhtman
Contributor

It's customary to add a tag to the title to aid in release tracking -- added "[autoscaler]".

@@ -2857,6 +2855,15 @@ def metrics_incremented():
        self.waitFor(
            metrics_incremented, fail_msg="Expected metrics to update")

    def testValidateDefaultConfigi2(self):
Contributor

typo in function name

Contributor Author

thanks

                max_workers=config["max_workers"],
                version=ray.__version__))
    if node_type_name == config["head_node_type"]:
        node_type_data.setdefault("min_workers", 0)
Contributor

We want max_workers for the head node type defaulted to 0 (and to global max_workers for the rest of the node types).

I think the autoscaler internally infers min_workers to be 0 if min_workers is missing for a node type.

Contributor Author

Changed it to 0 for the head node, with min_workers also defaulting to 0.
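
For reference, the defaulting discussed in this thread can be sketched roughly as follows. This is a minimal illustration and not the PR's actual diff: `fill_worker_defaults` is a hypothetical helper name and the example config is made up. The `head_node_type`, `max_workers`, and `min_workers` keys appear in the snippet above; `available_node_types` is assumed here as the mapping of per-type settings.

```python
import copy


def fill_worker_defaults(config):
    """Return a copy of `config` with per-node-type worker limits filled in.

    Hypothetical helper for illustration only; not the code merged in the PR.
    """
    config = copy.deepcopy(config)
    head_type = config["head_node_type"]
    for name, node_type in config["available_node_types"].items():
        # A missing min_workers is treated as 0 for every node type.
        node_type.setdefault("min_workers", 0)
        if name == head_type:
            # Head node type: max_workers defaults to 0.
            node_type.setdefault("max_workers", 0)
        else:
            # Other node types fall back to the global max_workers.
            node_type.setdefault("max_workers", config["max_workers"])
    return config


# Made-up example config, used only to exercise the sketch.
example = {
    "max_workers": 8,
    "head_node_type": "head",
    "available_node_types": {
        "head": {"resources": {"CPU": 4}},
        "worker": {"resources": {"CPU": 2}, "min_workers": 1},
    },
}
filled = fill_worker_defaults(example)
print(filled["available_node_types"]["head"])
print(filled["available_node_types"]["worker"])
```

Using `setdefault` means explicitly configured values are never overwritten; only missing keys are filled in.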

@DmitriGekhtman
Contributor

Other things that need to be changed:

There's a test in test_cli.py that explicitly checks for the max_workers warning message that's been deleted -- that test and the file it uses should be deleted.

The documentation on node type max workers should be updated
https://github.com/ray-project/ray/blob/55680a1f9ed3d8d1d4762039221b9fba4645383b/doc/source/cluster/config.rst

The helm chart template

minWorkers: {{ $val.minWorkers }}
maxWorkers: {{ $val.maxWorkers }}

needs to be updated to ignore min_workers and max_workers if they're not specified in values.yaml:
minWorkers: 0
maxWorkers: 0

There's another example that needs to be modified here

minWorkers: 0
# Maximum number of Ray workers of this Pod type. Takes precedence over minWorkers.
maxWorkers: 0

@sasha-s
Contributor Author

sasha-s commented Aug 12, 2021

thanks, I missed those

@sasha-s sasha-s closed this Aug 12, 2021
@sasha-s sasha-s reopened this Aug 12, 2021
@DmitriGekhtman
Contributor

DmitriGekhtman commented Aug 16, 2021

@sasha-s
Contributor Author

sasha-s commented Aug 16, 2021

@DmitriGekhtman
Contributor

lgtm -- I'm just going to run some quick manual checks that k8s stuff works

@AmeerHajAli AmeerHajAli added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 18, 2021
@DmitriGekhtman
Contributor

The somewhat unorthodox convention we use for Ray github is to add reviewers as assignees (makes searching PRs to be reviewed easier for reviewers)

@DmitriGekhtman DmitriGekhtman self-assigned this Aug 19, 2021
@DmitriGekhtman
Contributor

LGTM -- would you mind doing a manual check by launching a cluster on AWS with these changes checked out locally?

Co-authored-by: Dmitri Gekhtman <62982571+DmitriGekhtman@users.noreply.github.com>
@DmitriGekhtman DmitriGekhtman removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 19, 2021
@AmeerHajAli AmeerHajAli added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 20, 2021
@DmitriGekhtman
Contributor

Looks like there's a Mac build failure. Could you rebase or merge master to retry that?

@DmitriGekhtman
Contributor

DmitriGekhtman commented Aug 24, 2021

eh, sry, realized there's one more issue with the helm chart.

As is, there's an incompatibility with the default rayproject/ray:latest operator image, since the max/min-workers-filling logic runs in the operator container.

To avoid the incompatibility, could you modify this part to default to 0 if minWorkers/maxWorkers is not present? There's Helm syntax for defaults which simplifies this.
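
To make the intended behavior concrete: in a Helm template this kind of fallback is usually written with the `default` function, something like `{{ $val.minWorkers | default 0 }}`. The effect on a rendered pod type can be sketched in Python as below. This is only an illustration under assumptions: `render_pod_type` and the shape of `values` are hypothetical stand-ins for one pod type entry from values.yaml, not code from the chart or the operator.

```python
def render_pod_type(values):
    """Mimic the chart-side defaulting for one pod type entry.

    `values` is a hypothetical stand-in for a pod type block in values.yaml.
    Missing minWorkers/maxWorkers are rendered as 0, so an operator image
    without the new default-filling logic still receives explicit values.
    """
    return {
        "minWorkers": values.get("minWorkers", 0),
        "maxWorkers": values.get("maxWorkers", 0),
    }


# A head pod type that specifies neither field, and a worker type that does.
head_rendered = render_pod_type({})
worker_rendered = render_pod_type({"minWorkers": 2, "maxWorkers": 5})
print(head_rendered)
print(worker_rendered)
```

Because the chart always emits both fields, the rendered config does not depend on which side (chart or operator container) fills in missing values.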

@sasha-s
Contributor Author

sasha-s commented Aug 24, 2021

I guess it is not feasible to remove the defaults from rayproject/ray:latest.
Changed min/max to default to 0.

@sasha-s sasha-s removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Aug 25, 2021
@DmitriGekhtman
Contributor

looks good now!
Windows failures are unrelated.

Development

Successfully merging this pull request may close these issues.

[autoscaler] Default min and max workers to 0 for the head node type.
3 participants