[RayTrain] ScalingConfig resources_per_worker input validation/error handling #49372

Open
astanley-work opened this issue Dec 19, 2024 · 0 comments
Labels
enhancement Request for new feature and/or capability train Ray Train Related Issue triage Needs triage (eg: priority, bug/not-bug, and owning component)

Description

Add an error handler to help users identify when they have passed an invalid resource name in resources_per_worker (e.g. misspelling "CPU" as "cpu" or "memory" as "Memory", adding a key that does not exist, etc.).

Currently, if you provide something like "memory" misspelled as "Memory", Ray will complain that your cluster lacks resources (even if you are requesting less than the available amount).

This change adds a simple error check that tells users when they have provided a misspelled or invalid resource name.
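A rough sketch of what such a check could look like, in plain Python so it runs without Ray installed. The function name `validate_resources_per_worker` and the `BUILTIN_RESOURCES` tuple are hypothetical and not part of Ray's API. The sketch assumes "CPU", "GPU", and "memory" are the expected spellings, and it only flags keys that differ from those by letter case, since other unknown keys may be legitimate custom resources.

```python
# Hypothetical helper, not Ray's actual implementation.
# Assumption: these are the canonical spellings of the built-in resource names.
BUILTIN_RESOURCES = ("CPU", "GPU", "memory")


def validate_resources_per_worker(resources, known=BUILTIN_RESOURCES):
    """Raise ValueError for keys that match a known name except for letter case."""
    by_lower = {name.lower(): name for name in known}
    problems = []
    for key in resources:
        if key in known:
            continue  # exact match, nothing to report
        suggestion = by_lower.get(key.lower())
        if suggestion is not None:
            problems.append(f"{key!r} (did you mean {suggestion!r}?)")
        # Any other key is passed through: it may be a custom resource.
    if problems:
        raise ValueError(
            "Invalid resources_per_worker key(s): " + ", ".join(problems)
        )
    return resources


if __name__ == "__main__":
    validate_resources_per_worker({"CPU": 2, "memory": 1024})  # passes
    try:
        validate_resources_per_worker({"cpu": 2, "Memory": 1024})
    except ValueError as err:
        print(err)
```

With a check like this, the misspelled keys would fail fast with a message naming the bad key, instead of surfacing later as a misleading "cluster lacks resources" error. Whether unknown keys should be rejected outright is a design question for the maintainers, since that would also reject custom resources.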

(See slack thread for issue inspiration: https://ray.slack.com/archives/C053M5UBEVD/p1734471893141579)

Use case

When performing a training run, users should be able to quickly identify a mistyped or misnamed ScalingConfig input.

@astanley-work astanley-work added enhancement Request for new feature and/or capability triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Dec 19, 2024
@matthewdeng matthewdeng reopened this Dec 30, 2024
@jcotant1 jcotant1 added the train Ray Train Related Issue label Dec 31, 2024