[RayTrain] ScalingConfig resources_per_worker input validation/error handling #49372
Labels
enhancement
Request for new feature and/or capability
train
Ray Train Related Issue
triage
Needs triage (eg: priority, bug/not-bug, and owning component)
Description
Adding error handler to help users identify when they have input an invalid resource type (e.g. misspelling a resource as "cpu" or "Memory", adding a parameter that does not exist, etc.)
Currently if you provide something like "memory" misspelt as "Memory" Ray will complain that your cluster lacks resources (even if you are requesting less than the available amount of resources).
This change adds a simple error check that will tell users if they have provided a misspelt or invalid resource name type,
(See slack thread for issue inspiration: https://ray.slack.com/archives/C053M5UBEVD/p1734471893141579)
Use case
When performing a training run, making sure that users can quickly identify a mistyped/misnamed ScalingConfig input.
The text was updated successfully, but these errors were encountered: