Skip to content

[RayTrain] ScalingConfig resources_per_worker input validation/error handling  #49372

Open
@astanley-work

Description

@astanley-work

Description

Adding error handler to help users identify when they have input an invalid resource type (e.g. misspelling a resource as "cpu" or "Memory", adding a parameter that does not exist, etc.)

Currently if you provide something like "memory" misspelt as "Memory" Ray will complain that your cluster lacks resources (even if you are requesting less than the available amount of resources).

This change adds a simple error check that will tell users if they have provided a misspelt or invalid resource name type,

(See slack thread for issue inspiration: https://ray.slack.com/archives/C053M5UBEVD/p1734471893141579)

Use case

When performing a training run, making sure that users can quickly identify a mistyped/misnamed ScalingConfig input.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementRequest for new feature and/or capabilitytrainRay Train Related IssuetriageNeeds triage (eg: priority, bug/not-bug, and owning component)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions