
[Dashboard] Provide GPU stats per actor or per worker process in the node table #31998

Closed
@scottsun94

Description

After chatting with @ArturNiederfahrenhorst, we think that having GPU stats per actor or per worker process in the node table could be useful.

Use case

One potential use case for this: if someone is running a hyperparameter (HP) sweep with many trials that each use less than a full GPU, it would be useful to see how much of the GPU each trial actually ends up using, in order to find out which parameters cause the difference.

I want to run two Ray Tune trials on a cluster
The cluster has one GPU head node
Each trial should be allocated 1/2 GPU
Now one of the damn trials is super slow
In order to find out why, it would be cool to see how much of the GPU it’s actually utilizing
Since I only see the overall util, the GPU util of any given trial is underdetermined
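A minimal sketch of the per-process attribution this request asks for. It assumes NVML's per-process utilization query (`pynvml.nvmlDeviceGetProcessUtilization` from the nvidia-ml-py package) is supported on the GPU, and that the mapping from PID to trial or actor is supplied by the caller. `collect_samples` and `gpu_util_by_worker` are hypothetical helper names, not part of Ray or the dashboard.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One NVML sample reduced to (pid, SM utilization in percent).
Sample = Tuple[int, int]


def collect_samples(gpu_index: int = 0, since_us: int = 0) -> List[Sample]:
    """Read per-process SM utilization samples for one GPU via NVML.

    Assumes an NVIDIA driver and the nvidia-ml-py package. The query is not
    supported on every GPU and raises pynvml.NVMLError when it is unsupported
    or when no samples exist since `since_us`.
    """
    import pynvml  # lazy import so the aggregation below works without it

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        raw = pynvml.nvmlDeviceGetProcessUtilization(handle, since_us)
        return [(s.pid, s.smUtil) for s in raw]
    finally:
        pynvml.nvmlShutdown()


def gpu_util_by_worker(
    samples: Iterable[Sample], pid_to_worker: Dict[int, str]
) -> Dict[str, float]:
    """Average SM utilization per worker; PIDs not in the mapping go to 'other'."""
    per_worker: Dict[str, List[int]] = defaultdict(list)
    for pid, sm_util in samples:
        per_worker[pid_to_worker.get(pid, "other")].append(sm_util)
    return {worker: sum(vals) / len(vals) for worker, vals in per_worker.items()}


if __name__ == "__main__":
    # Hypothetical samples for two half-GPU trials sharing one device.
    samples = [(101, 70), (101, 80), (202, 5), (202, 15)]
    print(gpu_util_by_worker(samples, {101: "trial_a", 202: "trial_b"}))
    # {'trial_a': 75.0, 'trial_b': 10.0}
```

With per-PID samples, the two trials in the scenario above can be told apart even though the node-level utilization is a single aggregate number.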


Metadata

Assignees: No one assigned

Labels:
    dashboard: Issues specific to the Ray Dashboard
    enhancement: Request for new feature and/or capability
    observability: Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling
    stale: The issue is stale. It will be closed within 7 days unless there are further conversation


Development: No branches or pull requests
