Closed
Description
I chatted with @ArturNiederfahrenhorst, and having GPU stats per actor or per worker process in the node table could be useful.
Use case
One potential use case: if someone is running a hyperparameter (HP) sweep with many trials that each use less than a full GPU, it would be useful to see how much GPU each trial actually ends up using, to find out which parameters drive the differences.
I want to run two Ray Tune trials on a cluster
The cluster has a single head node with one GPU
Each trial is allocated 1/2 GPU
Now one of the damn trials is super slow
In order to find out why, it would be cool to see how much of the GPU it’s actually utilizing
Since I only see the overall utilization, the GPU utilization of any given trial is underdetermined
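To make the use case concrete, here is a minimal sketch of what per-worker GPU stats would enable: summing per-process samples into per-trial totals. It assumes per-process samples are already available (for example from NVML's per-process queries) along with a mapping from worker PID to trial. `ProcessGpuSample`, `gpu_usage_per_trial`, and `pid_to_trial` are hypothetical names, not an existing Ray API, and the numbers are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProcessGpuSample:
    """One per-process GPU sample (hypothetical type, not a Ray or NVML class)."""
    pid: int
    sm_util_percent: float  # share of the GPU's compute used by this process
    memory_mib: float       # GPU memory held by this process


def gpu_usage_per_trial(samples, pid_to_trial):
    """Sum per-process GPU samples into per-trial totals.

    `pid_to_trial` is a hypothetical mapping from worker process ID to the
    Ray Tune trial it runs; unknown PIDs are grouped under "unattributed".
    """
    totals = defaultdict(lambda: {"sm_util_percent": 0.0, "memory_mib": 0.0})
    for s in samples:
        trial = pid_to_trial.get(s.pid, "unattributed")
        totals[trial]["sm_util_percent"] += s.sm_util_percent
        totals[trial]["memory_mib"] += s.memory_mib
    return dict(totals)


# Two trials sharing one GPU (0.5 GPU each); the numbers are illustrative only.
samples = [
    ProcessGpuSample(pid=101, sm_util_percent=70.0, memory_mib=6000.0),
    ProcessGpuSample(pid=202, sm_util_percent=15.0, memory_mib=2500.0),
]
pid_to_trial = {101: "trial_a", 202: "trial_b"}

per_trial = gpu_usage_per_trial(samples, pid_to_trial)
overall = sum(s.sm_util_percent for s in samples)
print(per_trial)
print(overall)  # 85.0: the only number visible without per-process stats
```

The overall figure is just the sum of the per-process values, so it cannot be split back into per-trial usage on its own; that split is exactly what per-actor or per-worker stats in the node table would expose.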