[core] aggregated metrics for `ray_tasks`/`ray_actors` #47289
Open
Description
Currently, the `ray_tasks`/`ray_actors` metrics can be very high in volume. This is fine for a single cluster, but for an aggregated, platform-wide view it can put excessive load on the Prometheus and Grafana servers.
For these aggregated views we don't need to know the `NAME`, `WorkerId`, etc., but these tags are what make the output metrics high-cardinality. Because of a current limitation of the GAUGE type used by these metrics, simply dropping these labels is not a good option either.
We should add new aggregated metrics for `ray_tasks`/`ray_actors`.
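As a rough sketch of what "aggregated" could mean here, the snippet below sums gauge values over a small set of low-cardinality labels (assumed here to be just `State`) before export, so the totals are preserved while the number of series shrinks. The `aggregate` helper and the sample data are hypothetical and do not reflect how Ray actually records or exports these metrics.

```python
from collections import defaultdict

# Hypothetical per-worker gauge samples for ray_tasks: (labels, value).
samples = [
    ({"NAME": "f", "State": "RUNNING", "WorkerId": "w1"}, 3.0),
    ({"NAME": "f", "State": "RUNNING", "WorkerId": "w2"}, 2.0),
    ({"NAME": "g", "State": "RUNNING", "WorkerId": "w1"}, 4.0),
    ({"NAME": "g", "State": "PENDING_NODE_ASSIGNMENT", "WorkerId": "w2"}, 1.0),
]


def aggregate(samples, keep):
    """Sum gauge values grouped by the labels in `keep` (hypothetical helper).

    Labels not listed in `keep` (e.g. NAME, WorkerId) are summed away instead
    of being dropped, so no values are lost to collisions.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple((k, labels.get(k, "")) for k in keep)
        totals[key] += value
    return dict(totals)


agg = aggregate(samples, keep=("State",))
print(len(samples), "->", len(agg), "series")  # 4 -> 2 series
print(agg)
```

Summing at the source keeps only one series per `State` value, which is the kind of low-cardinality view a platform-wide dashboard needs, while the existing per-task metrics can remain available for single-cluster debugging.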
Use case
No response