
[core] aggregated metrics for ray_tasks/ray_actors #47289

Open
@hongchaodeng

Description

Currently, the ray_tasks/ray_actors metrics can have a very high volume. This is fine for a single cluster, but for an aggregated, platform-wide view it can put excessive load on the Prometheus and Grafana servers.

For these aggregated views, we don't need to know the NAME, WorkerId, etc. However, these tags lead to high cardinality in the output metrics. Because these metrics are currently of the GAUGE type, dropping labels is not ideal either: series that differ only in the dropped labels collide instead of being summed.
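A minimal sketch of that collision, using hypothetical gauge samples and label names (the real ray_tasks tag set may differ). If the high-cardinality labels are simply stripped and each gauge value is written under the reduced label set, later samples overwrite earlier ones, so the reported value no longer reflects the total.

```python
from typing import Dict, Tuple

# A label set is represented as a tuple of (label, value) pairs.
Labels = Tuple[Tuple[str, str], ...]

# Hypothetical gauge samples: label set -> current gauge value.
samples: Dict[Labels, float] = {
    (("Name", "train_step"), ("WorkerId", "w1"), ("State", "RUNNING")): 3.0,
    (("Name", "train_step"), ("WorkerId", "w2"), ("State", "RUNNING")): 5.0,
    (("Name", "preprocess"), ("WorkerId", "w1"), ("State", "RUNNING")): 2.0,
}


def drop_labels_naively(samples: Dict[Labels, float], drop: set) -> Dict[Labels, float]:
    """Strip the given labels and overwrite: series that now share a label set collide."""
    out: Dict[Labels, float] = {}
    for labels, value in samples.items():
        kept = tuple((k, v) for k, v in labels if k not in drop)
        out[kept] = value  # last write wins; earlier values are silently lost
    return out


naive = drop_labels_naively(samples, {"Name", "WorkerId"})
print(naive)                  # {(('State', 'RUNNING'),): 2.0}
print(sum(samples.values()))  # 10.0, the number of RUNNING tasks we actually want
```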

We should add new aggregated metrics for ray_tasks/ray_actors that omit these high-cardinality tags.
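One way to picture such an aggregated metric is a group-by-sum over a small set of kept labels. The sketch below assumes a hypothetical `State` label is kept and that `Name` and `WorkerId` are summed away; the function name `aggregate_gauges` and the sample data are illustrative only, not part of Ray's API.

```python
import itertools
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# A label set is represented as a tuple of (label, value) pairs.
Labels = Tuple[Tuple[str, str], ...]


def aggregate_gauges(samples: Dict[Labels, float], keep: Iterable[str]) -> Dict[Labels, float]:
    """Sum gauge values over every label not in `keep` (a group-by-sum)."""
    keep_set = frozenset(keep)
    totals: Dict[Labels, float] = defaultdict(float)
    for labels, value in samples.items():
        key = tuple(sorted((k, v) for k, v in labels if k in keep_set))
        totals[key] += value  # sum instead of overwrite, so no counts are lost
    return dict(totals)


# Hypothetical cluster: 50 task names x 20 workers x 3 states, one task per series.
names = [f"task_{i}" for i in range(50)]
workers = [f"worker_{i}" for i in range(20)]
states = ["PENDING", "RUNNING", "FINISHED"]
samples = {
    (("Name", n), ("State", s), ("WorkerId", w)): 1.0
    for n, w, s in itertools.product(names, workers, states)
}

agg = aggregate_gauges(samples, keep=["State"])
print(len(samples), "->", len(agg))  # 3000 -> 3
print(agg[(("State", "RUNNING"),)])  # 1000.0
```

The same reduction could be expressed on the Prometheus side as `sum by (State) (ray_tasks)`, but that still requires ingesting the high-cardinality series first, which is the load this issue aims to avoid; emitting the aggregated series from Ray directly keeps it out of Prometheus entirely.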

Metadata

Labels

P0, core, enhancement, observability, stability