[core][metric] Redefine gcs STATS using metric interface #56201
inline ray::stats::Gauge GetActorMetric() {
/// Tracks actors by state, including pending, running, and idle actors.
///
/// To avoid metric collection conflicts between components reporting on the same actor,
I'm a little confused by this. Right now we have Component labels for things like raylet and gcs, and we have a Name label for the metric name. Now we're adding a Source label. Why would two components necessarily conflict in the current setup? Are they within the same component? I'm unclear why we need an additional label now.
Good point about Source vs. Component. I’m also not sure of the original purpose of each; these tags predate both my time and this PR (which is purely a refactoring and doesn’t add the Source tag). I’m open to revisiting or merging the two tags in a follow-up PR.
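For context on the question above: in a Prometheus-style data model, a time series is identified by the metric name together with its full label set, so two reporters only collide when every label matches. The sketch below is a toy model of that rule, not Ray's implementation; `SeriesStore`, `Labels`, the metric name `actors`, and the label values are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical toy model: a series is keyed by (metric name, label set).
using Labels = std::map<std::string, std::string>;

class SeriesStore {
 public:
  // Gauge-style write: the latest value for a given key replaces the old one.
  void Set(const std::string &name, const Labels &labels, double value) {
    series_[std::make_pair(name, labels)] = value;
  }

  // Returns the stored value; throws std::out_of_range if the series is absent.
  double Get(const std::string &name, const Labels &labels) const {
    return series_.at(std::make_pair(name, labels));
  }

  std::size_t NumSeries() const { return series_.size(); }

 private:
  std::map<std::pair<std::string, Labels>, double> series_;
};

// Reporters that differ in at least one label value write to separate series;
// reporters with identical labels overwrite each other.
inline SeriesStore BuildExample() {
  SeriesStore store;
  store.Set("actors", {{"Component", "gcs"}, {"State", "ALIVE"}}, 3);
  store.Set("actors", {{"Component", "core_worker"}, {"State", "ALIVE"}}, 1);
  // Same name and same labels as the first write, so this replaces the value 3.
  store.Set("actors", {{"Component", "gcs"}, {"State", "ALIVE"}}, 5);
  return store;
}
```

Under this model, whether an extra label is needed depends only on whether the existing labels already differ between the two reporters.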
- TaskCounter::TaskCounter(ray::observability::MetricInterface &task_by_state_counter)
-     : task_by_state_counter_(task_by_state_counter) {
+ TaskCounter::TaskCounter(ray::observability::MetricInterface &task_by_state_gauge,
Maybe we should change the name of the type since it's now a gauge?
The internal Prometheus metric has always been a gauge, while the wrapper has always been named TaskCounter. In this PR I renamed task_by_state_counter to task_by_state_gauge to correct a regression I introduced earlier, but underneath it has always been a gauge. I can see the merit in naming the wrapper Counter, since the concept of a gauge might feel unfamiliar or like an implementation detail, at least to me. That said, I’m open to either name for the wrapper; elsewhere in the codebase, Gauge and Counter are used interchangeably for wrapper names (as in [1]).
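The naming question comes down to semantics: a counter only accumulates, while a gauge reports a current level that can go down as well as up. Here is a minimal sketch of the difference, using hypothetical `ToyCounter`, `ToyGauge`, and `RunningTaskTracker` classes rather than Ray's or Prometheus's actual types:

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only.
class ToyCounter {
 public:
  // A counter accumulates; non-positive increments are ignored in this sketch.
  void Increment(double delta) {
    if (delta > 0) total_ += delta;
  }
  double Value() const { return total_; }

 private:
  double total_ = 0;
};

class ToyGauge {
 public:
  // A gauge is overwritten with the latest observed level.
  void Set(double value) { value_ = value; }
  double Value() const { return value_; }

 private:
  double value_ = 0;
};

// A "number of tasks currently running" metric must be able to decrease,
// so it is backed by a gauge even if the wrapper is named like a counter.
class RunningTaskTracker {
 public:
  explicit RunningTaskTracker(ToyGauge &gauge) : gauge_(gauge) {}
  void OnTaskStarted() { gauge_.Set(++running_); }
  void OnTaskFinished() { gauge_.Set(--running_); }

 private:
  ToyGauge &gauge_;
  int running_ = 0;
};
```

A tracker like this keeps its own count and publishes the current level, which is why a per-state task metric fits a gauge regardless of what the wrapper class is called.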
  auto counters = stats_counter_.GetAll();
- ray::stats::STATS_gcs_task_manager_task_events_reported.Record(
-     counters[kTotalNumTaskEventsReported]);
+ task_events_reported_gauge_.Record(counters[kTotalNumTaskEventsReported]);
[1]
Signed-off-by: Cuong Nguyen <can@anyscale.com>
…#56201) This PR is in the series of unifying all metric definition infra. This PR migrates all GCS metrics to use the metric interface. It does that by creating the metric object inside gcs_server and pass them down as interfaces to sub-components. Purely refactoring code and repetitive patterns, easier to review than the number of file changed tells you. Test: - CI <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Refactors GCS and core worker to use injected MetricInterface objects for all metrics, adding new metric helpers and rewiring constructors, server startup, storage client, and tests accordingly. > > - **Metrics Infrastructure**: > - Introduce metric helpers in `src/ray/common/metrics.h` and `src/ray/gcs/metrics.h` (gauges/histograms/counters for actors, jobs, placement groups, task events, and GCS storage). > - Replace direct `stats` usage with `MetricInterface` across GCS and core worker; rename helpers (e.g., `GetTaskMetric` -> `GetTaskByStateGaugeMetric`, `GetRayEventRecorderDroppedEventsMetric` -> `GetRayEventRecorderDroppedEventsCounterMetric`). > - **GCS Server Refactor**: > - `GcsServer` now constructs/accepts metric instances and passes them to subcomponents via `Start`/`DoStart` and init methods. > - `GcsActorManager`, `GcsJobManager`, `GcsPlacementGroupManager`, and `GcsTaskManager` constructors updated to receive and record via `MetricInterface`. > - `ObservableStoreClient` wraps delegate and records storage metrics via injected interfaces. > - **Core Worker**: > - `TaskCounter` and `CoreWorker` updated to use task/actor state gauges via injected `MetricInterface`. > - **Tests/Mocks/Build**: > - Update mocks and tests to use `FakeGauge`/`FakeCounter`/`FakeHistogram`; validate metric tags/values. > - Add Bazel targets/deps for new metric headers and fakes; minor BUILD wiring adjustments. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit bd5ff5a. This will update automatically on new commits. 
Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Signed-off-by: Cuong Nguyen <can@anyscale.com> Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
)" This reverts commit 534b0e4.
…y-project#57248) Reverts ray-project#56201 Signed-off-by: Josh Kodi <joshkodi@gmail.com>
This PR is part of the series unifying all metric definition infrastructure.
It migrates all GCS metrics to the metric interface by creating the metric objects inside gcs_server and passing them down as interfaces to sub-components.
This is purely a refactor with repetitive patterns, so it is easier to review than the number of files changed suggests.
Test:
- CI

Note
Refactors GCS and core worker to use injected `MetricInterface` objects for all metrics, adding new metric helpers and rewiring constructors, server startup, storage client, and tests accordingly.
- **Metrics Infrastructure**:
  - Introduce metric helpers in `src/ray/common/metrics.h` and `src/ray/gcs/metrics.h` (gauges/histograms/counters for actors, jobs, placement groups, task events, and GCS storage).
  - Replace direct `stats` usage with `MetricInterface` across GCS and core worker; rename helpers (e.g., `GetTaskMetric` -> `GetTaskByStateGaugeMetric`, `GetRayEventRecorderDroppedEventsMetric` -> `GetRayEventRecorderDroppedEventsCounterMetric`).
- **GCS Server Refactor**:
  - `GcsServer` now constructs/accepts metric instances and passes them to subcomponents via `Start`/`DoStart` and init methods.
  - `GcsActorManager`, `GcsJobManager`, `GcsPlacementGroupManager`, and `GcsTaskManager` constructors updated to receive and record via `MetricInterface`.
  - `ObservableStoreClient` wraps delegate and records storage metrics via injected interfaces.
- **Core Worker**:
  - `TaskCounter` and `CoreWorker` updated to use task/actor state gauges via injected `MetricInterface`.
- **Tests/Mocks/Build**:
  - Update mocks and tests to use `FakeGauge`/`FakeCounter`/`FakeHistogram`; validate metric tags/values.
  - Add Bazel targets/deps for new metric headers and fakes; minor BUILD wiring adjustments.

Written by Cursor Bugbot for commit bd5ff5a.
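The injection pattern described above (the owner creates the metric objects and sub-components receive them as interfaces) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from this PR: the `MetricInterface` and `FakeGauge` shown here are reduced stand-ins that share only their names with the real classes, and `JobManager`, `Tags`, and the `State` tag are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumption: a reduced interface with a single Record overload.
// The real ray::observability::MetricInterface may look different.
using Tags = std::map<std::string, std::string>;

class MetricInterface {
 public:
  virtual ~MetricInterface() = default;
  virtual void Record(double value, const Tags &tags) = 0;
};

// Test double in the spirit of FakeGauge: remembers the last value per tag set.
class FakeGauge : public MetricInterface {
 public:
  void Record(double value, const Tags &tags) override { values_[tags] = value; }
  const std::map<Tags, double> &Values() const { return values_; }

 private:
  std::map<Tags, double> values_;
};

// Hypothetical sub-component: it depends only on the interface, so the owner
// decides whether a real metric or a fake is passed in.
class JobManager {
 public:
  explicit JobManager(MetricInterface &running_jobs_gauge)
      : running_jobs_gauge_(running_jobs_gauge) {}

  void OnJobStarted() {
    ++running_;
    Report();
  }
  void OnJobFinished() {
    --running_;
    Report();
  }

 private:
  // Publishes the current number of running jobs under a fixed tag set.
  void Report() { running_jobs_gauge_.Record(running_, {{"State", "RUNNING"}}); }

  MetricInterface &running_jobs_gauge_;
  int running_ = 0;
};
```

In a test, the component is constructed with a `FakeGauge` and the recorded tags and values are inspected directly, with no global metric registry involved; in production wiring, the owner would pass a real metric object through the same constructor parameter.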