Skip to content

Conversation

@josecelano
Copy link
Member

@josecelano josecelano commented Jun 11, 2025

The idea is to continue exposing the stats endpoint but removing the internal code to calculate them. Instead, we calculate the metrics from the labeled metrics.

That will simplify the event listeners and avoid duplicate code to calculate the same data.

Endpoints for statistics

stats (simple predefined global metrics)

metrics (extendable-labelled metrics)

Subtasks

  • Add a new method to MetricCollection to sum the metric's samples.
  • Review test coverage.
  • Use the new method to calculate metrics in the stats endpoint.

After merging this PR

  • Deploy to the tracker demo and verify that both Grafana dashboards appear identical.
  • Delete unused code to calculate the metrics in the stats endpoint directly in the event listeners.
  • Redeploy to the tracker demo and re-verify.

NOTICE: We have two Grafana dashboards, one collecting data from the stats endpoint and another collecting data from metrics, and they look identical now. If there is a bug in the way we calculate aggregate values in the MetricCollection, we should see different graphs in the dashboards.

Both dashboards (built from stats on the left, built from metrics on the right):

image

NOTICE: Also, the new methods will be helpful for external users once we publish the metrics crate. Consumers of the Torrust Tracker API (metrics endpoint) will be able to deserialize and calculate aggregate values from the response.

@josecelano josecelano requested a review from da2ce7 June 11, 2025 17:29
@josecelano josecelano self-assigned this Jun 11, 2025
@codecov
Copy link

codecov bot commented Jun 11, 2025

Codecov Report

Attention: Patch coverage is 98.85774% with 11 lines in your changes missing coverage. Please review.

Project coverage is 84.79%. Comparing base (c3dffa5) to head (dcfb5d5).
Report is 9 commits behind head on develop.

Files with missing lines Patch % Lines
...s/rest-tracker-api-core/src/statistics/services.rs 98.18% 4 Missing ⚠️
.../http-tracker-core/src/statistics/event/handler.rs 50.00% 2 Missing ⚠️
packages/metrics/src/metric/aggregate/sum.rs 98.97% 2 Missing ⚠️
packages/primitives/src/service_binding.rs 92.30% 2 Missing ⚠️
...r/src/statistics/event/handler/request_accepted.rs 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1577      +/-   ##
===========================================
+ Coverage    84.15%   84.79%   +0.64%     
===========================================
  Files          286      289       +3     
  Lines        20789    21712     +923     
  Branches     20789    21712     +923     
===========================================
+ Hits         17495    18411     +916     
- Misses        2974     2984      +10     
+ Partials       320      317       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

It allows sum metric samples matching a given criteria. The criteria is
a label set. Sample values are added if they contain all the label
name/value pairs specified in the criteria.

For example, given these metric's samples in Prometheus export text format:

```
udp_tracker_server_requests_accepted_total{request_kind="scrape",server_binding_address_type="plain",server_binding_ip="0.0.0.0",server_binding_port="6969",server_binding_protocol="udp"} 213118
udp_tracker_server_requests_accepted_total{request_kind="announce",server_binding_address_type="plain",server_binding_ip="0.0.0.0",server_binding_port="6969",server_binding_protocol="udp"} 16460553
udp_tracker_server_requests_accepted_total{request_kind="connect",server_binding_address_type="plain",server_binding_ip="0.0.0.0",server_binding_port="6868",server_binding_protocol="udp"} 617
udp_tracker_server_requests_accepted_total{request_kind="connect",server_binding_address_type="plain",server_binding_ip="0.0.0.0",server_binding_port="6969",server_binding_protocol="udp"} 17148137
```

And the criteria: it should contain the label `request_kind` with the value `connect`.

It should return: 617 + 17148137 = 17148754
@josecelano josecelano force-pushed the 1446-overhaul-stats-refactor-metrics-part-2 branch from bd10f58 to 64be847 Compare June 11, 2025 17:48
@josecelano
Copy link
Member Author

image

@josecelano
Copy link
Member Author

I've been considering removing the traits:

src/metric_collection/aggregate.rs:

pub trait Sum {
    fn sum(&self, metric_name: &MetricName, label_set_criteria: &LabelSet) -> Option<AggregateValue>;
}

src/metric/aggregate/sum.rs:

pub trait Sum {
    fn sum(&self, label_set_criteria: &LabelSet) -> AggregateValue;
}

I wanted to keep MetricCollection and Metric types unaware (decouple) from the aggregate functions. So:

  • You can add more aggregate functions without adding more methods.
  • Aggregate functions don't need access to the internals of those types. They only need to get the values (iterate).

However, I tried to do the refactor, and I ended up with something like:

/// Calculate the sum of metric samples for a given metric name and label criteria.
/// 
/// This function follows the Open/Closed principle by allowing new aggregate functions
/// to be added without modifying the MetricCollection type.
pub fn sum(collection: &MetricCollection, metric_name: &MetricName, label_set_criteria: &LabelSet) -> Option<AggregateValue> {
    // Direct implementation without trait dependency
    if let Some(counter_collection) = collection.counters.metrics.get(metric_name) {
        return Some(counter_collection.sum(label_set_criteria));
    }
    
    if let Some(gauge_collection) = collection.gauges.metrics.get(metric_name) {
        return Some(gauge_collection.sum(label_set_criteria));
    }
    
    None
}

The problem with this approach is that the aggregate function will be broken as soon as we add a new type of metric to the metric collection.

It doesn't work for the metric collection, but it could work for the metric.

Anyway, since we only have one aggregate function, I will keep it as it is for now. We can reconsider other alternatives if we plan to add more aggregate functions in the future.

cc @da2ce7

…etrics

We need to add a new label to make it easier to fileter by the server IP
family: IPV4 or IPv6.
Address might be a socket address.
Example:

```
udp_tracker_core_requests_received_total{request_kind="connect",server_binding_address_ip_family="inet",server_binding_address_ip_type="plain",server_binding_ip="0.0.0.0",server_binding_port="6969",server_binding_protocol="udp"} 1
```

It's needed to easily filter metric samples to calculate aggregate
values for a given IP family (IPv4 or IPv6).
@josecelano josecelano force-pushed the 1446-overhaul-stats-refactor-metrics-part-2 branch from 08c0c38 to dcfb5d5 Compare June 13, 2025 11:22
@josecelano
Copy link
Member Author

ACK dcfb5d5

@josecelano josecelano marked this pull request as ready for review June 13, 2025 11:23
@josecelano josecelano linked an issue Jun 13, 2025 that may be closed by this pull request
@josecelano josecelano merged commit 34a6635 into torrust:develop Jun 13, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Overhaul stats: Refactor metrics (part 2)

1 participant