[statsdreceiver] incorrect aggregation when several producers report the same metrics #23809
Comments
Pinging code owners: See Adding Labels via Comments if you do not have permissions to add labels yourself.
Pinging code owners for receiver/statsd: @jmacd @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping the code owners.
I've run into the same problem.
The larger the aggregation time span, the larger the error gets. As is, this receiver is unusable, because even at an aggregation interval of 1s it still loses values.
My team has the same problem as well. The un-aggregated metrics flow directly into the awsemf exporter, which causes a conflict because the exporter doesn't allow duplicate metrics in its batch; it drops all but one of the metrics before export. This is a confirmed issue in both Docker (using Docker Compose and netcat) and Kubernetes (Kong StatsD exporter -> ADOT).
The receiver uses each metric's client address as the key to create instruments, but the port in that address is dynamically assigned. This results in the StatsD receiver creating numerous instruments and being unable to aggregate metrics together. Fixes open-telemetry#29508, fixes open-telemetry#23809
This issue has been closed as inactive because it has been stale for 120 days with no activity.
Is it correct that this should be closed instead of getting fixed?
I'm not a statsd expert myself, but let's take Telegraf's implementation as an example: I would argue applications should expose tags which identify themselves. Long story short, I propose making this address-based aggregation optional via a configuration flag.
**Description:** The `statsdreceiver` only aggregates metrics per client address (protocol + IP + port), which leads to issues or inconsistencies when dealing with clients that constantly switch TCP/UDP ports. To address the issue, this PR adds a configuration option, `enableIPOnlyAggregation`, that allows the user to specify whether they want to aggregate on the `IP` instead of `IP+Port`. For example:

_otel_config.yaml:_

```yaml
receivers:
  statsd:
    endpoint: "0.0.0.0:8125"
    enable_metric_type: true
    is_monotonic_counter: false
    aggregation_interval: 10s
    enable_ip_only_aggregation: true # <-- enable ip only aggregation
    timer_histogram_mapping:
      - statsd_type: "timing"
        observer_type: "histogram"
        histogram:
          max_size: 50

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers:
        - statsd
      exporters:
        - debug
```

_run:_

```sh
STATSD_HOST="localhost"
STATSD_PORT=8125

for port in {10000..10010}; do
  echo -n "my.metric:1|c" | nc -w 1 -u -p $port ${STATSD_HOST} ${STATSD_PORT}
  echo "Sent from port $port"
done
```

_result:_

```
2024-08-26T23:36:00.224+0200    info    ResourceMetrics #0
Resource SchemaURL:
ScopeMetrics #0
ScopeMetrics SchemaURL:
InstrumentationScope otelcol/statsdreceiver 0.103.0-dev
Metric #0
Descriptor:
     -> Name: -n my.metric
     -> Description:
     -> Unit:
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Delta
NumberDataPoints #0
Data point attributes:
     -> metric_type: Str(counter)
StartTimestamp: 2024-08-26 21:35:50.223101 +0000 UTC
Timestamp: 2024-08-26 21:36:00.224252 +0000 UTC
Value: 7
        {"kind": "exporter", "data_type": "metrics", "name": "debug"}
2024-08-26T23:36:10.224+0200    info    MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 1, "data points": 1}
2024-08-26T23:36:10.224+0200    info    ResourceMetrics #0
Resource SchemaURL:
ScopeMetrics #0
ScopeMetrics SchemaURL:
InstrumentationScope otelcol/statsdreceiver 0.103.0-dev
Metric #0
Descriptor:
     -> Name: -n my.metric
     -> Description:
     -> Unit:
     -> DataType: Sum
     -> IsMonotonic: false
     -> AggregationTemporality: Delta
NumberDataPoints #0
Data point attributes:
     -> metric_type: Str(counter)
StartTimestamp: 2024-08-26 21:36:00.224252 +0000 UTC
Timestamp: 2024-08-26 21:36:10.224607 +0000 UTC
Value: 4
        {"kind": "exporter", "data_type": "metrics", "name": "debug"}
```

Instead of generating 11 metrics, one for each port used to send, only 2 metric blocks are returned, whose values total 11.

![2024-08-26 23 44 15](https://github.com/user-attachments/assets/6b8a89d1-186e-4257-9c82-90c5f9d14f98)

**Link to tracking Issue:** #23809

**Testing:**
- [x] Added unit tests

**Documentation:**
- [x] Added information to the statsdreceiver `README.md` describing the option.

---------

Co-authored-by: Povilas Versockas <povilas.versockas@coralogix.com>
Component(s)
receiver/statsd
What happened?
Description
The metrics reported using the `statsd` receiver are not correctly aggregated when they come from different client addresses and are not decorated with any attribute that identifies the client. Instead of aggregating metrics with the same type and the same set of attribute keys and values, as described in the docs, the receiver additionally groups them by origin address, without adding any attribute that identifies that address. As I see it, this is a common scenario: metrics coming from different producers (or even the same producer over different connections) often do not include specific attributes to identify the producer.
If the `k8sattributesprocessor` is used to decorate the metrics, this is not an issue, since the metrics will carry attributes identifying their origin; however, using that processor is not an option in every environment, nor is it mandatory when using the receiver. It is also possible to avoid the problem by decorating the metrics in the client, but as stated in #15290, that can be cumbersome or simply not possible.
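As a minimal sketch of that scenario (assuming a collector listening on `localhost:8125` and DogStatsD-style tag syntax), sending the same counter three times through `netcat` uses a fresh ephemeral source port for each datagram, so the increments land in separate series instead of a single aggregated value of 30:

```sh
# Send the same counter three times; each nc invocation gets a new
# ephemeral source port, so the receiver aggregates them separately.
for i in 1 2 3; do
  echo -n "test.metric:10|c|#myKey:myVal" | nc -w 1 -u localhost 8125
done
```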
Steps to Reproduce
Run the collector with the `statsd` receiver and the `file` exporter, then send the same metric several times from different clients (see Possible cause below for the `netcat` reproduction).
Expected Result
The metric is expected to be aggregated: I'd expect a single value of `30` with the attribute `myKey:myVal` for `test.metric`.
Actual Result
I get two values of `test.metric` with the same set of attributes (`myKey:myVal`) and the same timestamp. Sample output from `/tmp/collector-output.json`:
Collector version
v0.80.0
Environment information
No response
OpenTelemetry Collector configuration
Log output
No response
Additional context
Possible cause
This can be reproduced using `netcat` because it closes the UDP connection right after writing the message. Therefore, a different origin port is used for each metric unless the source port is specified explicitly. As a result, metrics are aggregated separately here and different batches are obtained here.
If we use the same port to write the metrics:
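A sketch of such a fixed-source-port command, assuming a `netcat` build that supports `-p` (the same flag used in the PR script quoted earlier in this thread):

```sh
# Pin the source port with -p so every datagram arrives from the same
# client address; the receiver then aggregates them into one series.
for i in 1 2 3; do
  echo -n "test.metric:10|c|#myKey:myVal" | nc -w 1 -u -p 12345 localhost 8125
done
```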
We get the expected results:
Considered solutions
Always decorate metrics with the producer's address. This way, there would be an attribute identifying each metric's origin, and we would no longer report several values for the same metric and the same set of attributes at a specific timestamp. This approach would be really simple to implement, but it would increase the weight of every reported metric (even if the origin's address is not an interesting attribute for the user's use case). As I see it, it would introduce a breaking change, and we should avoid it if possible.
Take a different approach to identifying the producer's address so that metrics can still be decorated by processors that rely on it, such as the `k8sattributesprocessor`. Metrics could optionally be decorated with the producer's address (controlled by a configuration flag), and we could avoid grouping by it while performing the aggregation. This approach would require a different configuration in the processor to make `statsdreceiver` and `k8sattributesprocessor` work together.
If the bug is confirmed, we will be willing to help solve it.