Filter Processor fails to filter metric datapoints when cardinality is very high #31906
Comments
This is very curious. For now I am going to assume that the filterprocessor configuration is not working as you expect and that the slow increase in cardinality over time is the result of data making it past the processor and slowly increasing the unique count. I want to check that your OTTL statement is matching as expected. Can you add a debug exporter with detailed verbosity so we can see a sample of the data coming through?
Also, you can make your filterprocessor more efficient by doing everything at the metric level instead of the datapoint level:

  filter/by_service_name:
    error_mode: ignore
    metrics:
      metric:
        - 'name == "http_client_duration_bucket" and resource.attributes["service.name"] == "notification-sender"'
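For reference, a minimal sketch of how that processor could be wired into a full metrics pipeline; the otlp receiver and prometheus exporter names here are assumptions based on the setup described elsewhere in this issue:

  processors:
    filter/by_service_name:
      error_mode: ignore
      metrics:
        metric:
          - 'name == "http_client_duration_bucket" and resource.attributes["service.name"] == "notification-sender"'

  service:
    pipelines:
      metrics:
        receivers: [otlp]
        processors: [filter/by_service_name]
        exporters: [prometheus]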
Hello @TylerHelmuth, I didn't have success setting up the debug exporter... Do I just need the exporter definition, or do I also need to set up a pipeline for it? Where could I see the logs that the debug exporter outputs?
exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics/debug:
      receivers: [otlp]
      exporters: [debug]

The metrics get printed to the collector logs.
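A possible follow-up (sketch only, reusing the filter/by_service_name processor from the earlier suggestion): adding the filter processor to the same debug pipeline would show exactly which datapoints still make it past the filter:

  service:
    pipelines:
      metrics/debug:
        receivers: [otlp]
        processors: [filter/by_service_name]
        exporters: [debug]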
I think I did manage to get a sample of the metrics:
Is this good to check that the OTTL statement is matching?
The name of the metric is http.client.duration (with dots), not http_client_duration_bucket.
Oh, I see! Here's my understanding: Prometheus represents histograms as multiple series, and changes '.' to '_', so http.client.duration becomes http_client_duration_bucket, http_client_duration_sum, and http_client_duration_count.
From my understanding, there's no way to filter just one of the Prometheus series, i.e. keep http_client_duration_sum/count and filter only http_client_duration_bucket, a series that tends to get big due to the "le" dimension. Perhaps this filter could be a cool feature for the prometheus exporter? BTW, I went back to my original filter. @TylerHelmuth
Here's the filter that worked for me:
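As an illustration only (not necessarily the exact configuration that was used), a datapoint-level filter written against the dotted OTLP metric name might look like this:

  processors:
    filter/by_service_name:
      error_mode: ignore
      metrics:
        datapoint:
          - 'metric.name == "http.client.duration" and resource.attributes["service.name"] == "notification-sender"'

Dropping the histogram's datapoints in the collector also removes the derived _bucket, _sum and _count series on the Prometheus side.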
Component(s)
processor/filter
What happened?
Description
When using the Filter Processor to filter out datapoints of a metric with very high cardinality, the data is filtered out at first, but after some time the filtering appears to stop working.
Steps to Reproduce
I want to filter the metric: http_client_duration_bucket{service_name="notification-sender"}
This metric has a LOT of cardinality. I calculated it to be > 500k
count(http_client_duration_bucket{service_name="notification-sender"}) > 500k
I'm using the .yml below (edited the relevant part for clarity):
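A hypothetical sketch of the relevant part, based only on the description above (metric and service names taken from the text, not from the actual file):

  processors:
    filter/by_service_name:
      error_mode: ignore
      metrics:
        datapoint:
          - 'metric.name == "http_client_duration_bucket" and resource.attributes["service.name"] == "notification-sender"'

As discussed in the comments above, the metric name inside the collector uses dots (http.client.duration) rather than the Prometheus-style underscore name.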
Expected Result
I expected not to see any http_client_duration_bucket series with the label service_name="notification-sender".
Actual Result
At first, the data was being filtered, but after some time it seems the data is slowly failing to be filtered.
I can see the cardinality for the metric rising steadily. Below is the evidence I could gather:
Cardinality when filter was applied:
Cardinality 3 hours later:
Cardinality 6 hours later:
Cardinality 12 hours later:
This is a custom metric I have set up via a Python script to measure the cardinality via the Prometheus endpoint (the script basically scrapes the Prometheus exporter endpoint, calculates cardinality for metric_name/service_name pairs, and sends custom metrics to the otel collector).
This is the overall scrape_samples_scraped Prometheus metric:
Collector version
otelcol-contrib version 0.96.0
Environment information
No response
OpenTelemetry Collector configuration
Log output
No response
Additional context
I understand that, if this is related to the size of the load, the issue may be hard to replicate.
This is a production environment, but I'm willing to run tests or apply different suggested configurations if it helps.