regex wildcards via target allocator not matching #3051

Closed
fredrikgh opened this issue Jun 18, 2024 · 1 comment
Labels: bug (Something isn't working), needs triage

Comments

@fredrikgh

Component(s)

collector, target allocator

What happened?

Description

There appears to be a discrepancy between how the otel-collector ultimately filters Prometheus metrics based on a ServiceMonitor via the Target Allocator (TA) and how a vanilla Prometheus instance does. Truthfully, I don't know whether this issue belongs here or in the collector; more on that in the additional context below.

Steps to Reproduce

  1. Set up a workload to monitor, in my case Mimir.
  2. Set up a simple ServiceMonitor and specify a drop action using wildcards, e.g.:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mimir-service-monitor
spec:
  endpoints:
    - interval: 30s
      port: http
      path: /metrics
      metricRelabelings:
        - action: drop
          sourceLabels: [__name__]
          regex: .*bucket.*
  selector:
    matchLabels:
      app.kubernetes.io/name: mimir
  3. Set up two Prometheus instances (a sketch of B follows this list):
    A: written to by an otel-collector that fetches targets from the above ServiceMonitor via the TA.
    B: fetching targets from the ServiceMonitor through the serviceMonitorSelector field of the Prometheus CRD, with remoteWriteReceiver disabled.
  4. Open the Prometheus UI of both and search for cortex_bucket.
  5. Compare the results.
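For instance B, a minimal sketch of a Prometheus CR selecting the ServiceMonitor directly; the resource name and the selector label are illustrative and assume the ServiceMonitor carries a matching label:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-b                    # illustrative name
spec:
  serviceMonitorSelector:               # selects ServiceMonitors by their labels
    matchLabels:
      team: monitoring                  # assumes the ServiceMonitor is labelled team: monitoring
  serviceMonitorNamespaceSelector: {}   # match ServiceMonitors in any namespace
  # enableRemoteWriteReceiver is left at its default (false)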

Expected Result

[Screenshot: Prometheus UI query for cortex_bucket returning no results]

This is the result I'm getting from B, i.e. no metrics containing bucket were fetched.

Actual Result

[Screenshot: Prometheus UI query for cortex_bucket returning matching series]

This is the result I'm getting from A, i.e. all metrics were fetched.

Kubernetes Version

v1.29.2

Operator version

v0.102.0

Collector version

v0.102.1

Environment information

Environment

OS: (e.g., "Ubuntu 22.04")

Log output

No response

Additional context

Some additional notes:

  1. The TA appears to respond with the correct values on /scrape_configs, so whatever goes wrong may happen after that point.

[Screenshot: the TA's /scrape_configs response]

  2. We see other regex wildcards working, which is especially odd; e.g. this correctly keeps kube_daemonset.* metrics (see the sketch after these notes):

[Screenshot: metricRelabelings keep rule for kube_daemonset.*]
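The keep rule referenced in note 2 has roughly this shape (an illustrative reconstruction; the exact rule in the screenshot may differ):

metricRelabelings:
  - action: keep
    sourceLabels: [__name__]
    regex: kube_daemonset.*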

fredrikgh added the bug and needs triage labels on Jun 18, 2024
@fredrikgh (Author)

Okay, this was quite elusive, but it comes down to what Prometheus and OTel consider a metric.

In Prometheus, you can filter the metric cortex_query_frontend_retries_bucket by that name.

In OTel, cortex_query_frontend_retries_bucket is a datapoint of the metric cortex_query_frontend_retries, which explains why bucket, specifically, didn't work. I found this by reading this thread. This is pretty flawed, since there's no reliable way of removing these metrics (although you can filter by type in OTel itself; see the sketch below).
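A minimal sketch of that type-based filtering, using the collector's filter processor with OTTL conditions; the processor name, the regex, and the pipeline component names are illustrative, not taken from this issue:

processors:
  filter/drop_histograms:
    error_mode: ignore
    metrics:
      metric:
        # drops the whole histogram metric, buckets and counts included
        - 'type == METRIC_DATA_TYPE_HISTOGRAM and IsMatch(name, "cortex_query_frontend_retries")'
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [filter/drop_histograms]
      exporters: [prometheusremotewrite]   # illustrative exporter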
