metric_relabel_configs drop action doesn't work #35720

Open
arthur-observe opened this issue Oct 9, 2024 · 8 comments
Labels: bug (Something isn't working), receiver/prometheus (Prometheus receiver), Stale

Comments

@arthur-observe

arthur-observe commented Oct 9, 2024

Component(s)

receiver/prometheus

What happened?

Description

I have a Prometheus receiver configured like this:

  prometheus/pod_metrics:
      config:
        scrape_configs:
        - job_name: pod-metrics
          scrape_interval: 10s
          honor_labels: true
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          # this is defaulted to keep so we start with everything
          - action: keep

          # Drop anything matching the configured namespace.
          - action: 'drop'
            source_labels: ['__meta_kubernetes_namespace']
            regex: (.*istio.*|.*ingress.*|kube-system)

          # Drop anything not matching the configured namespace.
          - action: 'keep'
            source_labels: ['__meta_kubernetes_namespace']
            regex: (default)
          # Maps all Kubernetes pod labels to Prometheus labels with the prefix removed (e.g., __meta_kubernetes_pod_label_app becomes app).
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)

          # adds new label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace

          # adds new label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name

          metric_relabel_configs:
            - action: drop
              regex: .*bucket
              source_labels:
                - __name__
            - action: keep
              regex: (.*)
              source_labels:
                - __name__

The namespace keep and drop rules seem to work fine, but the metric_relabel_configs do not. I tested the same configuration on Grafana Agent and it works fine there.
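For reference, here is a minimal sketch of how the two metric_relabel_configs rules would be expected to evaluate. It assumes Prometheus's documented relabeling semantics: the regex is anchored at both ends, a drop rule removes a sample when it matches, and a keep rule removes a sample when it does not match. Python's re module stands in for Go's RE2, `survives_metric_relabel` is a hypothetical helper (not part of Prometheus or the collector), and every metric name other than http_request_duration_seconds_bucket is illustrative.

```python
import re

# Rules mirroring the metric_relabel_configs in the issue (source label: __name__).
RULES = [
    {"action": "drop", "regex": r".*bucket"},
    {"action": "keep", "regex": r"(.*)"},
]

def survives_metric_relabel(name, rules=RULES):
    """Return True if a sample with this __name__ survives every rule."""
    for rule in rules:
        # fullmatch approximates the anchoring applied to relabel regexes.
        matched = re.fullmatch(rule["regex"], name) is not None
        if rule["action"] == "drop" and matched:
            return False
        if rule["action"] == "keep" and not matched:
            return False
    return True

# Illustrative names; only the _bucket series is taken from the issue.
names = [
    "http_request_duration_seconds_bucket",
    "http_request_duration_seconds_sum",
    "http_request_duration_seconds_count",
    "http_requests_total",
]
surviving = [n for n in names if survives_metric_relabel(n)]
print(surviving)
```

Under these assumptions the _bucket series is the only one removed, which is the behavior the report expects from the receiver.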

Steps to Reproduce

I deployed this pod on my cluster, which emits metrics as expected:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: prometheus-example-app
  name: prometheus-example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus-example-app
  template:
    metadata:
      labels:
        app.kubernetes.io/name: prometheus-example-app
      annotations:
        observeinc_com_scrape: 'true'
        observeinc_com_path: '/metrics'
        observeinc_com_port: '8080'
    spec:
      containers:
      - name: prometheus-example-app
        image: quay.io/brancz/prometheus-example-app:v0.3.0
        ports:
        - name: web
          containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-example-app-service
spec:
  selector:
    app.kubernetes.io/name: prometheus-example-app
  ports:
    - protocol: TCP
      port: 8080  # Exposed service port
      targetPort: 8080
      name: metrics
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: caller-cronjob
spec:
  schedule: "*/1 * * * *"  # Runs every minute
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: caller
            image: curlimages/curl:latest  # A lightweight curl image
            env:
              - name: SLEEP_TIME
                value: "10"  # Sleep time in seconds
              - name: LOOP_COUNT
                value: "36"   # Number of iterations
            command:
              - /bin/sh
              - -c
              - |
                for i in $(seq 1 $LOOP_COUNT); do
                  curl http://prometheus-example-app-service:8080;  # Adjust the URL and port as necessary

                  # Second call on even numbers
                  if [ $((i % 2)) -eq 0 ]; then
                    curl http://prometheus-example-app-service:8080/err;  # Second target service
                    echo "Second call on even #$i made."
                  fi
                  sleep $SLEEP_TIME;
                done
          restartPolicy: OnFailure

Expected Result

With this configuration, I would expect this metric not to be scraped: http_request_duration_seconds_bucket

Actual Result

It gets scraped and sent.

Collector version

0.111.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")
Using latest contrib image on eks

OpenTelemetry Collector configuration

relay:
----
extensions:
  # https://github.com/open-telemetry/opentelemetry-helm-charts/issues/816
  # 0.0.0.0 is hack for ipv6 on eks clusters
  health_check:
    endpoint: "${env:MY_POD_IP}:13133"

exporters:
  debug/override:
      verbosity: detailed
      sampling_initial: 2
      sampling_thereafter: 1
  prometheusremotewrite:
      endpoint: "YOURS"
      headers:
          authorization: "YOURS"
      resource_to_telemetry_conversion:
          enabled: true # Convert resource attributes to metric labels
      send_metadata: true

receivers:
  # https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/k8sclusterreceiver/documentation.md
  k8s_cluster:
    collection_interval: 60s
    metadata_collection_interval: 5m
    auth_type: serviceAccount
    node_conditions_to_report:
    - Ready
    - MemoryPressure
    - DiskPressure
    allocatable_types_to_report:
    - cpu
    - memory
    - storage
    - ephemeral-storage
    # defaults and optional - https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/k8sclusterreceiver/documentation.md
    metrics:
      k8s.node.condition:
        enabled: true
  prometheus/pod_metrics:
      config:
        scrape_configs:
        - job_name: pod-metrics
          scrape_interval: 10s
          honor_labels: true
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          # this is defaulted to keep so we start with everything
          - action: keep

          # Drop anything matching the configured namespace.
          - action: 'drop'
            source_labels: ['__meta_kubernetes_namespace']
            regex: (.*istio.*|.*ingress.*|kube-system)

          # Drop anything not matching the configured namespace.
          - action: 'keep'
            source_labels: ['__meta_kubernetes_namespace']
            regex: (default)

          # Drop endpoints without one of: a port name suffixed with the configured regex, or an explicit prometheus port annotation.
          - action: 'keep'
            source_labels: ['__meta_kubernetes_pod_container_port_name', '__meta_kubernetes_pod_annotation_prometheus_io_port']
            regex: '(.*metrics|web;|.*;\d+)'

          # Drop pods with phase Succeeded or Failed.
          - action: 'drop'
            regex: 'Succeeded|Failed'
            source_labels: ['__meta_kubernetes_pod_phase']


          ################################################################
          # Prometheus Configs
          # Drop anything annotated with 'prometheus.io.scrape=false'.
          - action: 'drop'
            regex: 'false'
            source_labels: ['__meta_kubernetes_pod_annotation_prometheus_io_scrape']

          # Allow pods to override the scrape scheme with 'prometheus.io.scheme=https'.
          - action: 'replace'
            regex: '(https?)'
            replacement: '$1'
            source_labels: ['__meta_kubernetes_pod_annotation_prometheus_io_scheme']
            target_label: '__scheme__'

          # Allow service to override the scrape path with 'prometheus.io.path=/other_metrics_path'.
          - action: 'replace'
            regex: '(.+)'
            replacement: '$1'
            source_labels: ['__meta_kubernetes_pod_annotation_prometheus_io_path']
            target_label: '__metrics_path__'

          # Allow services to override the scrape port with 'prometheus.io.port=1234'.
          - action: 'replace'
            regex: '(.+?)(\:\d+)?;(\d+)'
            replacement: '$1:$3'
            source_labels: ['__address__', '__meta_kubernetes_pod_annotation_prometheus_io_port']
            target_label: '__address__'

          ################################################################

          # Maps all Kubernetes pod labels to Prometheus labels with the prefix removed (e.g., __meta_kubernetes_pod_label_app becomes app).
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)

          # adds new label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace

          # adds new label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name

          metric_relabel_configs:
            - action: drop
              regex: .*bucket
              source_labels:
                - __name__
            - action: keep
              regex: (.*)
              source_labels:
                - __name__


  

processors:
  # https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md
  memory_limiter:
    # check_interval is the time between measurements of memory usage for the
    # purposes of avoiding going over the limits. Defaults to zero, so no
    # checks will be performed. Values below 1 second are not recommended since
    # it can result in unnecessary CPU consumption.
    check_interval: 5s
    # limit_percentage (default = 0): Maximum amount of total memory targeted to be allocated by the process heap.
    # This configuration is supported on Linux systems with cgroups and it's intended to be used in dynamic platforms like docker.
    # This option is used to calculate memory_limit from the total available memory.
    # For instance setting of 75% with the total memory of 1GiB will result in the limit of 750 MiB.
    # The fixed memory setting (limit_mib) takes precedence over the percentage configuration.
    limit_percentage: 75
    # spike_limit_percentage (default = 0): Maximum spike expected between the measurements of memory usage.
    # The value must be less than limit_percentage.
    # This option is used to calculate spike_limit_mib from the total available memory.
    # For instance setting of 25% with the total memory of 1GiB will result in the spike limit of 250MiB.
    # This option is intended to be used only with limit_percentage.
    spike_limit_percentage: 25
  batch:
    send_batch_size: 4096
    send_batch_max_size: 4096
  k8sattributes:
    extract:
      metadata:
      - k8s.namespace.name
      - k8s.deployment.name
      - k8s.replicaset.name
      - k8s.statefulset.name
      - k8s.daemonset.name
      - k8s.cronjob.name
      - k8s.job.name
      - k8s.node.name
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.cluster.uid
      - k8s.node.name
      - k8s.node.uid
    passthrough: false
    pod_association:
    - sources:
      - from: resource_attribute
        name: k8s.pod.ip
    - sources:
      - from: resource_attribute
        name: k8s.pod.uid
    - sources:
      - from: connection
  attributes/observe_common:
    actions:
      - key: k8s.cluster.name
        action: insert
        value: ${env:CLUSTER_NAME}
      - key: k8s.cluster.uid
        action: insert
        value:  ${env:CLUSTER_UID}
        


  # attributes to append to objects
  attributes/debug_source_cluster_metrics:
    actions:
      - key: debug_source
        action: insert
        value: cluster_metrics
  attributes/debug_source_pod_metrics:
    actions:
      - key: debug_source
        action: insert
        value: pod_metrics

service:
  extensions: [health_check]
  pipelines:
      metrics:
        receivers: [k8s_cluster]
        processors: [memory_limiter, batch, k8sattributes, attributes/observe_common, attributes/debug_source_cluster_metrics]
        exporters: [prometheusremotewrite, debug/override]
      metrics/pod_metrics:
        receivers: [prometheus/pod_metrics]
        processors: [memory_limiter, batch, k8sattributes, attributes/observe_common, attributes/debug_source_pod_metrics]
        exporters: [prometheusremotewrite, debug/override]
      
  telemetry:
      metrics:
        level: normal
        address: ${env:MY_POD_IP}:8888
      logs:
        level: DEBUG
        encoding: console
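For readers tracing the target relabel rules in the configuration above: Prometheus joins the values of source_labels with ";" (the default separator) and then matches the anchored regex against the joined string. The sketch below applies that to the keep rule on the port name and the prometheus.io port annotation. Python's re module stands in for RE2, `is_kept` and `joined` are hypothetical helpers, and the label values are illustrative (the example Deployment names its port web and sets no prometheus.io port annotation).

```python
import re

# Keep rule from the config: the joined source labels must fully match this regex.
KEEP_REGEX = r"(.*metrics|web;|.*;\d+)"

def joined(port_name, port_annotation, separator=";"):
    # A label that is not set contributes an empty string to the joined value.
    return separator.join([port_name, port_annotation])

def is_kept(port_name, port_annotation=""):
    """Return True if a target with these label values passes the keep rule."""
    return re.fullmatch(KEEP_REGEX, joined(port_name, port_annotation)) is not None

print(joined("web", ""))        # the value the regex is matched against
print(is_kept("web"))           # port named "web", no port annotation
print(is_kept("http", "8080"))  # any port name with a numeric annotation
print(is_kept("http"))          # neither condition satisfied
```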

Log output

No errors

Additional context

No response

@arthur-observe arthur-observe added bug Something isn't working needs triage New item requiring triage labels Oct 9, 2024
@github-actions github-actions bot added the receiver/prometheus Prometheus receiver label Oct 9, 2024
Contributor

github-actions bot commented Oct 9, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dashpole
Contributor

dashpole commented Oct 9, 2024

Can you fix the formatting of the issue above? We pass through the configuration you provide to prometheus server code without modification, so it would be strange for behavior to differ between the Prometheus server and the collector. Can you reproduce the issue with the prometheus server?

@dashpole dashpole removed the needs triage New item requiring triage label Oct 9, 2024
@dashpole dashpole self-assigned this Oct 9, 2024
@arthur-observe
Author

arthur-observe commented Oct 9, 2024

Can you fix the formatting of the issue above? We pass through the configuration you provide to prometheus server code without modification, so it would be strange for behavior to differ between the Prometheus server and the collector. Can you reproduce the issue with the prometheus server?

Sorry, I'm new to submitting issues here. I think I cleaned up the formatting; are there any other issues with the submission?

I'm trying this on the astronomy shop and will post results.

@dashpole
Contributor

dashpole commented Oct 9, 2024

There is a bunch of YAML above that isn't wrapped in a yaml markdown block, and it is hard to read.

@arthur-observe
Author

arthur-observe commented Oct 9, 2024

OK, I tried it in the astronomy shop, using Grafana to view the results, and it doesn't seem to work there either. I added metric_relabel_configs to the values file:

  serverFiles:
    prometheus.yml:
      scrape_configs:
        - job_name: 'otel-collector'
          honor_labels: true
          kubernetes_sd_configs:
            - role: pod
              namespaces:
                own_namespace: true
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_opentelemetry_community_demo]
              action: keep
              regex: true
          metric_relabel_configs:
            - action: drop
              regex: http_server_duration_seconds_bucket
              source_labels:
                - __name__
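One way to check the Prometheus server result without relying on a dashboard is to list the metric names the server knows about through its HTTP API (`/api/v1/label/__name__/values`). The sketch below is only an assumption about how that check could look: `fetch`, `metric_names`, and `leaked` are hypothetical helpers, the base URL is whatever address the server is reachable at (for example through a port-forward), and the sample payload is hand-written, not real data.

```python
import json
import urllib.request

def metric_names(payload):
    """Extract metric names from a /api/v1/label/__name__/values response."""
    if payload.get("status") != "success":
        raise ValueError("unexpected response status: %r" % payload.get("status"))
    return list(payload.get("data", []))

def leaked(names, dropped_name="http_server_duration_seconds_bucket"):
    """Return the names that the drop rule should have removed."""
    return [n for n in names if n == dropped_name]

def fetch(base_url):
    # base_url is an assumption, e.g. a port-forwarded Prometheus server address.
    url = base_url.rstrip("/") + "/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

# Offline example with a hand-written payload (not real data).
sample = {
    "status": "success",
    "data": [
        "http_server_duration_seconds_bucket",
        "http_server_duration_seconds_sum",
    ],
}
print(leaked(metric_names(sample)))
```

Note that series ingested before the rule was added may still be listed until they age out, so a name showing up here does not by itself prove the rule is being ignored.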

@NathanNam

Any updates?

@dashpole
Contributor

I don't see anything obviously wrong with the config, but I suspect the regex isn't working properly. Can you try (.*)bucket instead of .*bucket? All of the examples I can find seem to use parentheses around wildcards.

The only other thing that I would try is removing the unnecessary action: keep block.
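As a quick way to compare the two patterns, here is a small sketch that assumes relabel regexes are fully anchored, as the Prometheus documentation describes. Python's re module stands in for Go's RE2, `matches` is a hypothetical helper, and the candidate names are illustrative. It is not a diagnosis of the reported behavior, only a check of whether the parentheses change which names match.

```python
import re

# Illustrative candidate metric names.
candidates = [
    "http_request_duration_seconds_bucket",
    "http_server_duration_seconds_bucket",
    "http_request_duration_seconds_sum",
    "bucket",
    "bucket_size",
]

def matches(pattern, name):
    # fullmatch approximates the anchoring applied to relabel regexes.
    return re.fullmatch(pattern, name) is not None

plain = {n for n in candidates if matches(r".*bucket", n)}
grouped = {n for n in candidates if matches(r"(.*)bucket", n)}

# True if both patterns select exactly the same names.
print(plain == grouped)
```

If the two sets are equal, the capture group only matters for rules that use a replacement, so it would be worth trying the suggestion but also looking elsewhere for the cause.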

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Dec 30, 2024

3 participants