
[Bug] Example Metric Dashboards and rules use deprecated "kubernetes_pod_name" and "container_name" labels #3308

Closed · chgl opened this issue Jul 12, 2020 · 3 comments · Fixed by #3312

chgl (Contributor) commented Jul 12, 2020

Describe the bug
The Grafana dashboards and Prometheus AlertManager rules use the kubernetes_pod_name and container_name labels, which have been deprecated (see kubernetes/kubernetes#80376).
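For illustration, a cAdvisor-based panel query written against the old labels no longer matches any series, while the same query with the current labels does (hypothetical query, not copied verbatim from the dashboards):

```promql
# Old dashboard labels – matches nothing on current Kubernetes versions:
sum(container_memory_usage_bytes{kubernetes_pod_name=~"kafka-cluster-kafka-.*", container_name="kafka"})

# Current cAdvisor labels:
sum(container_memory_usage_bytes{pod=~"kafka-cluster-kafka-.*", container="kafka"})
```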

I am currently updating the naming for the strimzi-zookeeper, strimzi-kafka, and strimzi-kafka-connect Grafana dashboards, while also migrating the stat panels to Grafana v7's newest panel version. Let me know if there's interest in a PR.
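Since the label rename is mechanical, the bulk of the dashboard update can be scripted. A rough sketch (file names are hypothetical; a real migration should still be reviewed by hand, since this does a blind textual replace across the JSON export):

```shell
# Rewrite the deprecated cAdvisor label names in a Grafana dashboard JSON export.
migrate_labels() {
  sed -e 's/kubernetes_pod_name/pod/g' \
      -e 's/container_name/container/g'
}

# Demo on a single PromQL expression:
printf '%s\n' 'sum(container_memory_usage_bytes{kubernetes_pod_name="a",container_name="kafka"})' \
  | migrate_labels

# Real usage would be along the lines of:
#   migrate_labels < strimzi-kafka.json > strimzi-kafka-migrated.json
```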

To Reproduce
Steps to reproduce the behavior:

  1. Create a kind Kubernetes v1.18.2 cluster: kind create cluster

  2. Install the prometheus operator: helm install prometheus stable/prometheus-operator

  3. Install Strimzi: helm install strimzi strimzi/strimzi-kafka-operator

  4. Create a Kafka Cluster via kafka.yaml

  5. Create a service monitor via kafka-sm.yaml

Note that I have commented out the __meta_kubernetes_endpoints_name re-labeling, as it would, for some reason, prevent metrics from being scraped at all.

  6. Open the Prometheus Operator's default Grafana installation
  7. Import the Strimzi Kafka dashboard from https://github.com/strimzi/strimzi-kafka-operator/blob/a040cf2f7cddcc82b0d1a731dc7bf59089537931/examples/metrics/grafana-dashboards/strimzi-kafka.json
  8. Observe that the charts display 0 as a default value or "No Data"

Expected behavior
The charts included in the dashboard should display the expected metrics.

Environment (please complete the following information):

  • Strimzi version: 0.18.0
  • Installation method: Helm
  • Kubernetes cluster: Kubernetes Kind 1.18.2
  • Prometheus Operator: prometheus-operator-8.16.1 app version: 0.38.1

YAML files and logs

kafka.yaml:

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: kafka-cluster
spec:
  kafkaExporter: {}
  kafka:
    version: 2.5.0
    replicas: 1
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      log.message.format.version: "2.5"
    storage:
      type: ephemeral
    listeners:
      plain: {}
      tls: {}
    metrics:
      # Inspired by config from Kafka 2.0.0 example rules:
      # https://github.com/prometheus/jmx_exporter/blob/master/example_configs/kafka-2_0_0.yml
      lowercaseOutputName: true
      rules:
        # Special cases and very specific rules
        - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), topic=(.+), partition=(.*)><>Value
          name: kafka_server_$1_$2
          type: GAUGE
          labels:
            clientId: "$3"
            topic: "$4"
            partition: "$5"
        - pattern: kafka.server<type=(.+), name=(.+), clientId=(.+), brokerHost=(.+), brokerPort=(.+)><>Value
          name: kafka_server_$1_$2
          type: GAUGE
          labels:
            clientId: "$3"
            broker: "$4:$5"
        # Some percent metrics use MeanRate attribute
        # Ex) kafka.server<type=(KafkaRequestHandlerPool), name=(RequestHandlerAvgIdlePercent)><>MeanRate
        - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>MeanRate
          name: kafka_$1_$2_$3_percent
          type: GAUGE
        # Generic gauges for percents
        - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*><>Value
          name: kafka_$1_$2_$3_percent
          type: GAUGE
        - pattern: kafka.(\w+)<type=(.+), name=(.+)Percent\w*, (.+)=(.+)><>Value
          name: kafka_$1_$2_$3_percent
          type: GAUGE
          labels:
            "$4": "$5"
        # Generic per-second counters with 0-2 key/value pairs
        - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+), (.+)=(.+)><>Count
          name: kafka_$1_$2_$3_total
          type: COUNTER
          labels:
            "$4": "$5"
            "$6": "$7"
        - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*, (.+)=(.+)><>Count
          name: kafka_$1_$2_$3_total
          type: COUNTER
          labels:
            "$4": "$5"
        - pattern: kafka.(\w+)<type=(.+), name=(.+)PerSec\w*><>Count
          name: kafka_$1_$2_$3_total
          type: COUNTER
        # Generic gauges with 0-2 key/value pairs
        - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Value
          name: kafka_$1_$2_$3
          type: GAUGE
          labels:
            "$4": "$5"
            "$6": "$7"
        - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Value
          name: kafka_$1_$2_$3
          type: GAUGE
          labels:
            "$4": "$5"
        - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Value
          name: kafka_$1_$2_$3
          type: GAUGE
        # Emulate Prometheus 'Summary' metrics for the exported 'Histogram's.
        # Note that these are missing the '_sum' metric!
        - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+), (.+)=(.+)><>Count
          name: kafka_$1_$2_$3_count
          type: COUNTER
          labels:
            "$4": "$5"
            "$6": "$7"
        - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*), (.+)=(.+)><>(\d+)thPercentile
          name: kafka_$1_$2_$3
          type: GAUGE
          labels:
            "$4": "$5"
            "$6": "$7"
            quantile: "0.$8"
        - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.+)><>Count
          name: kafka_$1_$2_$3_count
          type: COUNTER
          labels:
            "$4": "$5"
        - pattern: kafka.(\w+)<type=(.+), name=(.+), (.+)=(.*)><>(\d+)thPercentile
          name: kafka_$1_$2_$3
          type: GAUGE
          labels:
            "$4": "$5"
            quantile: "0.$6"
        - pattern: kafka.(\w+)<type=(.+), name=(.+)><>Count
          name: kafka_$1_$2_$3_count
          type: COUNTER
        - pattern: kafka.(\w+)<type=(.+), name=(.+)><>(\d+)thPercentile
          name: kafka_$1_$2_$3
          type: GAUGE
          labels:
            quantile: "0.$4"
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
    metrics:
      # Inspired by Zookeeper rules
      # https://github.com/prometheus/jmx_exporter/blob/master/example_configs/zookeeper.yaml
      lowercaseOutputName: true
      rules:
        # replicated Zookeeper
        - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+)><>(\\w+)"
          name: "zookeeper_$2"
          type: GAUGE
        - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+)><>(\\w+)"
          name: "zookeeper_$3"
          type: GAUGE
          labels:
            replicaId: "$2"
        - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(Packets\\w+)"
          name: "zookeeper_$4"
          type: COUNTER
          labels:
            replicaId: "$2"
            memberType: "$3"
        - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+)><>(\\w+)"
          name: "zookeeper_$4"
          type: GAUGE
          labels:
            replicaId: "$2"
            memberType: "$3"
        - pattern: "org.apache.ZooKeeperService<name0=ReplicatedServer_id(\\d+), name1=replica.(\\d+), name2=(\\w+), name3=(\\w+)><>(\\w+)"
          name: "zookeeper_$4_$5"
          type: GAUGE
          labels:
            replicaId: "$2"
            memberType: "$3"
        # standalone Zookeeper
        - pattern: "org.apache.ZooKeeperService<name0=StandaloneServer_port(\\d+)><>(\\w+)"
          type: GAUGE
          name: "zookeeper_$2"
        - pattern: "org.apache.ZooKeeperService<name0=StandaloneServer_port(\\d+), name1=InMemoryDataTree><>(\\w+)"
          type: GAUGE
          name: "zookeeper_$2"
  entityOperator:
    topicOperator: {}

kafka-sm.yaml:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kafka-cluster-service-monitor
  labels:
    release: prometheus
spec:
  selector:
    matchExpressions:
      - { key: strimzi.io/kind, operator: In, values: [Kafka, KafkaConnect] }
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: tcp-prometheus
      honorLabels: true
      interval: 10s
      scrapeTimeout: 10s
      path: /metrics
      scheme: http
      relabelings:
        # - sourceLabels: [__meta_kubernetes_endpoints_name]
        #   separator: ;
        #   regex: prometheus-kube-state-metrics
        #   replacement: $1
        #   action: keep
        - separator: ;
          regex: __meta_kubernetes_service_label_(.+)
          replacement: $1
          action: labelmap
        - sourceLabels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          targetLabel: namespace
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          targetLabel: kubernetes_namespace
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          targetLabel: kubernetes_name
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          separator: ;
          regex: (.*)
          targetLabel: node_name
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_host_ip]
          separator: ;
          regex: (.*)
          targetLabel: node_ip
          replacement: $1
          action: replace

    #### job_name: node-exporter
    - port: tcp-prometheus
      honorLabels: true
      interval: 10s
      scrapeTimeout: 10s
      path: /metrics
      scheme: http
      relabelings:
        - sourceLabels: [__meta_kubernetes_endpoints_name]
          separator: ;
          regex: prometheus-node-exporter
          replacement: $1
          action: keep
        - separator: ;
          regex: __meta_kubernetes_service_label_(.+)
          replacement: $1
          action: labelmap
        - sourceLabels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          targetLabel: namespace
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_namespace]
          separator: ;
          regex: (.*)
          targetLabel: kubernetes_namespace
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_service_name]
          separator: ;
          regex: (.*)
          targetLabel: kubernetes_name
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          separator: ;
          regex: (.*)
          targetLabel: node_name
          replacement: $1
          action: replace
        - sourceLabels: [__meta_kubernetes_pod_host_ip]
          separator: ;
          regex: (.*)
          targetLabel: node_ip
          replacement: $1
          action: replace
chgl added the bug label Jul 12, 2020

ppatierno (Member) commented Jul 13, 2020

@chgl thanks for raising this, I was just opening an issue and a PR for this.
Actually, the removed label is pod_name (now it's pod) and not kubernetes_pod_name (the latter is the result of the relabelling in our Prometheus additional scrape configuration).
It's also related to a wrong metrics path: cAdvisor now exposes metrics on /metrics/cadvisor.
Anyway, I am going to open a PR with changes to the additional scrape configuration and to all the dashboards using it.
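For reference, a kubelet cAdvisor scrape job targeting the new path looks roughly like this (a generic Prometheus scrape job sketch scraping cAdvisor through the API server proxy, not the actual Strimzi additional scrape configuration from the PR):

```yaml
- job_name: kubernetes-cadvisor
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    # cAdvisor metrics moved from the kubelet's /metrics to /metrics/cadvisor:
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
```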

chgl (Contributor, Author) commented Jul 13, 2020

Awesome! What's the motivation behind re-labelling pod_name/pod to kubernetes_pod_name anyway? I can see that robustness towards upstream metric changes is one, since you only have to modify the relabelling in one place.

adelcast commented

Just hit this on Friday... this is the cAdvisor change that caused this issue: kubernetes/kubernetes#69099
