- Prometheus version: 2.47.2
- Grafana version: 10.2.0
List of provided dashboards:
- Confluent Platform overview
- Zookeeper cluster
- Kafka cluster
- Kafka topics
- Kafka clients
- Kafka quotas
- Kafka lag exporter
- Kafka transaction coordinator
- Schema Registry cluster
- Kafka Connect cluster
- ksqlDB cluster
- Kafka streams
- Kafka streams RocksDB
- Librdkafka based client
- Oracle CDC source Connector
- Debezium source Connectors
- Mongo source and sink Connectors
- Cluster Linking
- Rest Proxy
- KRaft overview
- Confluent RBAC
- Replicator
- Tiered Storage
Note
Consumer Group Lag
Starting with CP 7.5, brokers expose consumer lag as JMX tenant metrics; see the documentation.
You can therefore use either kafka-lag-exporter or the brokers' built-in tenant metrics.
To use the latter, enable it by setting confluent.consumer.lag.emitter.enabled = true
in the broker configuration; see the documentation.
This repository contains both options:
- Dedicated Kafka lag exporter dashboard
- Consumer lag visualizations within the consumer dashboard
[Experimental]
You can test JMX metrics in the UI and check whether they match the rules in a Prometheus ruleset file.
To run the UI:
- ensure you have Python 3.x installed
- install the Python dependencies:
pip install Flask
- run the UI and then connect to localhost:5000
python shared-assets/jmx-exporter-matching-ui/app.py
- play with the UI
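As a rough illustration of what such a match check involves, the sketch below tests a JMX metric string against a few rule patterns with Python's re module. It is not the code in app.py: the patterns, the sample metric strings, and the function name first_matching_rule are hypothetical, and it assumes that rule patterns are unanchored regular expressions, as in the JMX exporter rule format.

```python
import re

# Hypothetical rule patterns, written in the style of a JMX exporter ruleset.
# A real ruleset file would hold these under "rules: - pattern: ...".
RULE_PATTERNS = [
    r"kafka.server<type=(.+), name=(.+)PerSec\w*, topic=(.+)><>Count",
    r"kafka.server<type=(.+), name=(.+)><>Value",
]


def first_matching_rule(metric, patterns=RULE_PATTERNS):
    """Return (rule index, captured groups) for the first pattern that
    matches the metric string, or None when no pattern matches."""
    for index, pattern in enumerate(patterns):
        # re.search is used because the patterns are assumed to be unanchored.
        match = re.search(pattern, metric)
        if match:
            return index, match.groups()
    return None


if __name__ == "__main__":
    samples = [
        "kafka.server<type=BrokerTopicMetrics, name=MessagesInPerSec, topic=orders><>Count",
        "kafka.server<type=ReplicaManager, name=PartitionCount><>Value",
        "kafka.network<type=RequestMetrics, name=TotalTimeMs><>Mean",
    ]
    for sample in samples:
        print(sample, "->", first_matching_rule(sample))
```

Rules are tried in order and the first match wins, so a broad pattern placed early can hide a more specific one further down the list.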
Alternatively, a definition file is available that collects only metrics with values at the 99th percentile.
Alternatively, a definition file is available that collects only a limited number of metrics for clients: clients - reduced.
For Kafka to output quota metrics, at least one quota configuration is necessary.
A quota can be configured from the cp-demo folder using docker-compose:
docker-compose exec kafka1 kafka-configs --bootstrap-server kafka1:12091 --alter --add-config 'producer_byte_rate=10000,consumer_byte_rate=30000,request_percentage=0.2' --entity-type users --entity-name unknown --entity-type clients --entity-name unknown
Demo is based on https://github.com/vdesabou/kafka-docker-playground/tree/master/connect/connect-cdc-oracle19-source
To test:
- From the repo, run the playground example with the --enable-jmx-grafana option
Demo is based on https://github.com/confluentinc/demo-scene/tree/master/cluster-linking-disaster-recovery
To test, follow these steps:
- Set the environment variable:
MONITORING_STACK=jmxexporter-prometheus-grafana
- Clone the cluster linking disaster recovery demo from confluentinc/demo-scene:
[[ -d "clink-demo" ]] || git clone git@github.com:confluentinc/demo-scene.git clink-demo
(cd clink-demo && git fetch && git pull)
- Start the monitoring solution with the selected stack. This command also starts clink-demo, so you do not need to start it separately.
${MONITORING_STACK}/cluster-linking/start.sh
- Stop the monitoring solution. This command also stops clink-demo, so you do not need to stop it separately.
${MONITORING_STACK}/cluster-linking/stop.sh
To test, use dev-toolkit with the default profile:
- Start dev-toolkit:
$ cd dev-toolkit
$ start.sh
To test, follow these steps:
- Start dev-toolkit with the replicator profile:
$ cd dev-toolkit
$ start.sh --profile replicator