---
title: Docker Swarm
sort_rank: 1
---

# Docker Swarm

Prometheus can discover targets in a [Docker Swarm][swarm] cluster, as of
v2.20.0. This guide demonstrates how to use that service discovery mechanism.

## Docker Swarm service discovery architecture

The [Docker Swarm service discovery][swarmsd] contains 3 different roles: nodes,
services, and tasks.

The first role, **nodes**, represents the hosts that are part of the Swarm. It
can be used to automatically monitor the Docker daemons or the Node Exporters
that run on the Swarm hosts.

The second role, **tasks**, represents any individual container deployed in the
swarm. Each task gets its associated service labels. One service can be backed by
one or multiple tasks.

The third role, **services**, discovers the services deployed in the swarm,
along with the ports they expose. Usually you will want to use the tasks role
instead of this one.

Prometheus will only discover tasks and services that expose ports.
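
Each of these roles maps to the `role` field of a `dockerswarm_sd_configs`
entry. As a minimal sketch (assuming Prometheus can reach the local Docker
socket; the full examples below flesh this out):

```yaml
scrape_configs:
  - job_name: 'dockerswarm'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        # One of: nodes, services, tasks.
        role: tasks
```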

NOTE: The rest of this guide assumes that you have a Swarm running.

## Setting up Prometheus

For this guide, you need to [set up Prometheus][setup]. We will assume that
Prometheus runs on a Docker Swarm manager node and has access to the Docker
socket at `/var/run/docker.sock`.
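
One way to achieve this, sketched below, is to run Prometheus itself as a Swarm
service constrained to manager nodes, with the Docker socket bind-mounted into
the container. The image, published port, and config file path are assumptions
to adapt to your setup:

```yaml
# docker-compose.yml, deployed with: docker stack deploy -c docker-compose.yml monitoring
version: "3.8"
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      # Give Prometheus access to the Docker socket for service discovery.
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # Path to your prometheus.yml on the host (placeholder).
      - /path/to/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    deploy:
      placement:
        constraints:
          - node.role == manager
```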

## Monitoring Docker daemons

Let's dive into the service discovery itself.

Docker itself, as a daemon, exposes [metrics][dockermetrics] that can be
ingested by a Prometheus server.

You can enable them by editing `/etc/docker/daemon.json` and setting the
following properties:

```json
{
  "metrics-addr" : "0.0.0.0:9323",
  "experimental" : true
}
```

Instead of `0.0.0.0`, you can set the IP of the Docker Swarm node.

A restart of the daemon is required to take the new configuration into account.
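
On a systemd-based host (an assumption; use your init system's equivalent),
restarting the daemon and checking the metrics endpoint looks like this:

```shell
sudo systemctl restart docker
# The daemon should now serve Prometheus metrics on port 9323.
curl -s http://localhost:9323/metrics | head
```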

The [Docker documentation][dockermetrics] contains more info about this.

Then, you can configure Prometheus to scrape the Docker daemon, by providing the
following `prometheus.yml` file:

```yaml
scrape_configs:
  # Make Prometheus scrape itself for metrics.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Create a job for Docker daemons.
  - job_name: 'docker'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: nodes
    relabel_configs:
      # Fetch metrics on port 9323.
      - source_labels: [__meta_dockerswarm_node_address]
        target_label: __address__
        replacement: $1:9323
      # Set hostname as instance label.
      - source_labels: [__meta_dockerswarm_node_hostname]
        target_label: instance
```

For the nodes role, you can also use the `port` parameter of
`dockerswarm_sd_configs`. However, using `relabel_configs` is recommended as it
enables Prometheus to reuse the same API calls across identical Docker Swarm
configurations.
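
For completeness, here is a sketch of that `port`-based alternative for the
nodes role, roughly equivalent to the relabeling above:

```yaml
- job_name: 'docker'
  dockerswarm_sd_configs:
    - host: unix:///var/run/docker.sock
      role: nodes
      # Scrape every node on the Docker daemon metrics port.
      port: 9323
  relabel_configs:
    # Set hostname as instance label.
    - source_labels: [__meta_dockerswarm_node_hostname]
      target_label: instance
```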

## Monitoring Containers

Let's now deploy a service in our Swarm. We will deploy [cadvisor][cad], which
exposes container resource metrics:

```shell
docker service create --name cadvisor -l prometheus-job=cadvisor \
    --mode=global --publish target=8080,mode=host \
    --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock,ro \
    --mount type=bind,src=/,dst=/rootfs,ro \
    --mount type=bind,src=/var/run,dst=/var/run \
    --mount type=bind,src=/sys,dst=/sys,ro \
    --mount type=bind,src=/var/lib/docker,dst=/var/lib/docker,ro \
    google/cadvisor -docker_only
```
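
Since the service runs in global mode, you can check that one task is running
on every node before pointing Prometheus at it:

```shell
# Should list one running cadvisor task per Swarm node.
docker service ps cadvisor
```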

This is a minimal `prometheus.yml` file to monitor it:

```yaml
scrape_configs:
  # Make Prometheus scrape itself for metrics.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Create a job for Docker Swarm containers.
  - job_name: 'dockerswarm'
    dockerswarm_sd_configs:
      - host: unix:///var/run/docker.sock
        role: tasks
    relabel_configs:
      # Only keep containers that should be running.
      - source_labels: [__meta_dockerswarm_task_desired_state]
        regex: running
        action: keep
      # Only keep containers that have a `prometheus-job` label.
      - source_labels: [__meta_dockerswarm_service_label_prometheus_job]
        regex: .+
        action: keep
      # Use the prometheus-job Swarm label as Prometheus job label.
      - source_labels: [__meta_dockerswarm_service_label_prometheus_job]
        target_label: job
```
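
Before loading it, you can validate the file with `promtool`, which ships with
Prometheus (the file name here is an assumption):

```shell
promtool check config prometheus.yml
```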

Let's analyze each part of the [relabel configuration][rela].

```yaml
- source_labels: [__meta_dockerswarm_task_desired_state]
  regex: running
  action: keep
```

Docker Swarm exposes the desired [state of the tasks][state] over the API. In
our example, we only **keep** the targets that should be running. It prevents
monitoring tasks that should be shut down.

```yaml
- source_labels: [__meta_dockerswarm_service_label_prometheus_job]
  regex: .+
  action: keep
```

When we deployed cadvisor, we added a label `prometheus-job=cadvisor`.
As Prometheus fetches the tasks' labels, we can instruct it to **only** keep the
targets which have a `prometheus-job` label.

```yaml
- source_labels: [__meta_dockerswarm_service_label_prometheus_job]
  target_label: job
```

That last part takes the label `prometheus-job` of the task and turns it into
a target label, overwriting the default `dockerswarm` job label that comes from
the scrape config.

## Discovered labels

The [Prometheus documentation][swarmsd] contains the full list of labels, but
here are other relabel configs that you might find useful.

### Scraping metrics via a certain network only

```yaml
- source_labels: [__meta_dockerswarm_network_name]
  regex: ingress
  action: keep
```

### Scraping global tasks only

Global tasks run on every daemon.

```yaml
- source_labels: [__meta_dockerswarm_service_mode]
  regex: global
  action: keep
- source_labels: [__meta_dockerswarm_task_port_publish_mode]
  regex: host
  action: keep
```

### Adding a docker_node label to the targets

```yaml
- source_labels: [__meta_dockerswarm_node_hostname]
  target_label: docker_node
```

## Connecting to the Docker Swarm

The above `dockerswarm_sd_configs` entries have a `host` field:

```yaml
host: unix:///var/run/docker.sock
```

That uses the Docker socket. Prometheus offers [additional configuration
options][swarmsd] to connect to Swarm using HTTP and HTTPS, if you prefer that
over the Unix socket.
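
For example, a sketch of a TCP connection to a TLS-protected Docker endpoint
might look like the following, where the host name and certificate paths are
assumptions for your environment:

```yaml
dockerswarm_sd_configs:
  - host: tcp://swarm-manager.example.com:2376
    role: tasks
    tls_config:
      ca_file: /etc/prometheus/docker-ca.pem
      cert_file: /etc/prometheus/docker-cert.pem
      key_file: /etc/prometheus/docker-key.pem
```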

## Conclusion

There are many discovery labels you can play with to better determine which
targets to monitor and how to monitor them; for the tasks role alone, more than
25 labels are available. Don't hesitate to look at the "Service Discovery" page
of your Prometheus server (under the "Status" menu) to see all the discovered
labels.

The service discovery makes no assumptions about your Swarm stack, so with the
proper configuration it should plug into any existing stack.

[state]: https://docs.docker.com/engine/swarm/how-swarm-mode-works/swarm-task-states/
[rela]: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
[swarm]: https://docs.docker.com/engine/swarm/
[swarmsd]: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#dockerswarm_sd_config
[dockermetrics]: https://docs.docker.com/config/daemon/prometheus/
[cad]: https://github.com/google/cadvisor
[setup]: https://prometheus.io/docs/prometheus/latest/getting_started/