Description
When using cAdvisor to monitor Docker container stats, cAdvisor seems to get confused by any cgroup whose basename contains the ID of an existing Docker container. This includes the systemd mount unit for the /var/lib/docker/containers/*/shm mountpoint, which seems to intermittently result in cAdvisor returning incorrect (all-zero) stats from the /api/v1.2/docker/... endpoint.
Symptoms
The /api/v1.2/docker endpoint returns a mixture of {"/docker/*": ..., "/system.slice/var-lib-docker-containers-*-shm.mount": ...} entries. The /api/v1.2/docker/* endpoint may occasionally return zero-valued CPU and memory stats.
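To make the mixture visible, a decoded /api/v1.2/docker response can be split by cgroup name. This is a minimal sketch: split_entries and the sample dict are hypothetical, and it assumes the spurious entries are exactly the keys ending in ".mount".

```python
def split_entries(response):
    """Split a decoded /api/v1.2/docker response into container and mount entries."""
    containers, mounts = {}, {}
    for cgroup_name, info in response.items():
        # Assumption: the spurious entries are exactly those ending in ".mount".
        target = mounts if cgroup_name.endswith(".mount") else containers
        target[cgroup_name] = info
    return containers, mounts


ID = "acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536"
# Hand-written sample shaped like the response keys described above.
sample = {
    "/docker/" + ID: {"id": ID},
    "/system.slice/var-lib-docker-containers-" + ID + "-shm.mount": {"id": ID},
}
containers, mounts = split_entries(sample)
print("containers:", sorted(containers))
print("mounts:", sorted(mounts))
```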
Details
With CoreOS 1235.6.0 + systemd 231 + Docker 1.12.3 + Linux 4.7.3-coreos-r2, the cAdvisor Docker driver seems to pick up two cgroups for each running Docker container:
I0113 12:22:45.621838 1 factory.go:111] Using factory "docker" for container "/system.slice/var-lib-docker-containers-acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536-shm.mount"
I0113 12:22:45.626448 1 manager.go:874] Added container: "/system.slice/var-lib-docker-containers-acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536-shm.mount" (aliases: [null-stackless-whoami-1 acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536], namespace: "docker")
I0113 12:22:45.755197 1 factory.go:111] Using factory "docker" for container "/docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536"
I0113 12:22:45.756418 1 manager.go:874] Added container: "/docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536" (aliases: [null-stackless-whoami-1 acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536], namespace: "docker")
In this case, /docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536 is the correct cgroup, which contains the processes within the Docker container. The /system.slice/var-lib-docker-containers-acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536-shm.mount cgroup is empty, and is associated with the /var/lib/docker/containers/*/shm mountpoint.
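A rough model of why both cgroups get claimed: if the Docker handler looks for a 64-character hex ID anywhere in the cgroup basename, both names match. The pattern and the helper below are assumptions based on the log output above, not code taken from cAdvisor.

```python
import re

# Assumption: a cgroup is treated as a Docker container when its basename
# contains a 64-character hex ID. This models the logged behaviour only.
CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(cgroup_path):
    """Return the container ID found in the cgroup basename, or None."""
    basename = cgroup_path.rsplit("/", 1)[-1]
    match = CONTAINER_ID.search(basename)
    return match.group(0) if match else None


ID = "acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536"
cgroups = [
    "/docker/" + ID,
    "/system.slice/var-lib-docker-containers-" + ID + "-shm.mount",
    "/system.slice/docker.service",
]
for path in cgroups:
    print(path, "->", container_id_from_cgroup(path))
```

The first two paths yield the same ID, the third yields None.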
Both of these container entries will associate themselves with the information returned by the Docker API for that container ID, and will thus have an identical set of cAdvisor aliases. Both container entries will add identical namespacedContainerName{"docker", "acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536"} entries to the github.com/google/cadvisor/manager:manager.containers map. These entries override each other depending on the order in which the cgroups are listed, and the winner may possibly change as events come in.
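The overriding can be illustrated with a plain dict keyed by (namespace, alias). This is a simplified model; register and lookup_after are hypothetical names, not cAdvisor's own.

```python
ID = "acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536"
ALIASES = ["null-stackless-whoami-1", ID]
DOCKER_CGROUP = "/docker/" + ID
MOUNT_CGROUP = "/system.slice/var-lib-docker-containers-" + ID + "-shm.mount"


def register(containers, cgroup_path, namespace, aliases):
    """Index one cgroup under each of its namespaced aliases."""
    for alias in aliases:
        # A later registration with the same key silently replaces the earlier one.
        containers[(namespace, alias)] = cgroup_path


def lookup_after(registration_order):
    """Register the cgroups in the given order, then look up the container ID."""
    containers = {}
    for cgroup_path in registration_order:
        register(containers, cgroup_path, "docker", ALIASES)
    return containers[("docker", ID)]


print(lookup_after([DOCKER_CGROUP, MOUNT_CGROUP]))
print(lookup_after([MOUNT_CGROUP, DOCKER_CGROUP]))
```

Whichever cgroup is registered last owns the alias, so the lookup result depends purely on ordering.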
The /api/v1.2/docker/ID endpoint will then semi-randomly return one of these two cgroup entries:
core@core-01 ~ $ curl -v http://localhost:8989/api/v1.2/docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536
{"/docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536":{"id":"acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536","name":"/docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536","aliases":["null-stackless-whoami-1","acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536"],"namespace":"docker", ...
core@core-01 ~ $ docker restart kontena-cadvisor
kontena-cadvisor
core@core-01 ~ $ curl http://localhost:8989/api/v1.2/docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536
{"/system.slice/var-lib-docker-containers-acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536-shm.mount":{"id":"acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536","name":"/system.slice/var-lib-docker-containers-acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536-shm.mount","aliases":["null-stackless-whoami-1","acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536"],"namespace":"docker", ...
The /docker/... one has useful CPU/memory stats, but the /system.slice/var-lib-docker-containers-...-shm.mount one reports zero for all CPU and memory usage figures. This is probably because the /system.slice/....mount cgroup is empty, and does not contain any processes.
"stats" : [
   {
      "cpu" : {
         "usage" : {
            "total" : 0,
            "user" : 0,
            "system" : 0,
            "per_cpu_usage" : [
               0
            ]
         },
         "cfs" : {
            "throttled_time" : 0,
            "periods" : 0,
            "throttled_periods" : 0
         },
         "load_average" : 0
      },
      "memory" : {
         "cache" : 0,
         "hierarchical_data" : {
            "pgfault" : 0,
            "pgmajfault" : 0
         },
         "failcnt" : 0,
         "container_data" : {
            "pgmajfault" : 0,
            "pgfault" : 0
         },
         "swap" : 0,
         "usage" : 0,
         "working_set" : 0,
         "rss" : 0
      },
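A client that wants to notice when it has been handed the empty cgroup could check whether every CPU and memory figure in a sample is zero. A minimal sketch, with hypothetical helper names; it assumes a sample is a decoded JSON object with "cpu" and "memory" keys shaped like the excerpt above.

```python
def all_zero(value):
    """Return True if every number nested inside value is zero."""
    if isinstance(value, dict):
        return all(all_zero(v) for v in value.values())
    if isinstance(value, list):
        return all(all_zero(v) for v in value)
    if isinstance(value, bool):
        return True  # booleans are not usage figures, ignore them
    if isinstance(value, (int, float)):
        return value == 0
    return True  # strings and nulls carry no usage figures


def is_empty_sample(sample):
    """Return True if the sample's CPU and memory sections are entirely zero."""
    return all_zero(sample.get("cpu", {})) and all_zero(sample.get("memory", {}))


zero_sample = {
    "cpu": {"usage": {"total": 0, "user": 0, "system": 0, "per_cpu_usage": [0]}},
    "memory": {"usage": 0, "working_set": 0, "rss": 0},
}
# The non-zero values here are made up for illustration.
busy_sample = {
    "cpu": {"usage": {"total": 12345, "user": 100, "system": 50, "per_cpu_usage": [12345]}},
    "memory": {"usage": 4096, "working_set": 2048, "rss": 1024},
}
print(is_empty_sample(zero_sample), is_empty_sample(busy_sample))
```

An idle container can also legitimately report low usage, so this only flags the fully zero case.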
It seems like cAdvisor has a systemd factory which is intended to ignore the systemd mount cgroups: https://github.com/google/cadvisor/blob/v0.24.1/container/systemd/factory.go#L42
However, the manager seems to register this systemd factory after the Docker factory, and thus the Docker factory will pick up the cgroup before the systemd factory has a chance to filter it out.
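The ordering problem can be modelled as a first-match-wins dispatch. Everything in this sketch is a hypothetical stand-in: each factory returns a (handles, accepts) pair, and the selection loop assumes the first factory that claims a cgroup decides its fate, as described above.

```python
import re

CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def docker_factory(cgroup):
    """Claim and accept any cgroup whose basename contains a container ID."""
    basename = cgroup.rsplit("/", 1)[-1]
    found = CONTAINER_ID.search(basename) is not None
    return found, found


def systemd_factory(cgroup):
    """Claim systemd mount units only in order to reject them."""
    if cgroup.endswith(".mount"):
        return True, False
    return False, False


def select(factories, cgroup):
    """Return the name of the factory that accepts the cgroup, or None if ignored."""
    for name, factory in factories:
        handles, accepts = factory(cgroup)
        if handles:
            return name if accepts else None
    return None


ID = "acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536"
MOUNT = "/system.slice/var-lib-docker-containers-" + ID + "-shm.mount"
docker_first = [("docker", docker_factory), ("systemd", systemd_factory)]
systemd_first = [("systemd", systemd_factory), ("docker", docker_factory)]
print(select(docker_first, MOUNT))
print(select(systemd_first, MOUNT))
```

In this model, checking the systemd factory first causes the mount cgroup to be ignored, while /docker/ID is still picked up by the Docker factory in either order.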
Workaround
The /api/v1.2/containers/docker/ID API will always return the actual container cgroup stats, assuming that Docker places its container cgroups directly under /docker/ID. This is not always the case.
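A minimal client-side sketch of the workaround, querying by cgroup path instead of by Docker alias. The function names are hypothetical, and the /docker/ID cgroup layout is the same assumption stated above; hosts with a different cgroup parent would need a different path.

```python
import json
import urllib.request


def cgroup_stats_url(base_url, container_id):
    """Build the cgroup-path URL, assuming the container cgroup is /docker/<ID>."""
    return "{}/api/v1.2/containers/docker/{}".format(base_url.rstrip("/"), container_id)


def fetch_cgroup_stats(base_url, container_id, timeout=5.0):
    """Fetch and decode the JSON response for the container's cgroup.

    Raises urllib.error.URLError if the request fails.
    """
    url = cgroup_stats_url(base_url, container_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example call against the cAdvisor instance used earlier in this report:
# stats = fetch_cgroup_stats("http://localhost:8989", "<container ID>")
```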
Related issues
- Kontena service CPU & Memory stats may be zero-valued due to cAdvisor cgroup confusion kontena/kontena#1656
- Cadivsor tries to monitor systemd .mount units via docker #1510
- cadvisor redundantly gathering docker stats on ubuntu 16.04 #1495
- cAdvisor can't collect the stats info from the containers of system.slice #1438
- cAdvisor should ignore .mount cgroups when on systemd #1211