cAdvisor is confused by /system.slice/var-lib-docker-containers-...-shm.mount cgroups and may report zero-valued stats for Docker containers #1572

Closed
@SpComb

Description

When using cAdvisor to monitor Docker container stats, cAdvisor seems to get confused by any cgroup whose basename contains the ID of an existing Docker container. This includes the systemd mount unit for the /var/lib/docker/containers/*/shm mountpoint. As a result, the /api/v1.2/docker/... endpoint intermittently returns incorrect (all-zero) stats.

Symptoms

The /api/v1.2/docker endpoint returns a mixture of {"/docker/*": ..., "/system.slice/var-lib-docker-containers-*-shm.mount": ...} entries. The /api/v1.2/docker/* endpoint may occasionally return zero-valued CPU and memory stats.
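A client that consumes the /api/v1.2/docker listing could drop the mount-unit entries itself. The following is only a minimal sketch of that idea, assuming the response has already been parsed into a dict keyed by cgroup name and that container IDs are 64 hex characters; `drop_shm_mount_entries` is a hypothetical helper, not part of cAdvisor.

```python
import re

# Hypothetical client-side filter, not part of cAdvisor.
# Matches systemd mount units for /var/lib/docker/containers/<id>/shm,
# assuming <id> is a 64-character lowercase hex container ID.
SHM_MOUNT_RE = re.compile(
    r"^/system\.slice/var-lib-docker-containers-[0-9a-f]{64}-shm\.mount$"
)


def drop_shm_mount_entries(containers):
    """Return a copy of the parsed response without the shm mount-unit cgroups."""
    return {
        name: info
        for name, info in containers.items()
        if not SHM_MOUNT_RE.match(name)
    }
```

This only helps with the listing endpoint. It does nothing for /api/v1.2/docker/ID, where cAdvisor has already chosen one of the two entries.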

Details

With CoreOS 1235.6.0 + systemd 231 + Docker 1.12.3 + Linux 4.7.3-coreos-r2, the cAdvisor Docker driver seems to pick up two cgroups for each running Docker container:

I0113 12:22:45.621838       1 factory.go:111] Using factory "docker" for container "/system.slice/var-lib-docker-containers-acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536-shm.mount"
I0113 12:22:45.626448       1 manager.go:874] Added container: "/system.slice/var-lib-docker-containers-acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536-shm.mount" (aliases: [null-stackless-whoami-1 acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536], namespace: "docker")

I0113 12:22:45.755197       1 factory.go:111] Using factory "docker" for container "/docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536"
I0113 12:22:45.756418       1 manager.go:874] Added container: "/docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536" (aliases: [null-stackless-whoami-1 acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536], namespace: "docker")

In this case, /docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536 is the correct cgroup, which contains the processes within the Docker container. The /system.slice/var-lib-docker-containers-acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536-shm.mount cgroup is empty, and associated with the /var/lib/docker/containers/*/shm mountpoint.

Both of these container entries associate themselves with the information returned by the Docker API for that container ID, and thus have an identical set of cAdvisor aliases. Both add identical namespacedContainerName{"docker", "acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536"} entries to the github.com/google/cadvisor/manager:manager.containers map. These entries override each other depending on the order in which the cgroups are listed, and possibly also as events come in.
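The override can be illustrated with a toy model. This is not cAdvisor's code: `NamespacedName` and `register` are hypothetical stand-ins for the namespacedContainerName key and the manager's map insertion, and the only behaviour modelled is that a later insert under the same key replaces an earlier one.

```python
from typing import Dict, Iterable, NamedTuple


class NamespacedName(NamedTuple):
    """Hypothetical stand-in for cAdvisor's namespacedContainerName key."""
    namespace: str
    name: str


def register(containers: Dict[NamespacedName, str],
             cgroup: str, aliases: Iterable[str]) -> None:
    """Index a cgroup under each alias; a later call replaces an earlier one."""
    for alias in aliases:
        containers[NamespacedName("docker", alias)] = cgroup


CID = "acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536"
ALIASES = ["null-stackless-whoami-1", CID]
DOCKER_CGROUP = "/docker/" + CID
MOUNT_CGROUP = "/system.slice/var-lib-docker-containers-" + CID + "-shm.mount"


def resolve(order):
    """Register the cgroups in the given order and look up the container ID."""
    containers: Dict[NamespacedName, str] = {}
    for cgroup in order:
        register(containers, cgroup, ALIASES)
    return containers[NamespacedName("docker", CID)]
```

In this model, `resolve([MOUNT_CGROUP, DOCKER_CGROUP])` yields the /docker/... cgroup and `resolve([DOCKER_CGROUP, MOUNT_CGROUP])` yields the mount cgroup, which matches the order-dependent behaviour described above.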

The /api/v1.2/docker/ID endpoint will then semi-randomly return one of these two cgroup entries:

core@core-01 ~ $ curl -v http://localhost:8989/api/v1.2/docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536
{"/docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536":{"id":"acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536","name":"/docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536","aliases":["null-stackless-whoami-1","acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536"],"namespace":"docker", ...
core@core-01 ~ $ docker restart kontena-cadvisor
kontena-cadvisor
core@core-01 ~ $ curl http://localhost:8989/api/v1.2/docker/acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536
{"/system.slice/var-lib-docker-containers-acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536-shm.mount":{"id":"acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536","name":"/system.slice/var-lib-docker-containers-acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536-shm.mount","aliases":["null-stackless-whoami-1","acf76f8a0cf47638f7ab7c7e033872672479017f7447f90d2d6d6d5c39bf7536"],"namespace":"docker", ...

The /docker/... one has useful CPU/memory stats, but the /system.slice/var-lib-docker-containers-...-shm.mount one reports zero for all CPU and memory usage figures. This is probably because the /system.slice/....mount cgroup is empty, and does not contain any processes.

      "stats" : [
         {
            "cpu" : {
               "usage" : {
                  "total" : 0,
                  "user" : 0,
                  "system" : 0,
                  "per_cpu_usage" : [
                     0
                  ]
               },
               "cfs" : {
                  "throttled_time" : 0,
                  "periods" : 0,
                  "throttled_periods" : 0
               },
               "load_average" : 0
            },
            "memory" : {
               "cache" : 0,
               "hierarchical_data" : {
                  "pgfault" : 0,
                  "pgmajfault" : 0
               },
               "failcnt" : 0,
               "container_data" : {
                  "pgmajfault" : 0,
                  "pgfault" : 0
               },
               "swap" : 0,
               "usage" : 0,
               "working_set" : 0,
               "rss" : 0
            },
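
Whether a cgroup is empty can be checked by reading its cgroup.procs file. Below is a rough sketch, assuming a cgroup v1 hierarchy mounted per controller under /sys/fs/cgroup; `cgroup_pids` is a hypothetical helper, and the `controller` and `root` defaults are assumptions that may not match every host.

```python
from pathlib import Path


def cgroup_pids(cgroup, controller="cpu", root="/sys/fs/cgroup"):
    """Return the PIDs listed in a cgroup's cgroup.procs file.

    Assumes a cgroup v1 layout: <root>/<controller>/<cgroup>/cgroup.procs.
    An empty list means the cgroup contains no processes.
    """
    procs = Path(root) / controller / cgroup.lstrip("/") / "cgroup.procs"
    return [int(pid) for pid in procs.read_text().split()]
```

Under these assumptions, calling it with the /system.slice/....mount cgroup should return an empty list if the explanation above is right, and a non-empty list for the /docker/... cgroup of a running container.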

It seems like cAdvisor has a systemd factory which is intended to ignore the systemd mount cgroups: https://github.com/google/cadvisor/blob/v0.24.1/container/systemd/factory.go#L42

However, the manager seems to register this systemd factory after the Docker factory, and thus the Docker factory will pick up the cgroup before the systemd factory has a chance to filter it out.
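The suspected ordering problem can be sketched as a first-match dispatch over a list of factories. This is a toy model, not cAdvisor's implementation: the two factory functions and `dispatch` are hypothetical, and the Docker check is reduced to "the name contains a 64-character hex ID", which is an assumption about how the mount cgroup ends up matching.

```python
import re
from typing import Callable, List, Optional, Tuple

HEX_ID = re.compile(r"[0-9a-f]{64}")

# Each factory returns (handles, accepts) for a cgroup name.
CanHandle = Callable[[str], Tuple[bool, bool]]


def docker_factory(cgroup: str) -> Tuple[bool, bool]:
    """Assumed behaviour: claim any cgroup whose name contains a container ID."""
    found = HEX_ID.search(cgroup) is not None
    return found, found


def systemd_factory(cgroup: str) -> Tuple[bool, bool]:
    """Claim systemd mount units, but only in order to ignore them."""
    return cgroup.endswith(".mount"), False


def dispatch(factories: List[Tuple[str, CanHandle]], cgroup: str) -> Optional[str]:
    """Return the name of the first factory that handles and accepts the cgroup.

    Returns None if the first handling factory rejects it, or if none handles it.
    """
    for name, can_handle in factories:
        handles, accepts = can_handle(cgroup)
        if handles:
            return name if accepts else None
    return None
```

With `[("docker", docker_factory), ("systemd", systemd_factory)]` the shm mount cgroup is claimed by the Docker factory. With the order reversed it is dropped, while a /docker/ID cgroup is handled by the Docker factory either way.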

Workaround

The /api/v1.2/containers/docker/ID API will always return the actual container cgroup stats, assuming that Docker places its container cgroups directly under /docker/ID. This is not always the case.
