Gce-pd and NFS mount metrics disappear and require node_exporter container restart #959

Closed
tmegow opened this issue May 29, 2018 · 4 comments

tmegow commented May 29, 2018

Host operating system: output of uname -a

Linux node-prom-0lv0 4.4.86+ #1 SMP Thu Jan 18 17:03:26 PST 2018 x86_64 Intel(R) Xeon(R) CPU @ 2.30GHz GenuineIntel GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 0.16.0 (branch: HEAD, revision: d42bd70f4363dced6b77d8fc311ea57b63387e4f)
  build user:       root@a67a9bc13a69
  build date:       20180515-15:52:42
  go version:       go1.9.6

node_exporter command line flags

        --path.procfs /host/proc
        --path.sysfs  /host/sys
        --collector.filesystem.ignored-mount-points ^/(sys|proc|dev|host|etc)($|/)
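
To rule out the ignore pattern as the cause, here is a minimal sketch that checks which mount points the regular expression from the flag above would exclude. It uses Python's `re` module as a stand-in for the Go regexp engine that node_exporter uses; for a pattern this simple the two should agree, but that is an assumption. The helper name `is_ignored` and the sample paths are illustrative only.

```python
import re

# Pattern copied from the --collector.filesystem.ignored-mount-points flag.
IGNORED = re.compile(r"^/(sys|proc|dev|host|etc)($|/)")


def is_ignored(mount_point: str) -> bool:
    """Return True if the pattern would exclude this mount point."""
    return IGNORED.search(mount_point) is not None


# Illustrative sample paths (not taken from a real node).
samples = [
    "/host/proc",               # starts with /host/ -> ignored
    "/etc",                     # exact match -> ignored
    "/devices",                 # "/dev" is followed by "i", so not ignored
    "/rootfs/var/lib/kubelet",  # starts with /rootfs, so not ignored
]

for path in samples:
    print(f"{path}: ignored={is_ignored(path)}")
```

Under these assumptions, paths below `/rootfs` are not matched by the pattern, so the flag alone would not explain the missing gce-pd and NFS series.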

Are you running node_exporter in Docker?

Yes

What did you do that produced an error?

No error to report, but we notice these two symptoms:

  1. When pods with gce-pd or NFS mounts are rescheduled, node_exporter does not begin exporting the new mount.
  2. Even when no pods are rescheduled, gce-pd and NFS mounts eventually disappear from the node_exporter metrics.

In both cases, restarting node_exporter is a temporary fix until the issue happens again.

As a stopgap, we regularly restart all our node_exporters. Is this a recommended pattern when running node_exporter?

What did you expect to see?

We hoped the upgrade to v0.16.0 would fix this symptom we've been seeing in v0.14.0 and v0.15.0.

# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda       8:0    0   250G  0 disk
├─sda1    8:1    0 245.9G  0 part /mnt/stateful_partition
├─sda2    8:2    0    16M  0 part
├─sda3    8:3    0     2G  0 part
├─sda4    8:4    0    16M  0 part
├─sda5    8:5    0     2G  0 part
├─sda6    8:6    0   512B  0 part
├─sda7    8:7    0   512B  0 part
├─sda8    8:8    0    16M  0 part /usr/share/oem
├─sda9    8:9    0   512B  0 part
├─sda10   8:10   0   512B  0 part
├─sda11   8:11   0     8M  0 part
└─sda12   8:12   0    32M  0 part
sdb       8:16   0   1.4T  0 disk /home/kubernetes/containerized_mounter/rootfs/

# curl -s localhost:9100/metrics | grep node-dynamic-pvc
node_filesystem_avail_bytes{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 1.26032955392e+12
node_filesystem_avail_bytes{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 1.26032955392e+12
node_filesystem_device_error{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 0
node_filesystem_device_error{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 0
node_filesystem_files{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 9.1553792e+07
node_filesystem_files{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 9.1553792e+07
node_filesystem_files_free{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 9.155349e+07
node_filesystem_files_free{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 9.155349e+07
node_filesystem_free_bytes{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 1.335347195904e+12
node_filesystem_free_bytes{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 1.335347195904e+12
node_filesystem_readonly{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 0
node_filesystem_readonly{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 0
node_filesystem_size_bytes{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 1.475400564736e+12
node_filesystem_size_bytes{device="/dev/sdb",fstype="ext4",mountpoint="/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"} 1.475400564736e+12

and

# mount | grep nfs
10.0.xx.xx:/ on /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/0947fe21-6379-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.130.xx.xx,local_lock=none,addr=10.0.xx.xx)
10.0.xx.xx:/ on /var/lib/kubelet/pods/0947fe21-6379-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.130.xx.xx,local_lock=none,addr=10.0.xx.xx)
10.0.xx.xx:/ on /var/lib/kubelet/pods/0947fe21-6379-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.130.xx.xx,local_lock=none,addr=10.0.xx.xx)
10.0.xx.xx:/ on /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/0947fe21-6379-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.130.xx.xx,local_lock=none,addr=10.0.xx.xx)

# curl -s localhost:9100/metrics | grep nfs
node_filesystem_avail{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 4.02243190784e+11
node_filesystem_avail{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 4.02243190784e+11
node_filesystem_device_error{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 0
node_filesystem_device_error{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 0
node_filesystem_files{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 2.62144e+07
node_filesystem_files{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 2.62144e+07
node_filesystem_files_free{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 2.6214388e+07
node_filesystem_files_free{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 2.6214388e+07
node_filesystem_free{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 4.21608292352e+11
node_filesystem_free{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 4.21608292352e+11
node_filesystem_readonly{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 0
node_filesystem_readonly{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 0
node_filesystem_size{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 4.21682741248e+11
node_filesystem_size{device="10.0.xx.xx:/",fstype="nfs4",mountpoint="/rootfs/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs"} 4.21682741248e+11

What did you see instead?

# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda       8:0    0   250G  0 disk
├─sda1    8:1    0 245.9G  0 part /mnt/stateful_partition
├─sda2    8:2    0    16M  0 part
├─sda3    8:3    0     2G  0 part
├─sda4    8:4    0    16M  0 part
├─sda5    8:5    0     2G  0 part
├─sda6    8:6    0   512B  0 part
├─sda7    8:7    0   512B  0 part
├─sda8    8:8    0    16M  0 part /usr/share/oem
├─sda9    8:9    0   512B  0 part
├─sda10   8:10   0   512B  0 part
├─sda11   8:11   0     8M  0 part
└─sda12   8:12   0    32M  0 part
sdb       8:16   0   1.4T  0 disk /home/kubernetes/containerized_mounter/rootfs/

# curl -s localhost:9100/metrics | grep node-dynamic-pvc
#

and

# mount | grep nfs
10.0.xx.xx:/ on /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/0947fe21-6379-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.130.xx.xx,local_lock=none,addr=10.0.xx.xx)
10.0.xx.xx:/ on /var/lib/kubelet/pods/0947fe21-6379-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.130.xx.xx,local_lock=none,addr=10.0.xx.xx)
10.0.xx.xx:/ on /var/lib/kubelet/pods/0947fe21-6379-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.130.xx.xx,local_lock=none,addr=10.0.xx.xx)
10.0.xx.xx:/ on /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/0947fe21-6379-11e8-99c3-42010af00019/volumes/kubernetes.io~nfs/important-nfs type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.130.xx.xx,local_lock=none,addr=10.0.xx.xx)

# curl -s localhost:9100/metrics | grep nfs
#
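
The manual `mount` versus `curl | grep` comparison above can be sketched as a small check. This is only an illustration: `exported_mount_points` and `unexported` are hypothetical helper names, the `/rootfs` prefix is assumed from the `mountpoint` labels shown earlier, and the label regex does not handle escaped quotes inside label values.

```python
import re

MOUNTPOINT_LABEL = re.compile(r'mountpoint="([^"]*)"')


def exported_mount_points(metrics_text):
    """Collect mountpoint label values from node_filesystem_* sample lines."""
    points = set()
    for line in metrics_text.splitlines():
        if not line.startswith("node_filesystem_"):
            continue  # skip comment lines and unrelated metrics
        match = MOUNTPOINT_LABEL.search(line)
        if match:
            points.add(match.group(1))
    return points


def unexported(mounted, metrics_text, container_prefix="/rootfs"):
    """Return host mount points that have no matching series in the metrics."""
    exported = exported_mount_points(metrics_text)
    return sorted(p for p in mounted if container_prefix + p not in exported)


# Sample data copied from the outputs earlier in this report.
NFS_PATH = (
    "/var/lib/kubelet/pods/2640b993-6360-11e8-99c3-42010af00019"
    "/volumes/kubernetes.io~nfs/important-nfs"
)
METRICS = (
    'node_filesystem_readonly{device="10.0.xx.xx:/",fstype="nfs4",'
    'mountpoint="/rootfs' + NFS_PATH + '"} 0\n'
)

print(unexported([NFS_PATH], METRICS))  # series present
print(unexported([NFS_PATH], ""))       # empty grep output, as in the failure
```

Run against real `mount` and `/metrics` output, a non-empty result would flag the state described here without waiting for a dashboard to go blank.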
@discordianfish
Member

That sounds strange. The node-exporter gets the mountpoints on each scrape from /proc/mounts.

Can you paste what /proc/mounts looks like when the disks are still mounted but don't show up in the node-exporter?
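
For readers unfamiliar with the file, here is a rough sketch of reading a mount table in the `/proc/mounts` format (device, mount point, filesystem type, options, then two numeric fields). It is not node_exporter's actual code, which is written in Go; `Mount`, `parse_mounts`, and `read_mounts` are hypothetical names, and the default path is an assumption that would need to match `--path.procfs` in a container.

```python
from typing import List, NamedTuple


class Mount(NamedTuple):
    device: str
    mount_point: str
    fstype: str
    options: str


def parse_mounts(text: str) -> List[Mount]:
    """Parse /proc/mounts-style text into Mount records."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mounts.append(Mount(fields[0], fields[1], fields[2], fields[3]))
    return mounts


def read_mounts(path: str = "/proc/mounts") -> List[Mount]:
    """Read and parse a mounts file; called fresh on every invocation."""
    with open(path) as handle:
        return parse_mounts(handle.read())


# Sample line in the same format as the entries posted later in this thread.
SAMPLE = (
    "/dev/sdb /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/"
    "node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a "
    "ext4 rw,relatime,data=ordered 0 0\n"
)

for mount in parse_mounts(SAMPLE):
    print(mount.device, mount.fstype, mount.mount_point)
```

Because the file is re-read on each call, any staleness would have to come from the file's contents as seen by the process, not from caching in the reader.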

@discordianfish
Member

I've just realized, maybe /proc/mounts doesn't get updated in a running container or something like that?

@tmegow
Author

tmegow commented May 30, 2018

I've just realized, maybe /proc/mounts doesn't get updated in a running container or something like that?

You nailed it. When the pod is rescheduled, entries are removed from /proc/mounts (the /dev/sdb /rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/.* entries); then, after the pod restarts, replacement lines are not added to /proc/mounts (or /host/proc/mounts) in the container.

# diff <(grep "sdb" ~/tmp/proc_mount_on_host_before.txt) <(grep "sdb" ~/tmp/proc_mount_in_container_before.txt)
1,8c1,8
< /dev/sdb /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/c6d0cf1c-6432-11e8-91ca-42010af0000a/volumes/kubernetes.io~gce-pd/pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/c6d0cf1c-6432-11e8-91ca-42010af0000a/volumes/kubernetes.io~gce-pd/pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /var/lib/kubelet/pods/c6d0cf1c-6432-11e8-91ca-42010af0000a/volumes/kubernetes.io~gce-pd/pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /var/lib/kubelet/pods/c6d0cf1c-6432-11e8-91ca-42010af0000a/volumes/kubernetes.io~gce-pd/pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
---
> /dev/sdb /rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
> /dev/sdb /rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
> /dev/sdb /rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/c6d0cf1c-6432-11e8-91ca-42010af0000a/volumes/kubernetes.io~gce-pd/pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
> /dev/sdb /rootfs/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/c6d0cf1c-6432-11e8-91ca-42010af0000a/volumes/kubernetes.io~gce-pd/pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
> /dev/sdb /rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
> /dev/sdb /rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
> /dev/sdb /rootfs/var/lib/kubelet/pods/c6d0cf1c-6432-11e8-91ca-42010af0000a/volumes/kubernetes.io~gce-pd/pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
> /dev/sdb /rootfs/var/lib/kubelet/pods/c6d0cf1c-6432-11e8-91ca-42010af0000a/volumes/kubernetes.io~gce-pd/pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0



# diff <(grep "sdb" ~/tmp/proc_mount_on_host_after.txt) <(grep "sdb" ~/tmp/proc_mount_in_container_after.txt)
1,8d0
< /dev/sdb /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/d952cbe6-6434-11e8-91ca-42010af0000a/volumes/kubernetes.io~gce-pd/pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/d952cbe6-6434-11e8-91ca-42010af0000a/volumes/kubernetes.io~gce-pd/pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /var/lib/kubelet/pods/d952cbe6-6434-11e8-91ca-42010af0000a/volumes/kubernetes.io~gce-pd/pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0
< /dev/sdb /var/lib/kubelet/pods/d952cbe6-6434-11e8-91ca-42010af0000a/volumes/kubernetes.io~gce-pd/pvc-686eea8f-2d46-11e8-ae48-42010af0000a ext4 rw,relatime,data=ordered 0 0

🤔
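
The two diffs above are hard to read because every container path carries the `/rootfs` prefix. A minimal sketch of the same comparison with that prefix stripped first, so only genuinely missing mounts are reported. The function names are hypothetical, and `/rootfs` as the container-side prefix is assumed from the paths in the diff.

```python
def mount_points(text, prefix=""):
    """Collect mount points (second field), dropping `prefix` where present."""
    points = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        point = fields[1]
        if prefix and point.startswith(prefix + "/"):
            point = point[len(prefix):]
        points.add(point)
    return points


def missing_in_container(host_text, container_text, container_prefix="/rootfs"):
    """Return host mount points absent from the container's view."""
    host = mount_points(host_text)
    container = mount_points(container_text, container_prefix)
    return sorted(host - container)


# One entry from each side of the diffs above, used as sample data.
PVC = (
    "/var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/"
    "node-dynamic-pvc-686eea8f-2d46-11e8-ae48-42010af0000a"
)
HOST = "/dev/sdb " + PVC + " ext4 rw,relatime,data=ordered 0 0\n"
CONTAINER_BEFORE = "/dev/sdb /rootfs" + PVC + " ext4 rw,relatime,data=ordered 0 0\n"
CONTAINER_AFTER = ""  # no sdb lines in the container after the reschedule

print(missing_in_container(HOST, CONTAINER_BEFORE))
print(missing_in_container(HOST, CONTAINER_AFTER))
```

With the prefix normalized, the "before" comparison reports nothing missing, while the "after" comparison reports the gce-pd mount that the container no longer sees.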

@tmegow changed the title from "Gce-pd and NFS mount metrics disappear and require node_exporter restart" to "Gce-pd and NFS mount metrics disappear and require node_exporter container restart" on May 31, 2018
@discordianfish
Member

So this is a duplicate of #502, and the general issue of running in Docker is now discussed in #66, so let's move the discussion there. There's nothing we can easily fix here anyway :-/
