Way to distinguish bind mounted path ? #600

Open
keyolk opened this issue Jun 12, 2017 · 17 comments

@keyolk

keyolk commented Jun 12, 2017

Host operating system:

Linux css 4.4.68-nx #122 SMP Mon May 15 09:46:11 KST 2017 x86_64 GNU/Linux

node_exporter version:

  build user:       root@bb6d0678e7f3
  build date:       20170321-12:12:54
  go version:       go1.7.5

Are you running node_exporter in Docker?

yes

What did you do that produced an error?

With the query below:

node_filesystem_size{instance=~"(css).*",fstype=~"(ext4|xfs)",mountpoint!~".*mapper.*",device!~".*mapper.*"}

The result is:

node_filesystem_size{device="/dev/sda1",fstype="ext4",instance="css:9100",job="node",mountpoint="/rootfs"}	21003628544
node_filesystem_size{device="/dev/sda3",fstype="ext4",instance="css:9100",job="node",mountpoint="/rootfs/home"}	857421250560
node_filesystem_size{device="/dev/sda3",fstype="ext4",instance="css:9100",job="node",mountpoint="/rootfs/home1"}	857421250560

The second record (/rootfs/home) is actually a bind mount.
If the mount options were exposed, I could use them to exclude that record.

@SuperQ
Member

SuperQ commented Jun 12, 2017

Can you attach a copy of /proc/mounts? This is where the exporter gets the filesystem list.

@keyolk
Author

keyolk commented Jun 15, 2017

@SuperQ

Here is /proc/mounts:

proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devtmpfs /dev devtmpfs rw,relatime,size=24708776k,nr_inodes=6177194,mode=755 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
/dev/sda1 / ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
/dev/sda3 /home1 ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
/dev/sda3 /home ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
cgroup /cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /cgroup/cpu cgroup rw,relatime,cpu 0 0
cgroup /cgroup/cpuacct cgroup rw,relatime,cpuacct 0 0
cgroup /cgroup/memory cgroup rw,relatime,memory 0 0
cgroup /cgroup/devices cgroup rw,relatime,devices 0 0
cgroup /cgroup/freezer cgroup rw,relatime,freezer 0 0
cgroup /cgroup/net_cls cgroup rw,relatime,net_cls 0 0
cgroup /cgroup/blkio cgroup rw,relatime,blkio 0 0
cgroup /cgroup/pids cgroup rw,relatime,pids 0 0

@SuperQ
Member

SuperQ commented Jun 15, 2017

To me that looks like the mount options would be no help in this case. There is no way to tell the difference between {device="/dev/sda3",mountpoint="/home1"} and {device="/dev/sda3",mountpoint="/home"}
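To illustrate the point, here is a minimal Python sketch that groups the relevant /proc/mounts lines from the previous comment by device. The helper name `mounts_by_device` is made up for this example and is not part of node_exporter.

```python
from collections import defaultdict

# Sample lines copied from the /proc/mounts output posted earlier in this thread.
PROC_MOUNTS = """\
/dev/sda1 / ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
/dev/sda3 /home1 ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
/dev/sda3 /home ext4 rw,nodev,noatime,nobarrier,data=ordered 0 0
"""


def mounts_by_device(text):
    """Group (mountpoint, fstype, options) tuples by device (hypothetical helper)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        groups[device].append((mountpoint, fstype, options))
    return dict(groups)


groups = mounts_by_device(PROC_MOUNTS)
for device, entries in groups.items():
    if len(entries) > 1:
        # Apart from the mountpoint, the entries are identical, so nothing
        # in this file marks either one as the bind mount.
        distinct = {(fstype, options) for _, fstype, options in entries}
        print(device, [mountpoint for mountpoint, _, _ in entries], len(distinct))
```

For /dev/sda3 both entries share the same filesystem type and options, which is why exposing the mount options as a label would not separate them.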

@keyolk
Author

keyolk commented Jun 16, 2017

@SuperQ
Actually, it is mounted as shown below:

$ cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Tue Jun 21 16:50:34 2016
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=e4ebf103-b5b9-4620-a532-ccc7205f9eb2 /                       ext4    defaults,noatime,nodev,nobarrier        1 1
UUID=f860799f-1af0-4e16-ac4f-42a07cac8173 /home1                  ext4    defaults,noatime,nodev,nobarrier        1 2
UUID=b76e0523-3bae-412d-a06c-1ad53572aba4 swap                    swap    defaults        0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/home1  /home   none    default,bind    0       0

In terms of monitoring storage, having these two mount points reported as
separate filesystems is not great for me :(

@SuperQ
Member

SuperQ commented Jun 16, 2017

The node_exporter does not read /etc/fstab because it is not the authoritative source of information about what is mounted. Many systems use automatic mount management, so the only reliable source of what is actually mounted is /proc/mounts, which is generated by the kernel.

Duplicate bind mounts are indistinguishable from the kernel's perspective, similar to a hard link.

There are two options:

  • Create inventory metrics based on /etc/fstab and expose them with the textfile interface.
  • Use a symlink instead of a bind mount.
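The first option could look roughly like the Python sketch below. It assumes the textfile collector is enabled and reads `.prom` files from a directory you configure. The metric name `node_fstab_bind_mount_info` and its labels are invented for this example and are not something node_exporter defines.

```python
# Sample lines based on the /etc/fstab posted earlier in this thread.
FSTAB = """\
# /etc/fstab
UUID=f860799f-1af0-4e16-ac4f-42a07cac8173 /home1 ext4 defaults,noatime,nodev,nobarrier 1 2
/home1  /home   none    default,bind    0       0
"""


def bind_mount_metrics(fstab_text, metric="node_fstab_bind_mount_info"):
    """Render bind mounts declared in fstab in the Prometheus text format.

    The metric name is hypothetical. Label values are not escaped here, so
    paths containing quotes or backslashes would need extra handling.
    """
    lines = [
        f"# HELP {metric} Bind mounts declared in /etc/fstab.",
        f"# TYPE {metric} gauge",
    ]
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        source, mountpoint, _fstype, options = fields[:4]
        if {"bind", "rbind"} & set(options.split(",")):
            lines.append(f'{metric}{{source="{source}",mountpoint="{mountpoint}"}} 1')
    return "\n".join(lines) + "\n"


output = bind_mount_metrics(FSTAB)
print(output, end="")
```

A cron job could write this output to a `.prom` file in the textfile directory, and queries could then exclude mount points that appear in the inventory metric. When the exporter runs in Docker with the host filesystem under a prefix such as /rootfs, the mountpoint label would need the same prefix for the two metrics to match.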

@SuperQ SuperQ closed this as completed Jun 16, 2017
@marcan

marcan commented Feb 2, 2018

There is a better source of information than /proc/mounts: /proc/self/mountinfo. It additionally records which subdirectory of the device's filesystem is mounted at the destination. For a bind mount of /data/shared/www into /var/www/shared, it looks like this:

34 21 253:4 / /data rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota
37 21 253:4 /shared/www /var/www/shared rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota

Perhaps the most prometheus-ish way to do this would be to just export this information (mountroot="/shared/www" for the second mount or similar). Then downstream rules can just choose to ignore any timeseries that don't have mountroot="/".

This won't help OP since they're bind-mounting the root of the filesystem (which truly is indistinguishable), but it will help those of us who bind-mount subtrees, which is very common (and having many random subtrees mounted is more common than having the root mounted many times).

Note that symlinks are usually an option for bind-mounting the root, but not for subtrees: one of the nice things about bind-mounting subtrees is that it lets you bypass permissions checking for the parent directories at the source, which enables some interesting use cases that symlinks cannot provide.
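As a rough illustration of reading the mount root from mountinfo, here is a Python sketch that parses the two sample lines above. The field layout follows the proc(5) description of /proc/[pid]/mountinfo. The function name and the dictionary keys (including `root`, which stands in for the proposed `mountroot` label) are choices made for this example only.

```python
# The two sample /proc/self/mountinfo lines quoted above.
MOUNTINFO = """\
34 21 253:4 / /data rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota
37 21 253:4 /shared/www /var/www/shared rw,noatime - xfs /dev/mapper/stor-data rw,attr2,inode64,logbufs=8,logbsize=64k,sunit=128,swidth=640,noquota
"""


def parse_mountinfo(text):
    """Parse mountinfo text into a list of dicts, in file order.

    Octal escapes such as \\040 in paths are left undecoded in this sketch.
    """
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        try:
            # Fields 0-5 are fixed. Zero or more optional fields follow,
            # terminated by a single "-" separator.
            sep = fields.index("-", 6)
        except ValueError:
            continue  # blank or malformed line
        if len(fields) < sep + 3:
            continue
        mounts.append({
            "mount_id": int(fields[0]),
            "dev": fields[2],         # major:minor
            "root": fields[3],        # path inside the filesystem that is mounted
            "mountpoint": fields[4],
            "fstype": fields[sep + 1],
            "source": fields[sep + 2],
        })
    return mounts


mounts = parse_mountinfo(MOUNTINFO)
whole_fs = [m["mountpoint"] for m in mounts if m["root"] == "/"]
subtrees = [m["mountpoint"] for m in mounts if m["root"] != "/"]
print(whole_fs, subtrees)
```

Filtering on `root == "/"` is the downstream rule described above: it keeps /data and drops the subtree bind mount at /var/www/shared.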

@SuperQ
Member

SuperQ commented Feb 2, 2018

@marcan That's a good idea. I think it's something we can implement.

@SuperQ SuperQ reopened this Feb 2, 2018
@brian-brazil
Contributor

> Perhaps the most prometheus-ish way to do this would be to just export this information (mountroot="/shared/www" for the second mount or similar).

I think we should be dropping such filesystems, as we already have the usage information from the actual filesystem mount. I'm not sure it's a good idea to add another label onto a key metric which already has more labels than it technically needs.

@SuperQ
Member

SuperQ commented Feb 2, 2018

@brian-brazil I agree, we don't need them in the use metrics. We could include the bind mounting as a separate node_filesystem_mount_info mapping.

@marcan

marcan commented Feb 2, 2018

The tricky bit is that it's possible to unmount the bare-root filesystem and leave the bind mount. At that point you'd have to implement deduplication in the mount list to make sure you don't drop any useful data. Perhaps this algorithm: for a given mounted device, prefer the mount with the fewest components in the mountroot, then among those prefer the oldest one (the one that comes earliest in mountinfo). This approach would fix OP's problem.
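A minimal Python sketch of that preference order, under some assumptions: mounts are already parsed into dicts in mountinfo order, the major:minor pair is used as the device key, and the sample entries (including the device numbers for OP's disk) are illustrative rather than taken from a real host.

```python
# Illustrative mounts in mountinfo order. The "dev" values are assumed.
MOUNTS = [
    {"dev": "8:3", "root": "/", "mountpoint": "/home1"},
    {"dev": "8:3", "root": "/", "mountpoint": "/home"},
    {"dev": "253:4", "root": "/shared/www", "mountpoint": "/var/www/shared"},
    {"dev": "253:4", "root": "/", "mountpoint": "/data"},
]


def root_depth(root):
    """Number of path components in the mount root ("/" has zero)."""
    return len([part for part in root.split("/") if part])


def pick_preferred(mounts):
    """Keep one mount per device: shallowest root first, then earliest listed."""
    best = {}
    for order, mount in enumerate(mounts):
        key = (root_depth(mount["root"]), order)
        dev = mount["dev"]
        if dev not in best or key < best[dev][0]:
            best[dev] = (key, mount)
    # Return the winners in their original mountinfo order.
    return [mount for _, mount in sorted(best.values(), key=lambda kv: kv[0][1])]


preferred = [m["mountpoint"] for m in pick_preferred(MOUNTS)]
print(preferred)
```

With this sample, /home1 wins over /home because it is listed first, and /data wins over /var/www/shared because its root has fewer components, even though it appears later.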

@SuperQ
Member

SuperQ commented Feb 2, 2018

@marcan I was considering deduplication by "first listed" in the mountinfo. This means that it's possible for labeling to shift. But I'm guessing the kernel data structure that holds mountinfo is populated in order by time. So "first" is original.

@brian-brazil
Contributor

There's nothing saying you can't normally mount a filesystem twice, and I think in that case we'd want to expose both.

> We could include the bind mounting as a separate node_filesystem_mount_info mapping.

I can imagine that getting high cardinality and high churn, and I'm not sure what it's gaining us.

@marcan

marcan commented Feb 2, 2018

There's no way to distinguish a filesystem mounted twice from a filesystem mounted and then its root bindmounted elsewhere. As far as I know both of those result in identical kernel state.

Ultimately I think the options are: either show the first mount in mountinfo order, or show root mounts only (but what if a filesystem is only mounted from a subdirectory? then show that instead? what if it's mounted multiple times but never at the root?), or implement some kind of priority order and show the first mount only.

@discordianfish
Member

No strong preference, but showing the first mount in mountinfo order seems to be what you want in most cases. So let's go with this? Unless someone has objections.
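For comparison, "first mount in mountinfo order" could be sketched in Python as below. This is only an illustration of the idea, not the exporter's implementation. The sample data and the use of major:minor as the device key are assumptions.

```python
# Illustrative mounts in mountinfo order. The "dev" values are assumed.
MOUNTS = [
    {"dev": "8:3", "root": "/", "mountpoint": "/home1"},
    {"dev": "8:3", "root": "/", "mountpoint": "/home"},
    {"dev": "253:4", "root": "/shared/www", "mountpoint": "/var/www/shared"},
    {"dev": "253:4", "root": "/", "mountpoint": "/data"},
]


def first_per_device(mounts):
    """Keep only the first-listed mount for each device."""
    seen = set()
    kept = []
    for mount in mounts:
        if mount["dev"] in seen:
            continue  # a mount of this device was already kept
        seen.add(mount["dev"])
        kept.append(mount)
    return kept


first = [m["mountpoint"] for m in first_per_device(MOUNTS)]
print(first)
```

This handles OP's case (/home1 is kept, /home is dropped). If a subtree bind mount happens to be listed before the root mount of the same device, as in the second pair here, the subtree mount is the one that gets reported.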

@anarcat
Contributor

anarcat commented Nov 20, 2023

i have the impression that just reading /proc/self/mountinfo is sufficient here, why didn't we take this approach here?

@anarcat
Contributor

anarcat commented Aug 28, 2024

> i have the impression that just reading /proc/self/mountinfo is sufficient here, why didn't we take this approach here?

replying to myself, it seems like the plan is to add a new metric, node_filesystem_mount_info, that can be used to join on the existing metric to deduplicate things. I asked in the PR (#2970) how that might help here, but it's unclear to me if it's an actual fix or not.
