
node_md_state did not capture "removed" state #2384

Open
levindecaro opened this issue May 26, 2022 · 2 comments

@levindecaro

Host operating system: output of uname -a

Linux sds-3 4.18.0-305.7.1.el8_4.x86_64 #1 SMP Tue Jun 29 21:55:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 1.3.1 (branch: HEAD, revision: a2321e7)
build user: root@243aafa5525c
build date: 20211205-11:09:49
go version: go1.17.3
platform: linux/amd64

node_exporter command line flags

/usr/local/bin/node_exporter --path.procfs=/proc --path.sysfs=/sys --collector.filesystem.ignored-mount-points="^/(dev|proc|sys|var/lib/docker/.+)($|/)" --collector.filesystem.ignored-fs-types="^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$" --no-collector.wifi

Are you running node_exporter in Docker?

no

What did you do that produced an error?

mdadm -D output

/dev/md125:
           Version : 1.0
     Creation Time : Mon Jul  5 19:40:20 2021
        Raid Level : raid1
        Array Size : 614336 (599.94 MiB 629.08 MB)
     Used Dev Size : 614336 (599.94 MiB 629.08 MB)
      Raid Devices : 2
     Total Devices : 1
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sun May 22 01:00:01 2022
             State : clean, degraded
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : sds-3:boot_efi  (local to host sds-3)
              UUID : 312be27c:732e4a9e:6b279d78:10cd6a6a
            Events : 177

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       8       18        1      active sync   /dev/sdb2

What did you expect to see?

node_md_state{device="md125", instance="sds-3", job="sds-nodes", state="removed"}

What did you see instead?

"removed" state metric not yet implemented in node_md_state

@discordianfish
Member

Yeah, I see how this would be useful.

@dswarbrick
Contributor

@levindecaro Your expected metric would be inaccurate, because it's not the whole md125 array that has been removed, but rather just one of the component devices. From the output of your mdadm command, the md125 array is still functioning (and would continue to do so, since it's raid1 and still has one leg working).

What you need instead is a metric for the state of the individual component devices, so you can see whether one of them has been removed.

However, you could also have alerted on the condition that you encountered with a node_md_disks{state="failed"} > 0 alerting rule. Alternatively, node_md_disks_required - node_md_disks{state="active"} > 0 would probably also do the trick.
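For reference, a minimal alerting rule along those lines might look like the sketch below. The expression is the "required minus active" check from this comment, with an ignoring(state) modifier added so the two series match on their shared labels; the group name, alert name, threshold duration, and annotation text are illustrative placeholders, not anything shipped with node_exporter. A rule on node_md_disks{state="failed"} > 0 would be analogous.

groups:
  - name: md-raid
    rules:
      - alert: MdArrayDegraded
        # Fires when an md array has fewer active component disks than it
        # requires, which covers the "removed" leg described in this issue.
        expr: node_md_disks_required - ignoring(state) node_md_disks{state="active"} > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "md array {{ $labels.device }} on {{ $labels.instance }} is missing disks"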

Having said that, the existing implementation of the procfs library's parsing of /proc/mdstat masks some of the low-level details, which is why I have proposed a new direction in prometheus/procfs#509.
