Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gather storage drives health status #16

Merged
merged 2 commits into from
Mar 1, 2023

Conversation

mokrinsky
Copy link
Contributor

PR title kinda says for itself, also explained in Readme.

In short, sole reason for me to look for the idrac_exporter was the need of healthchecking A LOT OF DRIVES. But there was no such feature, so - casually coded.

Few things i have to mention just in case:

  1. I did not implement entire data models for what I added, i just used some fields that looked useful to me.
  2. All Dell servers I have available are equipped with just one RAID controller. I'm neither aware if there are any with several controllers nor I have them, so this PR will gather data from the first controller only.

@mrlhansen
Copy link
Owner

Hi @mokrinsky

Thanks for the PR it looks good! I will find some time to test it before merging.

@mrlhansen
Copy link
Owner

mrlhansen commented Mar 1, 2023

Hi @mokrinsky

It seems to work fine across multiple machines and vendors. There are a couple of things I would change.

  • Change metric name to idrac_drive_health to better reflect what the value represents.
  • I would also add status as a label for the metric. In case the status is not OK/Warning/Critical the state can be read from the label value.
  • Add a loop over controllers in RefreshStorage(). For NVMe in particular, each drive has its own controller.

You are welcome to make the changes if you want, otherwise I will make the adjustments after merging.

@mokrinsky
Copy link
Contributor Author

Hey! Thanks for your review.

  • I renamed metric.
  • I actually don't think it's necessary cause in Redfish specification there are only three statuses possible, but for the sake of possible weirdness in Dell hardware I regularly meet... Made this as well.
  • It would be great if you'll adjust it yourself cause, like I said, i got no servers with such configuration so I cannot test it myself.

@mrlhansen mrlhansen merged commit 29ec10d into mrlhansen:master Mar 1, 2023
@mokrinsky mokrinsky deleted the disk-status branch March 2, 2023 02:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants