
Implement new ZFS pool io monitoring and the option to show ZFS pools only #385

Merged

merged 8 commits into aristocratos:main from simplepad:zfs-iostat-new on Jul 11, 2022

Conversation

simplepad
Contributor

This PR implements new ZFS pool io monitoring, compatible with newer versions of ZFS, where the /proc/spl/kstat/zfs/*pool_name*/io files are not present anymore. Instead, it uses the /proc/spl/kstat/zfs/*pool_name*/objset-* files, which allow for per-dataset io stats.

Also, this PR adds the zfs_pools_only option which, if enabled, hides all datasets, shows only pools in disk listings, and counts io stats differently.
In more detail:
When zfs_pools_only is set to true:

  • In disk listings, only ZFS pools are shown
  • IO stats are calculated per pool by combining the io of all objects in the pool (all objset-* files in the pool directory, pool root included); see the sketch after these lists

When zfs_pools_only is set to false:

  • In disk listings, both ZFS pools and datasets are shown
  • IO stats are calculated per object, that is separately for ZFS pool root and each dataset (operations on datasets are not calculated towards pool root io)
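
For illustration, here is a minimal C++ sketch of the per-pool aggregation described above; it is not the actual code in this PR. It walks a pool's kstat directory, reads every objset-* file, and sums the nread/nwritten counters. The pool name "tank" and the main() wrapper are placeholders, and error handling (e.g. a missing directory) is omitted.

#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

struct ZfsIo { int64_t bytes_read = 0; int64_t bytes_written = 0; };

// Sum the counters of every objset-* file under a pool's kstat directory.
// Each objset file holds "name type value" rows; nread/nwritten carry the
// cumulative bytes read/written for that dataset.
ZfsIo collect_pool_total(const fs::path& pool_kstat_dir) {
    ZfsIo total;
    for (const auto& entry : fs::directory_iterator(pool_kstat_dir)) {
        const std::string fname = entry.path().filename().string();
        if (fname.rfind("objset-", 0) != 0) continue;        // only objset-* files
        std::ifstream file(entry.path());
        std::string line;
        while (std::getline(file, line)) {
            std::istringstream row(line);
            std::string name; int type; int64_t value;
            if (!(row >> name >> type >> value)) continue;   // skip header / non-numeric rows
            if (name == "nread") total.bytes_read += value;
            else if (name == "nwritten") total.bytes_written += value;
        }
    }
    return total;
}

int main() {
    ZfsIo io = collect_pool_total("/proc/spl/kstat/zfs/tank");   // "tank" is a placeholder pool name
    std::cout << "read: " << io.bytes_read << " B, written: " << io.bytes_written << " B\n";
}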

Closes #383
Partially closes #380; pool health monitoring isn't implemented

simplepad added 3 commits July 1, 2022 19:30
…ions, now it uses files "/proc/spl/kstat/zfs/*pool_name*/objset*". Needs additional work to be compatible with the option "zfs_pools_only".
…pool's stat filepath points to the objset-* file when the option is disabled, otherwise it points to the pool's stats directory. Made ZFS total pool stat collection into a separate function for clean code. Also removed an unnecessary variable in the default ZFS stat collection, and changed io_ticks to track the number of reads/writes, reducing unnecessary calculations.
simplepad added 2 commits July 5, 2022 00:20
… declarations to btop_collect.cpp so they aren't included when compiling for macos and freebsd
@trentbuck

FYI

I tested Debian 11 / Linux 5.16.0-0.bpo.4-amd64 / ZFS 2.1.4-1~bpo11+1.
With Use fstab = False,

  1. simplepad@4969dd8 shows I/O sparklines for ZFS datasets
  2. 6b1b9f8 does not show I/O sparklines for ZFS datasets

With Zfs pools only = True, my datasets disappear, but my pool doesn't appear.
I saw this on three hosts, all set up roughly along the lines of https://openzfs.github.io/openzfs-docs/Getting%20Started/Debian/Debian%20Bullseye%20Root%20on%20ZFS.html
They import the pool from inside the initramfs-tools initrd (before zed is running).
They run refind (not grub), and do not have a bpool. Instead /boot is rsynced into /boot/efi/EFI/debian via this crap https://github.com/trentbuck/flash-kernel-efi/tree/ansible

@simplepad
Contributor Author

@trentbuck Does it show your pool when Zfs pools only = False?

@simplepad
Contributor Author

Also, does anything come up in your ~/.config/btop/btop.log?

@trentbuck

@simplepad I don't ever see the pool in btop; both zfs_pools_only = False and zfs_pools_only = True in ~/.config/btop/btop.conf just show/hide the datasets. Earlier, I wrongly thought it meant "show/hide the pool" rather than "show/hide datasets". I did not edit btop.conf directly; I used m > Options > mem from inside btop.

For the avoidance of doubt, I do see the pool in zpool list, and I see the datasets in zfs list.

My btop.log has some entries but they look boring.
btop.log

I tested both as my regular user and as root; I did not see any difference in behaviour. My regular user has normal Debian/GNOME defaults: sudo and polkit privileges, but no zfs-allow(8) privileges.
I can't think of anything else "weird" that would cause a pool to not show up.
I haven't actually straced btop or read the code, yet.

@trentbuck

PS: here is grep -rnH ^ /proc/spl/kstat/zfs run as the same regular user that runs btop:
grep.typescript.txt
The pool's name is hera, and the objset files are clearly readable.

I wonder if the issue is that my pool's root dataset is not mounted?

$ zfs list
NAME                                    USED  AVAIL     REFER  MOUNTPOINT
hera                                    152G  72.5G      192K  none
hera/goll                              9.43G  6.57G     9.43G  /srv/backup/goll
hera/hera                               108G  72.5G     8.31G  /
hera/hera/home                         87.4G  72.5G      320K  /home
hera/hera/home/twb                     87.4G  12.9G     51.1G  /home/twb
hera/hera/var                          2.36G  72.5G      320K  /var
hera/hera/var/cache                    1.99G  72.5G     1.18G  /var/cache
hera/hera/var/tmp                       380M  7.71G      294M  /var/tmp
hera/not-backed-up                     34.7G  72.5G      192K  none
hera/not-backed-up/apt-cacher-ng       9.59G  2.41G     9.59G  /var/cache/apt-cacher-ng
hera/not-backed-up/tweak.prisonpc.com  25.1G  22.9G     4.60G  /var/lib/libvirt/images/twoke.d

@simplepad
Contributor Author

simplepad commented Jul 6, 2022

@trentbuck Can you confirm that your pool appears when using btop 1.2.8? That way we can be sure that the issue is caused by my PR.

I wonder if the issue is that my pool's root dataset is not mounted?

That might be it. Btop is getting disks for listing from /etc/fstab, /etc/mtab, and /proc/self/mounts, so if it's not mounted, it won't be shown.
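
To make that concrete, here is a minimal sketch (not btop's actual code) of how a disk list driven by /proc/self/mounts behaves: only filesystems that appear there become candidates, so a pool whose root dataset has mountpoint=none never shows up.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Collect zfs entries from /proc/self/mounts ("<source> <mountpoint> <fstype> ...").
// A pool or dataset without a mountpoint never appears here, so it never
// enters the disk list in the first place.
std::vector<std::pair<std::string, std::string>> zfs_mounts() {
    std::vector<std::pair<std::string, std::string>> out;
    std::ifstream mounts("/proc/self/mounts");
    std::string line;
    while (std::getline(mounts, line)) {
        std::istringstream row(line);
        std::string source, mountpoint, fstype;
        if (!(row >> source >> mountpoint >> fstype)) continue;
        if (fstype == "zfs") out.emplace_back(source, mountpoint);
    }
    return out;
}

int main() {
    for (const auto& [dataset, mountpoint] : zfs_mounts())
        std::cout << dataset << " -> " << mountpoint << '\n';
}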

Earlier, I wrongly thought it meant "show/hide the pool" rather than "show/hide datasets".

Yeah, I think the option's name and description aren't great currently; I want to make them clearer before merging.

@trentbuck

Ah, sorry - I was unclear again.
I have never seen btop show my pool on any host (I tried 3 different hosts, all Debian 11).
I (incorrectly?) thought that was expected on v1.2.8 because, prior to your PR, btop was looking in the wrong place?

With btop v1.2.8, I don't see "hera" (the pool) listed under disks; I see datasets with only Used/Free (no I/O).
Screenshot from 2022-07-07 06-58-01 - btop v1 2 8
With your commit, I see I/O tickers, but still no pool
Screenshot from 2022-07-07 07-00-08 btop v1 2 8-5-g4969dd8

@trentbuck

That might be it. Btop is getting disks for listing from /etc/fstab, /etc/mtab, and /proc/self/mounts, so if it's not mounted, it won't be shown.

I think I misunderstood this also. I was expecting "zfs_pools_only" to show me the same data as "zpool iostat 2", i.e. aggregate I/O data for the whole pool, regardless of whether any of the datasets are mounted.

For example, if I have four VMs (A B C D), and each one gets a zvol dataset that is handed directly to a VM (qemu -hda /dev/zvol/hera/A), I might not have any datasets mounted on the host OS. In that case I would still like/expect btop to show me I/O and capacity stats for either each zvol dataset (with zfs_pools_only=False, a la zfs list) or the aggregate for the whole pool (with zfs_pools_only=True, a la zpool list and zpool iostat 2).

Does that make sense?
Sorry if I am asking for too much!

@trentbuck

PS: I have use_fstab = False because otherwise I don't see any ZFS datasets at all.
My fstab only has legacy crap in it -- /boot (ext4), /boot/efi (vfat), &c.
All my ZFS mounts are specified as attributes of the ZFS datasets and get mounted automatically via /usr/lib/systemd/system-generators/zfs-mount-generator, which creates systemd .mount units and bypasses /etc/fstab entirely. They do show up in /proc/self/mountinfo and /proc/self/mounts. Per /lib/tmpfiles.d/debian.conf, /etc/mtab is a symlink to ../proc/self/mounts.

$ grep -c zfs /etc/fstab /etc/mtab /proc/self/mountinfo /proc/self/mounts
/etc/fstab:0
/etc/mtab:6
/proc/self/mountinfo:6
/proc/self/mounts:6

@simplepad
Contributor Author

simplepad commented Jul 6, 2022

I get what you want, but currently btop acts like this: no mountpoint == disk won't be detected and won't be shown.

The pool's name is hera, and the objset files are clearly readable.

I looked closely at the output of your grep -rnH ^ /proc/spl/kstat/zfs command, and I couldn't find the hera root objset file in there.

I think your request is getting into the territory of #108, which is outside the scope of my PR.

…zfs_collect_pool_total_stats(), use try-catch to prevent possible crashes from int_64t conversions
@aristocratos
Owner

@simplepad
Nice work!

Do you feel this PR is ready or do you wanna do more testing?

The logging functions should be ok to keep if you change them from warning to debug, that way we can still get some useful output if someone is having issues.
If you want to keep them as warnings I would ask that you set some limiter on them so we don't spam the logfile in case of errors.

@simplepad
Contributor Author

@aristocratos

The logging functions should be ok to keep if you change them from warning to debug, that way we can still get some useful output if someone is having issues.

We could replace them with debug calls, but users won't know something is wrong until they change their log level. On the other hand, it's not like they would randomly want to check the log file, since as far as I know, btop has no error indication in the UI, so I'm not against hiding it behind debug.

Do you feel this PR is ready or do you wanna do more testing?

The only thing that's stopping me from marking this PR ready to review is that the zfs_pools_only option's name and description might be a little confusing and hard to understand, as shown by trentbuck in their comments above. Maybe we should change it to zfs_hide_datasets?
Also, this PR changes the ZFS io collection to be compatible with the new OpenZFS versions, but technically, if the older files we use in btop 1.2.8 are still present and the zfs_pools_only option is enabled, it's much faster to use them, because then we only need to read one file (as opposed to the new version, where we read all the objset files that comprise the pool and calculate the sum of all of their stats). Should I keep the old way to read stats around for the odd occasion that someone with an older version of OpenZFS will be able to use it? I am not sure how much faster it is and how much of a benefit that is.
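
For comparison, a rough sketch of reading the pre-2.1 "old way" file, assuming the kstat_io layout of a header row of column names (nread, nwritten, reads, writes, ...) followed by a single row of values. This is not the btop 1.2.8 code itself, and the pool name "tank" is a placeholder.

#include <cstdint>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Read the single per-pool io file: locate the row of column names and pair it
// with the one row of values that follows.
std::map<std::string, int64_t> read_old_pool_io(const std::string& path) {
    std::map<std::string, int64_t> result;
    std::ifstream file(path);
    std::string line;
    std::vector<std::string> columns;
    while (std::getline(file, line)) {
        if (line.find("nread") == std::string::npos) continue;   // not the header row yet
        std::istringstream header(line);
        for (std::string col; header >> col;) columns.push_back(col);
        break;
    }
    if (std::getline(file, line)) {                               // the single value row
        std::istringstream values(line);
        int64_t v;
        for (const auto& col : columns)
            if (values >> v) result[col] = v;
    }
    return result;
}

int main() {
    auto io = read_old_pool_io("/proc/spl/kstat/zfs/tank/io");    // "tank" is a placeholder pool name
    std::cout << io["nread"] << " bytes read, " << io["nwritten"] << " bytes written\n";
}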

@trentbuck

zfs_pools_only option's name and description might be a little confusing and hard to understand, as shown by trentbuck in their comments above. Maybe we should change it to zfs_hide_datasets?

+1 for zfs_hide_datasets, since that reflects the current semantics.
(Or, keep the variable name, but change the semantics to match :-)

Also, this PR changes the ZFS io collection to be compatible with the new OpenZFS versions, but technically, if the older files we use in btop 1.2.8 are still present and the zfs_pools_only option is enabled, it's much faster to use them, because then we only need to read one file (as opposed to the new version, where we read all the objset files that comprise the pool and calculate the sum of all of their stats). Should I keep the old way to read stats around for the odd occasion that someone with an older version of OpenZFS will be able to use it? I am not sure how much faster it is and how much of a benefit that is.

Do you know what version of OpenZFS removed the "old way" files? From the below, I guess it's in 2.1.x.

  • Debian 11 / ZFS 2.1.4 --- only /proc/spl/kstat/zfs/*morpheus*/objset*
  • Debian 10 / ZFS 2.0.3 --- both
  • Debian 9 / ZFS 0.7.12 --- only /proc/spl/kstat/zfs/morpheus/io

@simplepad
Contributor Author

@trentbuck

Do you know what version of OpenZFS removed the "old way" files? From the below, I guess it's in 2.1.x.

Well, here's the commit that removes it: openzfs/zfs@371f88d, so yeah, probably 2.1.0.

Debian 11 / ZFS 2.1.4 --- only /proc/spl/kstat/zfs/morpheus/objset*

Strange, I have a Debian 11 server with ZFS installed and the stable version there is 2.0.3

@aristocratos
Owner

@simplepad

We could replace them with debug calls, but users won't know something is wrong until they change their log level. On the other hand, it's not like they would randomly want to check the log file, since as far as I know, btop has no error indication in the UI, so I'm not against hiding it behind debug.

Well, an indication would be that the zfs io isn't showing up, hopefully leading them to create a bug report. And then if they haven't provided a debug logfile we can always ask for one.

I've been thinking about adding some more functionality to the logger, for example optional parameters "log_identifier" and "max_reported", to make it easy to mute repeated log entries with the same "log_identifier".
That way warnings can still be generated for non-critical issues without filling up the logfile and adding unnecessary io operations and processing for those who choose to ignore them.
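
A rough sketch of that idea (the function name, parameters, and behaviour are hypothetical, not btop's actual Logger interface): warnings sharing a log_identifier are counted and muted once max_reported is reached.

#include <iostream>
#include <string>
#include <unordered_map>

// Warnings that share a log_identifier are counted; once max_reported is
// reached, a final "suppressed" notice is written and further repeats are muted.
void log_warning(const std::string& msg,
                 const std::string& log_identifier = "",
                 int max_reported = -1) {
    static std::unordered_map<std::string, int> reported;
    if (!log_identifier.empty() && max_reported > 0) {
        int& count = ++reported[log_identifier];
        if (count > max_reported) return;                          // already muted
        if (count == max_reported) {
            std::cerr << "WARNING: " << msg << " (further '" << log_identifier
                      << "' warnings suppressed)\n";
            return;
        }
    }
    std::cerr << "WARNING: " << msg << '\n';
}

int main() {
    for (int i = 0; i < 10; ++i)
        log_warning("failed to parse objset file", "zfs_objset_parse", 3);   // prints 3 lines total
}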

The only thing that's stopping me from marking this PR ready to review is that the zfs_pools_only option's name and description might be a little confusing and hard to understand, as shown by trentbuck in their comments above. Maybe we should change it to zfs_hide_datasets?

I haven't really used zfs, so I can't really tell what the best terminology for a regular user would be, but in general I would guess having "hide" in the name rather than "only" is probably better. There are several options already in the program that should probably be changed to better clarify what they actually do...

Also, this PR changes the ZFS io collection to be compatible with the new OpenZFS versions, but technically, if the older files we use in btop 1.2.8 are still present and the zfs_pools_only option is enabled, it's much faster to use them, because then we only need to read one file (as opposed to the new version, where we read all the objset files that comprise the pool and calculate the sum of all of their stats). Should I keep the old way to read stats around for the odd occasion that someone with an older version of OpenZFS will be able to use it? I am not sure how much faster it is and how much of a benefit that is.

Unless we are talking 100+ files being read compared to the "old" way, I don't think it will be noticeable performance-wise. But I would say it's up to you if you wanna keep both ways.
You could do some simple testing with just the timer shown for "mem" in debug mode and see if you notice any discernible difference between the two ways and if it's worth keeping both.

@simplepad
Contributor Author

Although I don't really have that many datasets, I did some testing and there wasn't any noticeable difference between the two ways. After all, the files we're reading are stored in /proc, which is a virtual filesystem kept in memory, so the reads are probably fast enough that we can leave the code as is.

Marking this PR as Ready for review.

@simplepad simplepad marked this pull request as ready for review July 11, 2022 16:52
Owner

@aristocratos aristocratos left a comment


Great work man!

And thank you for the continued contributions!

@aristocratos aristocratos merged commit b485964 into aristocratos:main Jul 11, 2022
@simplepad simplepad deleted the zfs-iostat-new branch July 11, 2022 19:16
@trentbuck

Strange, I have a Debian 11 server with ZFS installed and the stable version there is 2.0.3

It's in bullseye-backports: https://packages.debian.org/search?keywords=zfs-dkms

You can put something like this in /etc/apt/preferences.d/zfs-is-cool:

Explanation: use Linux 5.14+ (not 5.10)
Package: src:linux src:linux-signed-amd64 src:zfs-linux
Pin: release a=bullseye-backports
Pin-Priority: 500

@trentbuck

Unless we are talking 100+ files being read compared to the "old" way, I don't think it will be noticeable performance-wise.

FWIW, I think 10 is typical for a single-user laptop, and 100 to 1000 is typical for a multi-user server.

AFAICT you get one objset file for each mounted dataset. On a server with 20 users (thus at least 40 datasets mounted -- one for each $HOME and one for each $MAIL), I see:

cyber@heavy:~$ zfs list | wc -l
109
cyber@heavy:~$ mount -t zfs | wc -l
95
cyber@heavy:~$ ls -l /proc/spl/kstat/zfs/*/*objset* | wc -l
95

On a single-user laptop, those numbers are all 6.
On a backup server, 9 objsets/mounts, and 66 datasets (because ZFS doesn't need to decrypt or mount the incremental backups).

At the default refresh rate of 0.5Hz, I doubt 100 files is a big deal.
In any case, the "old way" will be gone from all ZFS systems eventually.
If I/you/we are worried about performance, we can talk to the ZFS developers -- maybe they will add an efficient path just for btop :-)

@simplepad
Contributor Author

@trentbuck I see, we should be good then.

@dm17

dm17 commented Mar 26, 2024

I'm wondering if ZFS users would like to see ZFS errors in btop, or if that is useless as they monitor those another way... I figure this would be a decent place to ask.
