etc/systemd/zfs-mount-generator: rewrite in C #11917
Conversation
Very cool. But before moving forward with this any more I'd like to get @aerusso, @rlaager, and @InsanePrawn's thoughts on this since they're most familiar with the generator. In principle I'm not against rewriting it in C, but we should make sure the performance wins are worth the added complexity. Do you know where the main bottleneck was in the shell script, could we perhaps have gotten close to the same performance with more modest changes to the script instead?
Dunno, and I don't really care – it's a shell script that does five hundred lines of string manipulation for every single dataset, and it's also broken. I specifically wrote the first commit to generate identical units (save for whitespace), precisely so it'd be trivial to verify. You could, I guess, properly sort all the cache files instead to fix it. This finishes in roughly the time it takes to spawn a moderately-dynamically-linked executable (such as the new C program, or literally anything you run from the shell, like zpool, or, to a lesser degree, sort) two to two-and-a-half times, and it actually does more work than it sleeps: witness the 185-223% usage reported by the zsh time builtin, compared to the ~101% of the current shell implementation over literally more than twenty times the wall time.
Well, the shell implementation is what I like to call 'historically grown' :) It might be worth revisiting what the generator outputs and whether we can handle some situations better, or at all, now; e.g. the noauto_files thing was obviously not the cleanest solution to the problem. [I think we could, strictly speaking, solve the multiple pools bug in the shell version by …]
Yeah, the shell script had really gotten uncomfortable to maintain last year. That said, given how infrequently this is run, I don't think the performance improvement (while quite impressive) is really all that important. The improvement in readability and the stronger language constructs available, though, make this very welcome. As an aside, the real performance drain on the system is the […]. I doubt I'll be able to give a full, proper review, but I'll mention the one thing I noticed poking around a tiny bit: […]
@InsanePrawn I don't actually mind the noauto_files thing, and haven't come up with a better solution that does what it does, but that doesn't say much. And yeah, effectively catting it is what I meant by sorting all cache files.

@aerusso This is run on every daemon reload, including the multiple that happen during boot; it's incredibly important, to my use-cases at least. zfs-list-cacher doesn't act on snapshots (at least, that's how it's written, and, testing it, that condition's correct):

```sh
# If we are acting on a snapshot, we have nothing to do
printf '%s' "${ZEVENT_HISTORY_DSNAME}" | grep '@' && exit 0
```

It also does a lot of other filtering to avoid spurious updates. Besides being written in a generally less-than-super-performant style and having overly-wide locking, which is easy enough to fix separately, there isn't much wrong there. You will also note that it only touches the cache file for the affected pool (and anything less is impossible, or, at the very least, slower than just doing that). And you'll be happy to hear it no longer serially starts every zedlet with a second at 100% CPU in the pre-exec (though it seems that that's only in 2.1.0-rc3 and on HEAD, so).

A pool name must begin with a letter and mountpoints are (verified to be) absolute paths. I don't see the problem, and don't intend to ship dead code. And I definitely don't intend to spawn processes to replace fifty lines of code that will never change again.
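As an aside, the snapshot check quoted above spawns a grep per event; the same test can be expressed with a plain case match, costing no processes at all. This is a hypothetical sketch, not the zedlet's actual code, and takes the dataset name as an argument instead of reading ZEVENT_HISTORY_DSNAME from the environment:

```shell
#!/bin/sh
# Classify a ZED event dataset name the way the zedlet's snapshot
# check does: a name containing '@' denotes a snapshot.
# Hypothetical helper for illustration only.
is_snapshot() {
	case "$1" in
		*@*) return 0 ;;  # snapshot: nothing to do
		*)   return 1 ;;
	esac
}

is_snapshot 'tank/home@backup' && echo snapshot
is_snapshot 'tank/home'        || echo dataset
```

Run as-is, this prints "snapshot" then "dataset"; in a zedlet, the `return 0` branch would instead be the early `exit 0`.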
I'd like to go ahead and move forward with integrating this change in the next week or two. If folks can make time to test it in their environments, that would be helpful to ensure we don't accidentally introduce any regressions.
Bump! It's been two weeks and no one has raised an issue in this comment period, and I'd like to move forward with another round of optimisations and the http[s]:// handling I've staged.
- Change ABORT_ENOMEM to EXIT_ENOMEM. "abort" has a specific meaning in Unix C. The macro calls `_exit()`, not `abort()`. I agree with it calling `_exit()`, so let's change the name to match.
- Change the comments from `//` to `/* ... */`. From a quick grep, literally 98-99% (depending on the pattern used) of the comments in the ZFS tree are in `/* ... */` form. That should be 100%, as ZFS follows the SunOS coding style, which only uses `/* ... */` comments.
- The `$1`, `$2`, etc. for the various fields seem like a vestige of the shell history of this generator. `$1` is shell, not C. Nowhere in the C code are they referred to using offsets. The history cacher script does not refer to them with offsets, so it's not to match that. I see no reason for these to exist.
A plain rewrite of the shell version, and generates identical units, save for replacing some empty lines with nothing, having fewer meaningless spaces in After=s, and different spacing in the lock scripts, for a clean git diff -w.

This is a gain of anywhere from 0m0.336s vs 0m0.022s (15.27x) to 0m0.202s vs 0m0.006s (33.67x), depending on the hardware, a.k.a. from "absolutely unusable" to "perfectly fine".

This also properly deals with canmount=noauto units across multiple pools.

See PR for detailed timings (of an early version) and diffs.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Ref: openzfs#11915
git-diff--w-dirty, but:

* zfs-load-key-$DSET.service -> zfs-load-key@$DSET.service
* flattened set -eu into other /bin/sh flags
* simpler (for 1 2 3 vs while [ counter ]; counter+=1) prompt loop
* exec $ZFS where applicable

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
sure, whatever. updated
LGTM
Thanks for addressing the remaining feedback. I'll get this merged when the CI is done with it.
git-diff--w-dirty, but:

* zfs-load-key-$DSET.service -> zfs-load-key@$DSET.service
* flattened set -eu into other /bin/sh flags
* simpler (for 1 2 3 vs while [ counter ]; counter+=1) prompt loop
* exec $ZFS where applicable

Reviewed-by: Antonio Russo <aerusso@aerusso.net>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: InsanePrawn <insane.prawny@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Issue #11915
Closes #11917
Motivation and Context
A 300ms generator is, uh, yeah, as previously mentioned in #11915, so I spent a great 14 hours on Saturday and a few more yesterday writing this.
Description
A plain rewrite of the shell version; the first commit generates identical units, save for replacing some empty lines with nothing, having fewer meaningless spaces in After=s and being slightly more verbose with spaces in the unlock scripts. The second cleans it up slightly.
This also properly deals with canmount=noauto units across multiple pools.
The most exciting results come from the two ZFS-on-root systems because they have an encrypted root dataset:
What follows are the testing results from my incredibly sophisticated testing strategy (also here: zmg.tgz).

In short, real time as seen by the time builtin, on:

* Core 2 Duo E6400 (1x2x1, 2.13GHz) (2-channel DDR2-2133, 2.5/3.8G avail), 1 pool, sid x32 (or "my router")
* i5-10210Y (1x4x2, 1GHz nominal, 4GHz turbo) (2-channel LPDDR3-2133, 13.8/15.5G avail), 1 pool, sid amd64 (or "my laptop")
* 2 x Xeon E5645 (2x6x2, 2.4GHz nominal, 2.67GHz turbo) (both 3-channel DDR3-1333, 14.6/94.4G avail), 1 pool, buster amd64 (or "a long compute from my attic")
* Xeon W-10885M (1x8x2, 2.4GHz nominal, 5.3GHz turbo) (2-channel DDR4-2933, 78.1/125.4G avail), 1 pool, sid amd64 (or "my mate's P15")
How Has This Been Tested?
Ran it more times than I can count.