-
Notifications
You must be signed in to change notification settings - Fork 1.9k
2.3.3 staging prep #17371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
2.3.3 staging prep #17371
Conversation
Previous implementation of zap_leaf_array_free() put chunks on the free list in reverse order. Also zap_leaf_transfer_entry() and zap_entry_remove() were freeing name and value arrays in reverse order. Together this created a mess in the free list, making following allocations much more fragmented than necessary. This patch re-implements zap_leaf_array_free() to keep existing chunks order, and implements non-destructive zap_leaf_array_copy() to be used in zap_leaf_transfer_entry() to allow properly ordered freeing name and value arrays there and in zap_entry_remove(). With this change test of some writes and deletes shows percent of non-contiguous chunks in DDT reducing from 61% and 47% to 0% and 17% for arrays and frees respectively. Sure some explicit sorting could do even better, especially for ZAPs with variable-size arrays, but it would also cost much more, while this should be very cheap. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes openzfs#16766 (cherry picked from commit 9a81484)
Since embedded blocks introduction 11 years ago, their writing was blocked if dedup is enabled. After searching through the modern code I see no reason for this restriction to exist. Same time embedded blocks are dramatically cheaper. Even regular write of so small blocks would likely be cheaper than deduplication, even if the last is successful, not mentioning otherwise. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes openzfs#17113 (cherry picked from commit 09f4dd0)
It seems `fio` in `ddt_dedup_vdev_limit` overwhelms the system with the amount of dirty data caused by DDT updates within one TXG due to tiny 1KB records used, while I see no reason for this test to extend the TXGs beyond default. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: @ImAwsumm Reviewed-by: Tony Hutter <hutter2@llnl.gov> (cherry picked from commit 21850f5)
dbuf_prefetch_impl() should look on level of current indirect, not the target prefetch level. dbuf_prefetch_indirect_done() should call dnode_level_is_l2cacheable() if we have dpa_dnode to pass it. It should fix some both false positive and negative L2ARC caching. While there, fix redacted feature activation assertions. One was always true, while another could give false positive if dpa_dnode is NULL. George Amanakis <gamanakis@gmail.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes openzfs#17204 (cherry picked from commit a497c5f)
…nzfs#17228) Various tools will display draid vdev names with parameters embedded in them, but would not accept them as valid vdev names when looking them up, making it difficult to build pipelines involving draid vdevs. This commit makes it so that if a full draid name is offered for match, it gets truncated at the first ':' character. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> (cherry picked from commit 131df3b)
- Fix VERIFY3B() when given non-boolean values. - Map EQUIV() into VERIFY3B(,==,) as equivalent. - Tune messages for better readability and to closer match source code for easier search. Unify user-space messages with kernel. - Tune printed types and remove %px outside of Linux kernel. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Reviewed-by: @ImAwsumm Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> (cherry picked from commit 4866c2f)
The test writes 1M of 1KB blocks, which may produce up to 1GB of dirty data. On top of that ashift=12 likely produces additional 4GB of ZIO buffers during sync process. On top of that we likely need some page cache since the pool reside on files. And finally we need to cache the DDT. Not surprising that the test regularly ends up in OOMs, possibly depending on TXG size variations. Also replace fio with pretty strange parameter set with a set of dd writes and TXG commits, just as we neeed here. While here, remove compression. It has nothing to do here, but waste CI CPU time. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Reviewed-by: Paul Dagnelie <pcd@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> (cherry picked from commit 1d8f625)
did robn@07d815f make it in? |
@owlshrimp yes, that's #17353. |
- Kill workload first for faster cleanup. - Use `zpool wait` for resilver instead of `sleep`. - Remove irrelevant workload from `online_offline_003_neg`. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes: openzfs#17259 (cherry picked from commit cb49e77)
Replace `sleep 15` with `zpool wait`, which should take much less than the 15 seconds. And considering it is called 16 times, this should save us up to 4 minutes total. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes: openzfs#17257 (cherry picked from commit 8f2c2de)
All defined variants: - freebsd13-4r, freebsd13-5r, freebsd14-1r, freebsd14-2r (RELEASE) - freebsd13-5s, freebsd14-2s (STABLE) - freebsd15-0c (CURRENT) Used for testing: - freebsd13-4r (RELEASE) - freebsd14-2s (STABLE) - freebsd15-0c (CURRENT) Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes: openzfs#17260 (cherry picked from commit 6afb405)
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes: openzfs#17267 (cherry picked from commit 38c3a8b)
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed by: Attila Fülöp <attila@fueloep.org> Signed-off-by: Artem-OSSRevival <artem.vlasenko@ossrevival.org> Fixes: openzfs#14538 Closes: openzfs#17234 (cherry picked from commit 37a3e26)
Originally the Lustre ZFS OSD code was going to use zfs_uio_t structs for supporting Direct I/O with ZFS. However, this has changed to using abd_t structs instead. This exports the proper symbols that will be used by the Lustre ZFS OSD code. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Brian Atkinson <batkinson@lanl.gov> Closes openzfs#17256 (cherry picked from commit 7031a48)
This commit adds support for using llvm-libunwind for kernels built using llvm and clang. The two differences are that the largest register index is given by _LIBUNWIND_HIGHEST_DWARF_REGISTER, we need to check whether the register is a floating point register and the prototype for unw_regname takes the unwind cursor as the first argument. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Sebastian Pauka <me@spauka.se> Closes openzfs#17230 (cherry picked from commit 1b4826b)
Those tests are write-mostly at the nested pool. Considering we have 3 more layers of caching underneath, we can hint ZFS how to use the memory better by setting primarycache=metadata. While there, add missing zpool sync after rm in checkpoint_capacity before we could potentially see the freed space, would not there be a pool checkpoint. Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Tony Hutter <hutter2@llnl.gov> (cherry picked from commit 1ef706c)
Sometimes it fails unable to see any injected write errors. I guess writing 25KB of zeroes might be not enough to trigger errors with probability set to 10%. Lets try to write more. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes openzfs#17270 (cherry picked from commit d947b9a)
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes openzfs#17278 (cherry picked from commit 88ec6c4)
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Alexander Ziaee <ziaee@FreeBSD.org> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Quentin Thébault <quentin.thebault@defenso.fr> Closes openzfs#17282 (cherry picked from commit 63de2d2)
Don't use KSM on the FreeBSD VMs and optimize KSM settings for Linux to have faster run times. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes openzfs#17247 (cherry picked from commit ba17ced)
### Background Various admin operations will be invoked by some userspace task, but the work will be done on a separate kernel thread at a later time. Snapshots are an example, which are triggered through zfs_ioc_snapshot() -> dsl_dataset_snapshot(), but the actual work is from a task dispatched to dp_sync_taskq. Many such tasks end up in dsl_enforce_ds_ss_limits(), where various limits and permissions are enforced. Among other things, it is necessary to ensure that the invoking task (that is, the user) has permission to do things. We can't simply check if the running task has permission; it is a privileged kernel thread, which can do anything. However, in the general case it's not safe to simply query the task for its permissions at the check time, as the task may not exist any more, or its permissions may have changed since it was first invoked. So instead, we capture the permissions by saving CRED() in the user task, and then using it for the check through the secpolicy_* functions. ### Current implementation The current code calls CRED() to get the credential, which gets a pointer to the cred_t inside the current task and passes it to the worker task. However, it doesn't take a reference to the cred_t, and so expects that it won't change, and that the task continues to exist. In practice that is always the case, because we don't let the calling task return from the kernel until the work is done. For Linux, we also take a reference to the current task, because the Linux credential APIs for the most part do not check an arbitrary credential, but rather, query what a task can do. See secpolicy_zfs_proc(). Again, we don't take a reference on the task, just a pointer to it. ### Changes We change to calling crhold() on the task credential, and crfree() when we're done with it. This ensures it stays alive and unchanged for the duration of the call. On the Linux side, we change the main policy checking function priv_policy_ns() to use override_creds()/revert_creds() if necessary to make the provided credential active in the current task, allowing the standard task-permission APIs to do the needed check. Since the task pointer is no longer required, this lets us entirely remove secpolicy_zfs_proc() and the need to carry a task pointer around as well. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Pavel Snajdr <snajpa@snajpa.net> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Kyle Evans <kevans@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> (cherry picked from commit c8fa39b)
This change goes through and quotes variables where appropriate to avoid issues with incorrect splitting. The performance tests ran into an issue with $SUDO_COMMAND splitting incorrectly because it was not quoted. This change fixes that issue and hopefully gets ahead of any other similar problems. Reviewed by: John Wren Kennedy <jwk404@gmail.com> Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Aleksandr Liber <aleksandr.liber@perforce.com> Closes openzfs#17235 (cherry picked from commit aa46cc9)
When multiple snapshots prevent the destruction/rollback of the respective dataset/snapshot/volume via zfs destroy or zfs rollback, the error message does not list the blocking snapshots sorted according to their order of creation. This causes inconvenience and can lead to confusion, and also creates a contrast with a returned message from zfs list -t snap function. Closes: openzfs#12751 Signed-off-by: Artem-OSSRevival <artem.vlasenko@ossrevival.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> (cherry picked from commit 27f3d94)
Decrease the RESILVER_MIN_TIME_MS variable from 50 to 20. So the test, which expects two 2 resilver starts will see them. Logfile of the seen failures before this fix: log: NOTE: expected 2 resilver start(s) after offline/online, found 1 log: expected 2 resilver start(s) after offline/online, found 1 The test time decreases also from around 00:42 to 00:24 seconds. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes openzfs#16822 Closes openzfs#17279 (cherry picked from commit 3b18877)
It's possible for two spares to get attached to a single failed vdev. This happens when you have a failed disk that is spared, and then you replace the failed disk with a new disk, but during the resilver the new disk fails, and ZED kicks in a spare for the failed new disk. This commit checks for that condition and disallows it. Reviewed-by: Akash B <akash-b@hpe.com> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes: openzfs#16547 Closes: openzfs#17231 (cherry picked from commit f40ab9e)
zpool_status_003 and zpool_status_004_pos use 'dd' to trigger a read of a file without specifying 'of=/dev/null'. This spams the ZTS logs with ~20MB of garbage data. This commit adds 'of=/dev/null'. Signed-off-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Alexander Motin <mav@FreeBSD.org> (cherry picked from commit a6cca8a)
`S_IFMT` is declared in `sys/stat.h`, but we cannot include this header because it redeclares the `statx` function with different argument types. Therefore, we define `S_IFMT` ourselves, in the same way as the other definitions. Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: José Luis Salvador Rufo <salvador.joseluis@gmail.com> Closes openzfs#17293 Closes openzfs#17294 (cherry picked from commit 634c172)
When set, dumps all ZFS ioctl calls and returns and their nvlists to STDERR, to make debugging and understanding a lot easier. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes openzfs#17344 (cherry picked from commit a387b75)
Add support to alias md-type devices in udev rules. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andres <a-d-j-i@users.noreply.github.com> Closes openzfs#17345 (cherry picked from commit a6f2025)
We don't really need to access space map to know where the metaslab ends, while msp->ms_sm might be NULL. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Fixes openzfs#17164 Fixes openzfs#17359 Closes openzfs#17361 (cherry picked from commit 5c30b24)
Signed-off-by: Rob Norris <robn@despairlabs.com>
Also, this ARC performance PR (which needs |
@satmandu ZFS 2.4 release cycle should start in ~3 months. Leave something to it. ;) |
@amotin I assume that ZFS 2.3. x will make it into an Ubuntu LTS release next year since it is already in Debian, but we may get lucky if Debian picks up ZFS 2.4.0 and it trickles into Ubuntu late this year. I know...I know... it is ready when it is ready etc. |
May I ask for #14876 to get included, or is it better to wait until 2.4 for it? |
It is a user-visible functionality change. Lets leave those to major branches and leave minors for bug fixes and non-breaking optimizations. |
I sincerely hope 2.3.3 makes it in time for Debian |
Merged as: |
FYI, I never got back to backporting #17123, and not sure if I will or not soon. It's definitely important in the situations it applies to, but those are relatively niche. |
Ubuntu tends not to upgrade ZFS atter their LTS releases. I'd actually rather see 2.4 released too late for Ubuntu 26.04, so they get a mature 2.3. |
Would it be feasible to sneak a backport of #17394 in before the release is cut? |
Yup, I can pull that in. |
Why was this closed? Is there another pull request to watch? |
The merge/commits were completed outside of this PR. See this branch https://github.com/openzfs/zfs/tree/zfs-2.3.3-staging |
I see, but what does 'completed' mean in this context given that there wasn't a 2.3.3 release in the end? |
Done in the sense this is the first batch of proposed changes intended for 2.3.3. They've been merged to the zfs-2.3.3-staging branch to help keep track of them. I've opened #17459 with a second batch of changes for review. Once everything looks good we'll open a PR against the zfs-2.3-release branch with the full patch set which will become 2.3.3. |
Motivation and Context
Hopefully not treading on any toes, but I figured we're gonna want a new release soon with the dataset hold fixes, so I put this branch together with every nice thing on master lately.
Description
Includes:
txg_wait_synced_sig()
totxg_wait_synced_flags()
#17284S_IFMT
undeclared atstatx.c
#17294UIO_ITER
#17298libzfs_core
: addZFS_IOC_TRACE
envvar to enable ioctl tracing #17344Probably more than strictly necessary, a lot of it is tests, but it makes it easier to add more next time.
How Has This Been Tested?
ZTS passed on FreeBSD 14.2-p1, currently running on Linux 6.1.137
Types of changes
Checklist:
Signed-off-by
.