Support idmapped mount in user namespace #14097
3288473 to
ff75ffb
Compare
|
/cc @brauner in case you want to take a look at this? |
c895f02 to
866c144
Compare
brauner
left a comment
So just as an FYI: kernel v5.17 has most of the infrastructure to correctly deal with idmapped mounts of idmapped filesystems. But the kernel that has a few further corner cases massaged out is v5.19. So I would advocate for v5.19 or v6.0 as the kernel with first-class support for idmapped mounts of idmapped filesystems.
module/os/linux/spl/spl-cred.c
Outdated
#if defined(CONFIG_USER_NS)
	return (&init_user_ns);
#else
	return (NULL);
Why don't you always return init_user_ns? That should always be available even without CONFIG_USER_NS, and not having NULL returned from these functions makes things safer.
This makes sense. We support kernels as old as 3.10; I checked 3.10, and init_user_ns is available there.
include/os/linux/spl/sys/cred.h
Outdated
#ifdef HAVE_SUPER_USER_NS
	return (inode->i_sb->s_user_ns);
#else
	return (NULL);
Why don't you always return init_user_ns? That should always be available even without CONFIG_USER_NS, and not having NULL returned from these functions makes things safer.
Some old kernels (e.g. unpatched 3.10) do not have super_block->s_user_ns, but returning init_user_ns in that case makes sense.
As the inline zfs_i_user_ns() will be used in the zfs module (CDDL licensed) and the symbol "init_user_ns" is GPL, for zfs_i_user_ns() to return &init_user_ns we have to use a function, zfs_get_init_userns(), to get its pointer. My thought was to avoid the function call if we can, as it would be no fun to make tons of function calls when a busy server needs to handle lots of IOPS.
I wonder if the perf impact is measurable. But just chatted with @sforshee and he pointed out that you could probably just stash the address in a local pointer at module init. The address won't change.
Good idea. Will work on this.
What we've done elsewhere is to grab the init_user_ns from init_task.cred.user_ns, which we do have access to. That's worked out reasonably well; I'd recommend that approach rather than introducing a new helper function. You can use kcred->user_ns.
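For illustration only, the two suggestions above (stash the address once at module init, and obtain it through kcred->user_ns) could be sketched roughly as below. This is a user-space mock, not the actual patch: `mock_init_user_ns`, `mock_kcred`, `zfs_userns_init()` and `zfs_init_userns_cached` are hypothetical names, and the structs are stand-ins for the kernel types.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct user_namespace (mock, not the real one). */
struct user_namespace { int level; };

/* Stand-ins for init_user_ns and for the credentials behind kcred. */
static struct user_namespace mock_init_user_ns = { 0 };
struct cred { struct user_namespace *user_ns; };
static const struct cred mock_kcred = { &mock_init_user_ns };

/* Cached once at module load; the address does not change afterwards. */
static struct user_namespace *zfs_init_userns_cached;

/* Hypothetical module-init hook: stash the pointer via kcred->user_ns. */
static void
zfs_userns_init(void)
{
	zfs_init_userns_cached = mock_kcred.user_ns;
	assert(zfs_init_userns_cached != NULL);
}

struct super_block { struct user_namespace *s_user_ns; };
struct inode { struct super_block *i_sb; };

/* Never returns NULL: without s_user_ns it falls back to the cached pointer. */
static inline struct user_namespace *
zfs_i_user_ns(const struct inode *ip)
{
#ifdef HAVE_SUPER_USER_NS
	return (ip->i_sb->s_user_ns);
#else
	(void) ip;
	return (zfs_init_userns_cached);
#endif
}
```

With this shape the inline accessor only reads a cached pointer, so no extra function call is needed on the hot path and callers never see NULL.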
include/os/linux/spl/sys/cred.h
Outdated
	if (mnt_ns)
		return (__kuid_val(make_kuid(mnt_ns, uid)));
	return (uid);
	return (!mnt_userns || !fs_userns ||
I would try to code it in such a manner that these functions can never receive a NULL pointer. See my other comments.
I kind of agree, but here is my take on having NULL: on older kernel versions (< 5.12), which do not provide "mnt_userns" in the inode_operations callbacks, a NULL mnt_userns provides a shortcut for no ID mapping. If we try to use &init_user_ns in such a case, a function call is needed (due to its GPL nature) and we have to check whether it is the initial user ns in order to have a shortcut. As you can see, this would need more CPU cycles.
@behlendorf What is your thought on this?
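To make the trade-off concrete, here is a rough user-space sketch of a NULL-free mapping helper that keeps the shortcut as a plain pointer comparison against a cached initial-namespace pointer. Everything in it is an assumption for illustration: the single-extent `struct user_namespace`, the `mock_*` helpers, and the body of `zfs_uid_to_vfsuid()` are simplified stand-ins, not the code in this PR or in the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define	MOCK_INVALID_ID	((unsigned int)-1)

/* Mock namespace with a single mapping extent (the kernel allows several). */
struct user_namespace {
	unsigned int first;		/* first ID as seen inside the namespace */
	unsigned int lower_first;	/* first ID as seen by the kernel */
	unsigned int count;		/* length of the extent */
};

/* Identity mapping standing in for init_user_ns. */
static const struct user_namespace mock_init_user_ns =
	{ 0, 0, MOCK_INVALID_ID };

/* Set once at module load (e.g. from kcred->user_ns), then only compared. */
static const struct user_namespace *zfs_init_userns = &mock_init_user_ns;

/* Namespace ID -> kernel ID, in the spirit of make_kuid(). */
static unsigned int
mock_map_down(const struct user_namespace *ns, unsigned int id)
{
	if (id >= ns->first && id - ns->first < ns->count)
		return (ns->lower_first + (id - ns->first));
	return (MOCK_INVALID_ID);
}

/* Kernel ID -> namespace ID, in the spirit of from_kuid(). */
static unsigned int
mock_map_up(const struct user_namespace *ns, unsigned int kid)
{
	if (kid >= ns->lower_first && kid - ns->lower_first < ns->count)
		return (ns->first + (kid - ns->lower_first));
	return (MOCK_INVALID_ID);
}

static inline unsigned int
zfs_uid_to_vfsuid(const struct user_namespace *mnt_userns,
    const struct user_namespace *fs_userns, unsigned int uid)
{
	assert(mnt_userns != NULL && fs_userns != NULL);

	/* Shortcut: two pointer compares, no NULL sentinel, no extra call. */
	if (mnt_userns == zfs_init_userns || mnt_userns == fs_userns)
		return (uid);

	/* Undo the filesystem idmapping first, unless it is the initial one. */
	if (fs_userns != zfs_init_userns)
		uid = mock_map_up(fs_userns, uid);
	if (uid == MOCK_INVALID_ID)
		return (MOCK_INVALID_ID);

	/* Then apply the mount idmapping. */
	return (mock_map_down(mnt_userns, uid));
}
```

Under these assumptions the non-idmapped case costs one or two comparisons, which is about what the NULL check costs today.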
}

static inline gid_t zfs_gid_into_mnt(struct user_namespace *mnt_ns, gid_t gid)
static inline uid_t zfs_uid_to_vfsuid(struct user_namespace *mnt_userns,
Just so I understand, you're deliberately operating on plain {g,u}id_t types instead of k{g,u}id_t and vfs{g,u}id_t types?
Yes. I see no need to massage around different types.
I think security-wise this isn't a good idea, which is why we have raw {g,u}id_t for raw on-disk or userspace values, k{g,u}id_t for filesystem- or kernel-wide values, and vfs{g,u}id_t for vfs/vfsmount-specific values, so they can never be confused and used for one another. Just a note, not a request to change things.
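As a small illustration of the type-safety point, wrapping each kind of ID in its own struct lets the compiler reject accidental mixing. The sketch below uses hypothetical `mock_*` types and helpers in user space; it only mirrors the idea behind the kernel's kuid_t and vfsuid_t and is not their actual definition.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int mock_uid_t;		/* raw on-disk or userspace value */

typedef struct { mock_uid_t val; } mock_kuid_t;	/* kernel-wide value */
typedef struct { mock_uid_t val; } mock_vfsuid_t;	/* mount-specific value */

static inline mock_kuid_t
mock_make_kuid(mock_uid_t raw)
{
	mock_kuid_t k = { raw };
	return (k);
}

/*
 * The only place a kuid turns into a vfsuid. "shift" is a stand-in for
 * whatever the mount idmapping does to the value.
 */
static inline mock_vfsuid_t
mock_make_vfsuid(mock_kuid_t kuid, mock_uid_t shift)
{
	assert(kuid.val <= (mock_uid_t)-1 - shift);	/* no wraparound */
	mock_vfsuid_t v = { kuid.val + shift };
	return (v);
}

/* Comparison has to name both types explicitly. */
static inline bool
mock_vfsuid_eq_kuid(mock_vfsuid_t vfsuid, mock_kuid_t kuid)
{
	return (vfsuid.val == kuid.val);
}

/*
 * mock_vfsuid_eq_kuid(kuid, kuid) would not compile, whereas the same
 * mistake with plain integer IDs everywhere would go unnoticed.
 */
```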
#if defined(HAVE_XATTR_SET_USERNS)
	if (!zpl_inode_owner_or_capable(mnt_ns, ip))
		return (-EPERM);
#else
	(void) mnt_ns;
	if (!zpl_inode_owner_or_capable(kcred->user_ns, ip))
		return (-EPERM);
#endif
The permission checking here confuses me a bit. Afaict, zpl_inode_owner_or_capable is just an ifdef for inode_owner_or_capable. The inode_owner_or_capable() helper should always receive the current_user_ns() and the mnt_userns. If idmapped mounts aren't supported then init_user_ns should be passed for mnt_userns. The mnt_userns is never used for capability checks currently, so if zpl_inode_owner_or_capable checks any capabilities in the mnt_userns when passed mnt_userns as the first argument then the permission checking is wrong.
The current_user_ns() is used to check for stuff like CAP_FSETID and the mnt_userns only matters for remapping the ownership; capabilities aren't checked in there yet. We might have use-cases for this in the future but we don't right now.
You are right. zpl_inode_owner_or_capable() is a wrapper around inode_owner_or_capable().
I searched the latest kernel code, and there are a bunch of places where mnt_userns is passed into inode_owner_or_capable(). Are they all wrong? For example, this one: https://github.com/torvalds/linux/blob/master/fs/posix_acl.c#L1142
Without this change, doing "cp -p" of an idmapped file in a user namespace would fail.
Sorry, I misremembered how I wrote inode_owner_or_capable(). It's correct the way you've done it. Fwiw, in the future I will pass down struct mnt_idmap instead of the plain namespace for additional type safety. The mnt_userns won't be visible outside the really low-level code and won't be exposed to the vfs and especially not to individual filesystems eliminating all confusions.
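A simplified user-space model of the check discussed above may help show why passing mnt_userns matters for "cp -p". It follows the description in this thread (mnt_userns remaps the inode owner, the caller's own namespace is used for the capability check) and assumes a filesystem mounted without its own idmapping. All `mock_*` names and the single-extent namespace struct are hypothetical; this is not the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	MOCK_INVALID_ID	((unsigned int)-1)

/* Mock namespace with a single mapping extent. */
struct user_namespace {
	unsigned int first;		/* first ID inside the namespace */
	unsigned int lower_first;	/* first ID as seen by the kernel */
	unsigned int count;
};

static unsigned int
mock_map_down(const struct user_namespace *ns, unsigned int id)
{
	if (id >= ns->first && id - ns->first < ns->count)
		return (ns->lower_first + (id - ns->first));
	return (MOCK_INVALID_ID);
}

static unsigned int
mock_map_up(const struct user_namespace *ns, unsigned int kid)
{
	if (kid >= ns->lower_first && kid - ns->lower_first < ns->count)
		return (ns->first + (kid - ns->lower_first));
	return (MOCK_INVALID_ID);
}

struct mock_inode { unsigned int i_uid; };	/* owner as stored */
struct mock_task {
	unsigned int fsuid;			/* kernel-wide fsuid */
	const struct user_namespace *user_ns;	/* caller's namespace */
	bool cap_fowner;			/* CAP_FOWNER in user_ns */
};

static bool
mock_inode_owner_or_capable(const struct user_namespace *mnt_userns,
    const struct mock_inode *ip, const struct mock_task *cur)
{
	assert(mnt_userns != NULL);

	/* Ownership is remapped through the mount's idmapping. */
	unsigned int i_uid = mock_map_down(mnt_userns, ip->i_uid);
	if (cur->fsuid == i_uid)
		return (true);

	/* Capabilities are checked in the caller's namespace only. */
	if (mock_map_up(cur->user_ns, i_uid) != MOCK_INVALID_ID &&
	    cur->cap_fowner)
		return (true);
	return (false);
}
```

In this model a caller inside a user namespace whose IDs start at 100000 owns a file stored as uid 1000 only when the mount idmapping is passed in; with the initial namespace passed instead, the owner stays 1000, has no mapping in the caller's namespace, and the check fails.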
#if defined(HAVE_XATTR_SET_USERNS)
	if (!zpl_inode_owner_or_capable(mnt_ns, ip))
		return (-EPERM);
#else
	(void) mnt_ns;
	if (!zpl_inode_owner_or_capable(kcred->user_ns, ip))
		return (-EPERM);
#endif
Same comment as above.
    const char *name, const void *buffer, size_t size, int flags)	\
{									\
	return (__ ## fn(dentry->d_inode, name, buffer, size, flags));	\
	return (__ ## fn(NULL, dentry->d_inode, name, buffer, size, flags));\
I would really not do this NULL pointer passing. I've carefully coded this upstream so that idmapped mounts always pass the init_user_ns around as an indicator for non-idmapped mounts.
In these cases the xattr_handler callback does not provide a user ns argument, so NULL is passed to the underlying "__***" function to match the argument list.
I will think more about this NULL stuff.
static inline struct user_namespace *zfs_i_user_ns(struct inode *inode)
{
#ifdef HAVE_SUPER_USER_NS
Do you support kernels without sb->s_user_ns?
Yes we do.
866c144 to
ce15796
Compare
|
@brauner - Thank you very much for your review comments. I think I've addressed all of them, would you please take a look at the new changes? Here is a summary of them:
I've run a full zfs test suite, and xfstests idmapped test cases. All look good. |
This looks pretty sane to me. I haven't tested it of course.
One thing: If zfs supports functioning as a lower or upper filesystem for overlayfs then you want to test that with xfstests too. This can be done via:
sudo ./check -overlay -g quick
and if you want to test zfs as lower/upper with overlayfs on top of idmapped mounts you will need:
export IDMAPPED_MOUNTS=true
in local.config and then
sudo ./check -overlay -g quick
As it turns out, support for overlayfs was merged just last month, so it'd be good to run these tests. Thanks for pointing them out!
|
Thanks for the xfstests info about overlay. I played with it a bit (xfs only, nothing zfs) on kernel 5.19.0-17-generic on Ubuntu:
Failed 13 of 648 tests
The local.config:
Making xfstests work for the zfs file system seems to be non-trivial; someone needs to spend quite some time digging into the test framework and making tweaks/changes.
Note that some of these failures are expected. This specifically includes |
Failures of |
IOW, most of these failures are fixed on |
Good to know, thanks. I am happy with the result of those idmapped mount test cases. I will come back to overlay + idmapped mount testing when time permits. |
|
Able to reproduce the build error on an Ubuntu 18.04 VM in AWS (kernel 5.4.0-1084-aws). With "extern struct task_struct init_task" added in cred.h, the build passed.
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
bee3093 to
1dd719c
Compare
Indentation style check cannot mess with #ifdef Signed-off-by: Youzhong Yang <yyang@mathworks.com>
1dd719c to
640f11e
Compare
Yep, I sent patches for supporting overlayfs on top of idmapped mounts starting with |
|
The zvol_misc_trim failure on Ubuntu was reproducible without this PR, as of commit da3d266. |
|
Something fell apart on Fedora 35 running kernel 6.0.5-100.fc35.x86_64, 4 out of 5 idmap_mount test cases failed. I was able to reproduce on a similar AWS VM. Investigation underway. |
What are the errors you're seeing on Fedora? |
|
@youzhongyang we recently started seeing those same failures on Fedora 35 on the master branch without this PR.
Hi @brauner, I created a script to reproduce the issue: Here is its output when running on Fedora 35 kernel 6.0.5: It looks like a regression to me. How could it work on kernels all the way up to 5.19, and then fail on 6.0.5? |
|
Running bpftrace showed the &init_user_ns is passed into zpl_getattr() call, which looks correct. Not sure why the 'stat' program gets the wrong uid and gid back: |
Heh, @behlendorf and @youzhongyang that's not a regression you just haven't adapted to the Basically, prior to these changes we placed for the required filesystem conversion. I suspect you need at least something like:
diff --git a/module/os/linux/zfs/zpl_inode.c b/module/os/linux/zfs/zpl_inode.c
index 64016f9ac..52f10977e 100644
--- a/module/os/linux/zfs/zpl_inode.c
+++ b/module/os/linux/zfs/zpl_inode.c
@@ -468,8 +468,12 @@ zpl_setattr(struct dentry *dentry, struct iattr *ia)
 	vap = kmem_zalloc(sizeof (vattr_t), KM_SLEEP);
 	vap->va_mask = ia->ia_valid & ATTR_IATTR_MASK;
 	vap->va_mode = ia->ia_mode;
-	vap->va_uid = KUID_TO_SUID(ia->ia_uid);
-	vap->va_gid = KGID_TO_SGID(ia->ia_gid);
+	if (ia->ia_valid & ATTR_UID)
+		vap->va_uid = KUID_TO_SUID(
+		    from_vfsuid(user_ns, i_user_ns(inode), ia->ia_vfsuid));
+	if (ia->ia_valid & ATTR_GID)
+		vap->va_gid = KGID_TO_SGID(
+		    from_vfsgid(user_ns, i_user_ns(inode), ia->ia_vfsgid));
 	vap->va_size = ia->ia_size;
 	vap->va_atime = ia->ia_atime;
 	vap->va_mtime = ia->ia_mtime;
|
@brauner Thanks for the clarification. The unions in 'struct iattr' were introduced in this commit, which is available in kernel 6+. That explains why it works in 5.19 but not in 6.0. I think we need another detector for this data structure change, and to modify the zfs code so that it works either way.
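As a rough idea of what "works either way" could look like, the sketch below selects the field at compile time. It is a user-space mock under loud assumptions: `HAVE_IATTR_VFSID` is a hypothetical configure-time macro name, `struct mock_iattr`, `MOCK_ATTR_UID` and `mock_setattr_uid()` are invented for illustration, and the filesystem is assumed to have no idmapping of its own.

```c
#include <assert.h>
#include <stddef.h>

#define	MOCK_INVALID_ID	((unsigned int)-1)
#define	MOCK_ATTR_UID	(1u << 1)	/* stand-in for ATTR_UID */

/* Mock namespace with a single mapping extent. */
struct user_namespace {
	unsigned int first;
	unsigned int lower_first;
	unsigned int count;
};

/* Kernel ID -> namespace ID, used here to undo the mount idmapping. */
static unsigned int
mock_map_up(const struct user_namespace *ns, unsigned int kid)
{
	if (kid >= ns->lower_first && kid - ns->lower_first < ns->count)
		return (ns->first + (kid - ns->lower_first));
	return (MOCK_INVALID_ID);
}

/* Mock of the two struct iattr layouts; HAVE_IATTR_VFSID is hypothetical. */
struct mock_iattr {
	unsigned int ia_valid;
#ifdef HAVE_IATTR_VFSID
	unsigned int ia_vfsuid;	/* newer layout: value still mount-mapped */
#else
	unsigned int ia_uid;	/* older layout: value already converted */
#endif
};

/* Returns the uid to store, or old_uid when the caller did not set one. */
static unsigned int
mock_setattr_uid(const struct user_namespace *mnt_userns,
    const struct mock_iattr *ia, unsigned int old_uid)
{
	assert(mnt_userns != NULL && ia != NULL);

	if (!(ia->ia_valid & MOCK_ATTR_UID))
		return (old_uid);
#ifdef HAVE_IATTR_VFSID
	/* The filesystem undoes the mount idmapping itself. */
	return (mock_map_up(mnt_userns, ia->ia_vfsuid));
#else
	/* The VFS has already handed over a filesystem-wide value. */
	return (ia->ia_uid);
#endif
}
```

Only touching the uid when the valid bit is set mirrors the ATTR_UID guard in the suggested diff, since the field may hold garbage otherwise.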
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Linux 5.17 commit torvalds/linux@5dfbfe71e enables "the idmapping infrastructure to support idmapped mounts of filesystems mounted with an idmapping". Update the OpenZFS accordingly to improve the idmapped mount support.

This pull request contains the following changes:
- xattr setter functions are fixed to take mnt_ns argument. Without this, cp -p would fail for an idmapped mount in a user namespace.
- idmap_util is enhanced/fixed for its use in a user ns context.
- One test case added to test idmapped mount in a user ns.

Reviewed-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes openzfs#14097
commit 619a318a127722ade0dcf94a6bbd224f3aca54fc
Author: Jorgen Lundman <lundman@lundman.net>
Date: Sun Nov 20 16:28:03 2022 +0900
Adding sysv_abi to assembly prototypes
This is a test to see if Linux, and toolchains, would be
unhappy specifying sysv abi usage for the assembler functions,
they are written with sysv in mind after all.
Otherwise we can leave it as an empty MACRO on Linux.
Signed-off-by: Jorgen Lundman <lundman@lundman.net>
commit b0657a59abb38659721bf8d973920292c4f4a1a8
Author: John Wren Kennedy <john.kennedy@delphix.com>
Date: Fri Nov 18 12:43:18 2022 -0700
ZTS: zts-report silently ignores perf test results
The regex used to extract test result information from a test run only
matches the functional tests. Update the regex so it matches both.
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>
Closes #14185
commit 3a74f488fcd9b3802efa366adcb813415d3f13a8
Author: Ameer Hamza <106930537+ixhamza@users.noreply.github.com>
Date: Sat Nov 19 00:39:59 2022 +0500
zed: post a udev change event from spa_vdev_attach()
In order for zed to process the removal event correctly,
udev change event needs to be posted to sync the blkid
information. spa_create() and spa_config_update() posts
the event already through spa_write_cachefile(). Doing
the same for spa_vdev_attach() that handles the case
for vdev attachment and replacement.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14172
commit 3226e0dc8ef6f7770035c42b28f2b088bbdd2944
Author: George Amanakis <gamanakis@gmail.com>
Date: Fri Nov 18 20:38:37 2022 +0100
Fix setting the large_block feature after receiving a snapshot
We are not allowed to dirty a filesystem when done receiving
a snapshot. In this case the flag SPA_FEATURE_LARGE_BLOCKS will
not be set on that filesystem since the filesystem is not on
dp_dirty_datasets, and a subsequent encrypted raw send will fail.
Fix this by checking in dsl_dataset_snapshot_sync_impl() if the feature
needs to be activated and do so if appropriate.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes #13699
Closes #13782
commit 99c0479a4ef4cbfdf49ad05a4457d0872ab98f4c
Author: Laura Hild <hild.laura.s@gmail.com>
Date: Fri Nov 18 14:36:19 2022 -0500
Correct multipathd.target to .service
https://github.com/openzfs/zfs/pull/9863 says it "orders
zfs-import-cache.service and zfs-import-scan.service after
multipathd.service" but the commit (79add96) actually
ordered them after .target.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Laura Hild <lsh@jlab.org>
Closes #12709
Closes #14171
commit 0a0166c9755a423906c097a29702d4962c73cf77
Author: Richard Yao <richard.yao@alumni.stonybrook.edu>
Date: Thu Nov 3 13:53:17 2022 -0400
FreeBSD: do_mount() passes wrong string length to helper
It should pass `MNT_LINE_MAX`, but passes `sizeof (mntpt)`. This is
harmless because the strlen is not actually used by the helper, but
FreeBSD's Coverity scans complained about it.
This was missed in my audit of various string functions since it is not
actually passed to a string function.
Upon review, it was noticed that the helper function does not need to be
a separate function, so I have inlined it as cleanup.
Reported-by: Coverity (CID 1432079)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: szubersk <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14136
commit 31247c78b15aefeac5d395109209ca8a99ff5d60
Author: Richard Yao <richard.yao@alumni.stonybrook.edu>
Date: Thu Nov 3 13:58:38 2022 -0400
FreeBSD: get_zfs_ioctl_version() should be cast to (void)
FreeBSD's Coverity scans complain that we ignore the return value. There
is no need to check the return value so we cast it to (void) to suppress
further complaints by static analyzers.
Reported-by: Coverity (CID 1018175)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: szubersk <szuberskidamian@gmail.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14136
commit 9e7fc5da3806b971304d13d513ea1504c7fe98f6
Author: szubersk <szuberskidamian@gmail.com>
Date: Sat Nov 12 22:48:32 2022 +1000
Ubuntu 22.04 integration: GitHub workflows
- GitHub workflows are run on Ubuntu 22.04
- Extract the `checkstyle` workflow dependencies to a separate file.
- Refresh the `build-dependencies.txt` list.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
commit 32ef14de0f3609c35d2478dd52950e9ad65b8c6d
Author: szubersk <szuberskidamian@gmail.com>
Date: Sat Nov 12 22:30:57 2022 +1000
Ubuntu 22.04 integration: ZTS
Add `detect_odr_violation=1` to ASAN_OPTIONS to allow both libzfs
and libzpool expose
```
zfeature_info_t spa_feature_table[SPA_FEATURES]
```
from module/zcommon/zfeature_common.c in public ABI.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
commit 28ea4f9b088fd7fb33593f09d37bae44ea85e4fb
Author: szubersk <szuberskidamian@gmail.com>
Date: Sat Nov 12 22:29:29 2022 +1000
Ubuntu 22.04 integration: Cppcheck
Suppress a false positive found by new Cppcheck version.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
commit b46be903fb45a1ff463518d8e6b92f05723427cf
Author: szubersk <szuberskidamian@gmail.com>
Date: Sat Nov 12 22:23:30 2022 +1000
Ubuntu 22.04 integration: mancheck
Correct new mandoc errors.
```
STYLE: input text line longer than 80 bytes
STYLE: no blank before trailing delimiter
```
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
commit a5087965fe2fbb8cae60232b9b41b7ce7464daf1
Author: szubersk <szuberskidamian@gmail.com>
Date: Sat Nov 12 22:22:49 2022 +1000
Ubuntu 22.04 integration: ShellCheck
- Add new SC2312 global exclude.
```
Consider invoking this command separately to avoid masking its return
value (or use '|| true' to ignore). [SC2312]
```
- Correct errors detected by new ShellCheck version.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14148
commit c3b6fd3d594f27827d69d972b41520ef0646bdea
Author: Damian Szuberski <szuberskidamian@gmail.com>
Date: Thu Nov 17 03:27:53 2022 +1000
Make autodetection disable pyzfs for kernel/srpm configurations
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #13394
Closes #14178
commit 2163cde450d0898b5f7bac16afb4e238485411ff
Author: Rich Ercolani <214141+rincebrain@users.noreply.github.com>
Date: Tue Nov 15 17:44:12 2022 -0500
Handle and detect #13709's unlock regression (#14161)
In #13709, as in #11294 before it, it turns out that 63a26454 still had
the same failure mode as when it was first landed as d1d47691, and
fails to unlock certain datasets that formerly worked.
Rather than reverting it again, let's add handling to just throw out
the accounting metadata that failed to unlock when that happens, as
well as a test with a pre-broken pool image to ensure that we never get
bitten by this again.
Fixes: #13709
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
commit b445b25b273d263f032fadd717e5731185b74bf5
Author: shodanshok <g.danti@assyoma.it>
Date: Fri Nov 11 19:41:36 2022 +0100
Fix arc_p aggressive increase
The original ARC paper called for an initial 50/50 MRU/MFU split
and this is accounted in various places where arc_p = arc_c >> 1,
with further adjustment based on ghost lists size/hit. However, in
current code both arc_adapt() and arc_get_data_impl() aggressively
grow arc_p until arc_c is reached, causing unneeded pressure on
MFU and greatly reducing its scan-resistance until ghost list
adjustments kick in.
This patch restores the original behavior of initially having arc_p
as 1/2 of total ARC, without preventing MRU to use up to 100% total
ARC when MFU is empty.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #14137
Closes #14120
commit 9f4ede63d23be4f43ba8dd0ca42c6a773a8eaa8d
Author: Paul Dagnelie <paul.dagnelie@delphix.com>
Date: Thu Nov 10 15:23:46 2022 -0800
Add ability to recompress send streams with new compression algorithm
As new compression algorithms are added to ZFS, it could be useful for
people to recompress data with new algorithms. There is currently no
mechanism to do this aside from copying the data manually into a new
filesystem with the new algorithm enabled. This tool allows the
transformation to happen through zfs send, allowing it to be done
efficiently to remote systems and in an incremental fashion.
A new zstream command is added that decompresses WRITE records and
then recompresses them with a provided algorithm, and then re-emits
the modified send stream. It may also be possible to re-compress
embedded block pointers, but that was not attempted for the initial
version.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #14106
commit e9ab9e512c277ce3c22208599ebe5814db41a036
Author: John Wren Kennedy <john.kennedy@delphix.com>
Date: Thu Nov 10 15:00:04 2022 -0700
ZTS: random_readwrite test doesn't run correctly
This test uses fio's bssplit mechanism to choose io sizes for the test,
leaving the PERF_IOSIZES variable empty. Because that variable is
empty, the innermost loop in do_fio_run_impl is never executed, and as
a result, this test does the setup but collects no data. Setting the
variable to "bssplit" allows performance data to be gathered.
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: John Wren Kennedy <john.kennedy@delphix.com>
Closes #14163
commit b1eec00904a22bd6600a2853709ca50faa56ea24
Author: Richard Yao <richard.yao@alumni.stonybrook.edu>
Date: Thu Nov 10 09:09:35 2022 -0500
Cleanup: Suppress Coverity dereference before/after NULL check reports
f224eddf922a33ca4b86d83148e9e6fa155fc290 began dereferencing a NULL
checked pointer in zpl_vap_init(), which made Coverity complain because
either the dereference is unsafe or the NULL check is unnecessary. Upon
inspection, this pointer is guaranteed to never be NULL because it is
from the Linux kernel VFS. The calls into ZFS simply would not make
sense if this pointer were NULL, so the NULL check is unnecessary.
Reported-by: Coverity (CID 1527260)
Reported-by: Coverity (CID 1527262)
Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14170
commit 9e2be2dfbde6c41ff53d71f3669cb6b9909c5a40
Author: Richard Yao <richard.yao@alumni.stonybrook.edu>
Date: Thu Nov 10 09:01:58 2022 -0500
Fix potential NULL pointer dereference regression
945b407486a0072ec7dd117a0bde2f72d52eb445 neglected to `NULL` check
`tx->tx_objset`, which is already done in the function. This upset
Coverity, which complained about a "dereference after null check".
Upon inspection, it was found that whenever `dmu_tx_create_dd()` is
called followed by `dmu_tx_assign()`, such as in
`dsl_sync_task_common()`, `tx->tx_objset` will be `NULL`.
Reported-by: Coverity (CID 1527261)
Reviewed-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Reviewed-by: Youzhong Yang <yyang@mathworks.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14170
commit 16f0fdadddcc7562ddf475f496a434b9ac94b0f7
Author: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Date: Thu Nov 10 22:37:12 2022 +0100
Allow to control failfast
Linux defaults to setting "failfast" on BIOs, so that the OS will not
retry IOs that fail, and instead report the error to ZFS.
In some cases, such as errors reported by the HBA driver, not
the device itself, we would wish to retry rather than generating
vdev errors in ZFS. This new property allows that.
This introduces a per vdev option to disable the failfast option.
This also introduces a global module parameter to define the failfast
mask value.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Seagate Technology LLC
Submitted-by: Klara, Inc.
Closes #14056
commit 945b407486a0072ec7dd117a0bde2f72d52eb445
Author: Mariusz Zaborski <oshogbo@vexillium.org>
Date: Tue Nov 8 21:40:22 2022 +0100
quota: disable quota check for ZVOL
The quota for ZVOLs is set to the size of the volume. When the quota
reaches the maximum, there is no good way to check whether new writes
are overwriting existing data or inserting new data.
Because of that, when we reach the maximum quota, we wait till txg is
flushed. This is causing a significant fluctuation in bandwidth.
In the case of ZVOL, the quota is enforced by the volsize, so we
can omit it.
This commit adds a sysctl that controls whether the quota mechanism
should be enforced or not.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <mariusz.zaborski@klarasystems.com>
Sponsored-by: Zededa Inc.
Sponsored-by: Klara Inc.
Closes #13838
commit e197bb24f1857c823b44c2175b2318c472d79731
Author: Alan Somers <asomers@gmail.com>
Date: Tue Nov 8 13:38:08 2022 -0700
Optionally skip zil_close during zvol_create_minor_impl
If there were no zil entries to replay, skip zil_close. zil_close waits
for a transaction to sync. That can take several seconds, for example
during pool import of a resilvering pool. Skipping zil_close can cut
the time for "zpool import" from 2 hours to 45 seconds on a resilvering
pool with a thousand zvols.
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Sponsored-by: Axcient
Closes #13999
Closes #14015
commit f224eddf922a33ca4b86d83148e9e6fa155fc290
Author: youzhongyang <youzhong@gmail.com>
Date: Tue Nov 8 13:28:56 2022 -0500
Support idmapped mount in user namespace
Linux 5.17 commit torvalds/linux@5dfbfe71e enables "the idmapping
infrastructure to support idmapped mounts of filesystems mounted
with an idmapping". Update OpenZFS accordingly to improve
idmapped mount support.
This pull request contains the following changes:
- xattr setter functions are fixed to take mnt_ns argument. Without
this, cp -p would fail for an idmapped mount in a user namespace.
- idmap_util is enhanced/fixed for its use in a user ns context.
- One test case added to test idmapped mount in a user ns.
Reviewed-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14097
commit 109731cd73c56c378b4c71732b9b9d3504a7a7e1
Author: Damian Szuberski <szuberskidamian@gmail.com>
Date: Wed Nov 9 04:16:01 2022 +1000
dsl_prop_known_index(): check for invalid prop
Resolve UBSAN array-index-out-of-bounds error in zprop_desc_t.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes #14142
Closes #14147
commit 41715771b5de07cbfcb1f7b75f324e824dfa1728
Author: Mohamed Tawfik <m_tawfik@aucegypt.edu>
Date: Tue Nov 8 20:08:21 2022 +0200
Adds the `-p` option to `zfs holds`
This allows for printing a machine-readable, accurate to the second,
hold creation time in the form of a unix epoch timestamp.
Additionally, updates relevant documentation and man pages accordingly.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mohamed Tawfik <m_tawfik@aucegypt.edu>
Closes #13690
Closes #14152
commit ecbf02791f921b39594719ea103ae66ed2fce095
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Fri Oct 28 00:55:45 2022 +0100
freebsd: simplify MD isa_defs.h
Most of this file was a pile of defines, apparently from Solaris that
controlled nothing in the source tree. A few things controlled the
definition of unused types or macros which I have removed.
Considerable further cleanup is possible including removal of
architectures FreeBSD never supported. This file should likely converge
with the Linux version to the extent possible.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
commit e3ba8eb12ef80a102a3f208a5a8d43eee3d21931
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Fri Oct 28 00:41:53 2022 +0100
freebsd: trim dkio.h to the minimum
Only DKIOCFLUSHWRITECACHE is required.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
commit 20b867f5f716fedab675f5eac395e7e1ea54572d
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Thu Oct 27 22:45:44 2022 +0100
freebsd: add ifdefs around legacy ioctl support
Require that ZFS_LEGACY_SUPPORT be defined for legacy ioctl support to
be built. For now, define it in zfs_ioctl_compat.h so support is always
built. This gives systems that never need to support pre-OpenZFS
tools a mechanism to remove that support at build time. This code should
be removed once the need for tool compatibility is gone.
No functional change at this time.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
commit 6c89cffc2cccbca82314bf276d31512f9dc4f6ec
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Thu Oct 27 22:28:55 2022 +0100
freebsd: remove no-op vn_renamepath()
vn_renamepath() is a Solaris-ism that was defined away in the FreeBSD
port. Now that the only use is in the FreeBSD zfs_vnops_os.c, drop it
entirely.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
commit 270b1b5fa75adc54d5af5794a885d05120f83640
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Thu Oct 27 22:24:42 2022 +0100
freebsd: remove unused vn_rename()
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14127
commit c23738c70eb86a7f04f93292caef2ed977047608
Author: Ameer Hamza <106930537+ixhamza@users.noreply.github.com>
Date: Fri Nov 4 23:33:47 2022 +0500
zed: Prevent special vdev to be replaced by hot spare
Special vdevs should not be replaced by a hot spare.
Log vdevs already behave this way; this extends the same
behavior to special vdevs.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes #14129
commit 73b8f700b68dc1c537781b2bee0f06c2b6d09418
Author: Alexander Lobakin <alobakin@pm.me>
Date: Sun Oct 16 23:41:39 2022 +0200
icp: fix all !ENDBR objtool warnings in x86 Asm code
Currently, only Blake3 x86 Asm code has signs of being ENDBR-aware.
At least, under certain conditions it includes some header file and
uses some custom macro from there.
Linux has its own NOENDBR since several releases ago. It's defined
in the same <asm/linkage.h>, so currently <sys/asm_linkage.h>
already is provided with it.
Let's unify those two into one %ENDBR macro. First, check whether it's
already present. If so, use the Linux kernel version. Otherwise, fall
back to the second way and use %_CET_ENDBR from <cet.h> if available.
If not, fall back to an empty definition.
This fixes a couple more 'relocations to !ENDBR' across the module.
And now that we always have the latest/actual ENDBR definition, use
it at the entrance of the few corresponding functions that objtool
still complains about. This matches the way it's used in the
upstream x86 core Asm code.
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035
commit 61cca6fa0506d41e5c794b293bedd982265fc1b2
Author: Alexander Lobakin <alobakin@pm.me>
Date: Sun Oct 16 23:23:44 2022 +0200
icp: fix rodata being marked as text in x86 Asm code
objtool properly complains that it can't decode some of the
instructions from ICP x86 Asm code. As mentioned in the Makefile,
where those object files were excluded from objtool check (but they
can still be visible under IBT and LTO), those are just constants,
not code.
In that case, they must be placed in .rodata, so they won't be
marked as "allocatable, executable" (ax) in ELF headers and this
effectively prevents objtool from trying to decode this data. That
reveals a whole bunch of other issues in ICP Asm code, as previously
objtool was bailing out after that warning message.
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035
commit b844489ec0e35b0a9b3cda5ba72bf29334f81081
Author: Alexander Lobakin <alobakin@pm.me>
Date: Sun Oct 16 16:53:22 2022 +0200
icp: properly fix all RETs in x86_64 Asm code
Commit 43569ee37420 ("Fix objtool: missing int3 after ret warning")
replaced all `ret`s in x86 asm code with a macro from the
Linux kernel in order to enable SLS. That was done by copying the
upstream macro definitions, which fixed the objtool complaints.
Since then, several more mitigations were introduced, including
Rethunk. It requires a jump to one of the thunks in order
to work, so the RET macro was changed again. And, as ZFS code
didn't use the mainline definition, but copied it, this is currently
missing.
Objtool reminds us about it from time to time (Clang 16, CONFIG_RETHUNK=y):
fs/zfs/lua/zlua.o: warning: objtool: setjmp+0x25: 'naked' return
found in RETHUNK build
fs/zfs/lua/zlua.o: warning: objtool: longjmp+0x27: 'naked' return
found in RETHUNK build
Do it the following way:
* if we're building under Linux, unconditionally include
<linux/linkage.h> in the related files. It is available in x86
sources since even pre-2.6 times, so doesn't need any conftests;
* then, if RET macro is available, it will be used directly, so that
we will always have the version matching the kernel we build against;
* if there's no such macro, we define it as a simple `ret`, as it
was in pre-SLS times.
This ensures we always have the up-to-date definition with no need
to update it manually, and at the same time is safe for the whole
variety of kernels ZFS module supports.
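The fallback described in the list above can be sketched as a preprocessor fragment for a `.S` file that is run through the C preprocessor. This is only an illustration of the idea: the guard on `__linux__` and `__KERNEL__` is an assumption about how a Linux kernel build is detected, and the real header may be organized differently.

```c
/*
 * Sketch only, not the actual contents of <sys/asm_linkage.h>.
 * Assumption: a Linux kernel build defines both __linux__ and __KERNEL__.
 */
#if defined(__linux__) && defined(__KERNEL__)
#include <linux/linkage.h>	/* may provide an up-to-date RET macro */
#endif

/* If the kernel did not supply RET, fall back to a plain return. */
#ifndef RET
#define	RET	ret
#endif
```

With this shape, assembly sources write `RET` everywhere and pick up whatever mitigation-aware definition the kernel headers provide, if any.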
Then, there's a couple more "naked" rets left in the code, they're
just defined as:
.byte 0xf3,0xc3
In fact, this is just:
rep ret
`rep ret` instead of just `ret` seems to mitigate performance issues
on some old AMD processors and most likely makes no sense as of
today.
Anyways, address those rets, so that they will be protected with
Rethunk and SLS. Include <sys/asm_linkage.h> here which now always
has RET definition and replace those constructs with just RET.
This wipes the last couple of places with unpatched rets objtool's
been complaining about.
Reviewed-by: Attila Fülöp <attila@fueloep.org>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Closes #14035
commit 993ee7a00670667f97d990aa5e38eb5cf5effc37
Author: Richard Yao <richard.yao@alumni.stonybrook.edu>
Date: Fri Nov 4 14:06:14 2022 -0400
FreeBSD: Fix out of bounds read in zfs_ioctl_ozfs_to_legacy()
There is an off by 1 error in the check. Fortunately, this function does
not appear to be used in kernel space, despite being compiled as part of
the kernel module. However, it is used in userspace. Callers of
lzc_ioctl_fd() likely will crash if they attempt to use the
unimplemented request number.
This was reported by FreeBSD's coverity scan.
Reported-by: Coverity (CID 1432059)
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14135
commit f66ffe68787f9675ad7cce7644a1f81f28a86939
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
Date: Thu Nov 3 15:02:46 2022 -0700
Expose zfs_vdev_open_timeout_ms as a tunable
Some of our customers have been occasionally hitting zfs import failures
in Linux because udevd doesn't create the by-id symbolic links in time
for zpool import to use them. The main issue is that the
systemd-udev-settle.service that zfs-import-cache.service and other
services depend on is racy. There is also an openzfs issue filed (see
https://github.com/openzfs/zfs/issues/10891) outlining the problem and
potential solutions.
With the proper solutions being significant in terms of complexity and
the priority of the issue being low for the time being, this patch
exposes `zfs_vdev_open_timeout_ms` as a tunable so people that are
experiencing this issue often can increase it as a workaround.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14133
commit 595d3ac2ed61331124feda2cf5787c3dd4c7ae09
Author: Allan Jude <allan@klarasystems.com>
Date: Thu Nov 3 14:53:24 2022 -0400
Allow mounting snapshots in .zfs/snapshot as a regular user
Rather than doing a terrible credential swapping hack, we just
check that the thing being mounted is a snapshot, and the mountpoint
is the zfsctl directory, then we allow it.
If the mount attempt is from inside a jail, on an unjailed dataset
(mounted from the host, not by the jail), the ability to mount the
snapshot is controlled by a new per-jail parameter: zfs.mount_snapshot
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Sponsored-by: Modirum MDPay
Sponsored-by: Klara Inc.
Closes #13758
commit 11e3416ae78d09380c523b703fad8dee145658d5
Author: Richard Yao <richard.yao@alumni.stonybrook.edu>
Date: Thu Nov 3 13:47:48 2022 -0400
Cleanup: Remove branches that always evaluate the same way
Coverity reported that the ASSERT in taskq_create() is always true and
the `*offp > MAXOFFSET_T` check in zfs_file_seek() is always false.
We delete them as cleanup.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14130
commit 1e1ce10e5579a530606060f095f2f139916621fe
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Tue Nov 1 20:45:36 2022 +0000
Remove an unused variable
Clang-16 detects this set-but-unused variable which is assigned and
incremented, but never referenced otherwise.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14125
commit abb42dc5e1d5073ac72d9294fa78ab2203406b1c
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Tue Nov 1 20:43:32 2022 +0000
Make 1-bit bitfields unsigned
This fixes -Wsingle-bit-bitfield-constant-conversion warning from
clang-16 like:
lib/libzfs/libzfs_dataset.c:4529:19: error: implicit truncation
from 'int' to a one-bit wide bit-field changes value from
1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
flags.nounmount = B_TRUE;
^ ~~~~~~
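A minimal sketch of why the warning fires, using hypothetical structs rather than the real libzfs flag types: a signed one-bit field can represent only 0 and -1, so storing 1 cannot read back as 1, while an unsigned one-bit field holds 0 and 1 as intended. The value that reads back from the signed field is implementation-defined; GCC and Clang typically yield -1.

```c
#include <assert.h>

/* Hypothetical flag structs, for illustration only. */
struct flags_signed {
	signed int nounmount : 1;	/* representable values: 0 and -1 */
};

struct flags_unsigned {
	unsigned int nounmount : 1;	/* representable values: 0 and 1 */
};

/* Stores v in the signed one-bit field and returns what reads back. */
int
signed_bit_readback(int v)
{
	struct flags_signed f = { 0 };

	f.nounmount = v;
	return (f.nounmount);
}

/* Stores v in the unsigned one-bit field and returns what reads back. */
int
unsigned_bit_readback(int v)
{
	struct flags_unsigned f = { 0 };

	f.nounmount = v;
	return (f.nounmount);
}

/* A comparison against 1 is only reliable for the unsigned field. */
void
check_unsigned_flag(void)
{
	assert(unsigned_bit_readback(1) == 1);
}
```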
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14125
commit f47f6a055d0c282593fe701bcaa968225ba9d1fc
Author: Richard Yao <richard.yao@alumni.stonybrook.edu>
Date: Thu Nov 3 12:58:14 2022 -0400
Address warnings about possible division by zero from clangsa
* The complaint in ztest_replay_write() is only possible if something
went horribly wrong. An assertion will silence this and if it goes
off, we will know that something is wrong.
* The complaint in spa_estimate_metaslabs_to_flush() is not impossible,
but seems very unlikely. We resolve this by passing the value from
the `MIN()` that does not go to infinity when the variable is zero.
There was a third report from Clang's scan-build, but that was a
definite false positive and disappeared when checked again through
Clang's static analyzer with Z3 refutation via CodeChecker.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14124
commit 27d29946be5e555d8659d6ebdeda6ae771ada5d6
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Thu Nov 3 09:57:05 2022 -0700
libuutil: deobfuscate internal pointers
uu_avl and uu_list stored internal next/prev pointers and parent
pointers (unused) obfuscated (byte swapped) to hide them from a long
forgotten leak checker (No one at the 2022 OpenZFS developers meeting
could recall the history.) This would break on CHERI systems and adds
no obvious value. Rename the members, use proper types rather than
uintptr_t, and eliminate the related macros.
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14126
commit 211ec1b9fde303968d42e49553c666f74638d2ec
Author: Attila Fülöp <attila@fueloep.org>
Date: Thu Nov 3 17:55:13 2022 +0100
Deny receiving into encrypted datasets if the keys are not loaded
Commit 68ddc06b611854560fefa377437eb3c9480e084b introduced support
for receiving unencrypted datasets as children of encrypted ones but
unfortunately got the logic upside down. This resulted in failing to
deny receives of incremental sends into encrypted datasets without
their keys loaded. If receiving a filesystem, the receive was done
into a newly created unencrypted child dataset of the target. In
case of volumes the receive made the target volume undeletable since
a dataset was created below it, which we obviously can't handle.
Incremental streams with embedded blocks are affected as well.
We fix the broken logic to properly deny receives in such cases.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Attila Fülöp <attila@fueloep.org>
Closes #13598
Closes #14055
Closes #14119
commit 84477e148dccf4665067c0d39006f31bb073cc9e
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Thu Oct 27 23:39:06 2022 +0100
lua: cast through uintptr_t when return a pointer
Don't assume size_t can carry pointer provenance and use uintptr_t
(identical on all current platforms) instead.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
commit b9041e1f27b7b29b27ac3b873c7ba2922bccca01
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Thu Oct 27 23:28:03 2022 +0100
Use intptr_t when storing an integer in a pointer
Cast the integer type to (u)intptr_t before casting to "void *". In
CHERI C/C++ we warn on bare casts from integers to pointers to catch
attempts to create pointers out of thin air. We allow the warning to be
suppressed with a suitable cast through (u)intptr_t.
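As a rough illustration of the cast pattern, the helpers below stash a small integer in a `void *` and recover it again by going through `intptr_t`. The function names are hypothetical and do not appear in the patch; the round trip relies on the implementation-defined integer/pointer conversions that mainstream platforms provide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack an integer into a pointer-sized cookie. */
void *
int_to_cookie(int value)
{
	/* The intermediate intptr_t cast marks the conversion as deliberate. */
	return ((void *)(intptr_t)value);
}

/* Hypothetical helper: recover the integer stored by int_to_cookie(). */
int
cookie_to_int(void *cookie)
{
	return ((int)(intptr_t)cookie);
}

/* Usage: the value survives a round trip through the pointer type. */
void
check_cookie_roundtrip(void)
{
	assert(cookie_to_int(int_to_cookie(42)) == 42);
}
```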
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
commit 877790001e74b6c3b2955e4b7a8c685385e77654
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Thu Oct 27 23:25:42 2022 +0100
recvd_props_mode: use a uintptr_t to stash nvlists
Avoid assuming that a uint64_t can hold a pointer.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
commit 250b2bac78102f707dc105450f25d91e5fab481e
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Thu Oct 27 23:20:05 2022 +0100
zfs_onexit_add_cb: make action_handle point to a uintptr_t
Avoid assuming that a uint64_t can hold a pointer and reduce the
number of casts in the process.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
commit d96303cb0787bf7217aacd51074e00d820a98700
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Thu Oct 27 23:04:17 2022 +0100
acl: use uintptr_t for ace walker cookies
Avoid assuming that a pointer can fit in a uint64_t and use uintptr_t
instead.
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14131
commit 7309e94239a456de043c590ae85027e932c86f62
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Fri Oct 28 17:36:43 2022 +0100
linux isa_defs.h: Don't define _ALIGNMENT_REQUIRED
Nothing consumes this definition so stop defining it.
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14128
commit 5229071ba1e6c5dbba277e50306d2ad38f417947
Author: Brooks Davis <brooks@one-eyed-alien.net>
Date: Fri Oct 28 00:58:41 2022 +0100
Improve RISC-V support
Check __riscv_xlen == 64 rather than _LP64 and define _LP64 if missing.
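One way such a check could look, as a sketch only; the actual isa_defs.h logic may be structured differently, and the use of `__riscv` as the outer guard is an assumption here.

```c
/* Sketch only: detect 64-bit RISC-V by register width, not by _LP64. */
#if defined(__riscv) && (__riscv_xlen == 64)
#if !defined(_LP64)
#define	_LP64	/* supply the data-model macro if the toolchain did not */
#endif
#endif
```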
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brooks Davis <brooks.davis@sri.com>
Closes #14128
commit da3d2666728ed21707bd66182c4077f4adcd61aa
Author: Richard Yao <richard.yao@alumni.stonybrook.edu>
Date: Tue Nov 1 16:58:17 2022 -0400
FreeBSD: Fix regression from kmem_scnprintf() in libzfs
kmem_scnprintf() is only available in libzpool. Recent buildbot issues
with showing FreeBSD results kept us from seeing this before
97143b9d314d54409244f3995576d8cc8c1ebf0a was merged.
The code has been changed to sanitize the output from `kmem_scnprintf()`.
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14111
commit fdc59cf56356858c00b9f06fd9fe11ab60ad7790
Author: Vince van Oosten <techhazard@codeforyouand.me>
Date: Sun Oct 23 11:11:58 2022 +0200
include overrides for zfs snapshot/rollback bootfs.service
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me>
Closes #14075
Closes #14076
commit 59ca6e2ad0b40a67d83cddae8e33d95e8957ad06
Author: Vince van Oosten <techhazard@codeforyouand.me>
Date: Sun Oct 23 11:11:18 2022 +0200
include overrides for zfs-import.target
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me>
Closes #14075
Closes #14076
commit b10f73f78eb223dd799a87474c537a69113edee1
Author: Vince van Oosten <techhazard@codeforyouand.me>
Date: Sun Oct 23 10:55:46 2022 +0200
include systemd overrides to zfs-dracut module
If a user that uses systemd and dracut wants to override certain
settings, they typically use `systemctl edit [unit]` or place a file in
`/etc/systemd/system/[unit].d/override.conf` directly.
The zfs-dracut module did not include those overrides however, so this
did not have any effect at boot time.
For zfs-import-scan.service and zfs-import-cache.service, overrides are
now included in the dracut initramfs image.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Vince van Oosten <techhazard@codeforyouand.me>
Closes #14075
Closes #14076
commit 748b9d5bda935d126eeb62acab86c95e8b2ccac3
Author: Ryan Moeller <ryan@iXsystems.com>
Date: Tue Nov 1 15:19:32 2022 -0400
zil: Relax assertion in zil_parse
Rather than panic debug builds when we fail to parse a whole ZIL, let's
instead improve the logging of errors and continue like in a release
build.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Closes #14116
commit 95055c2ce2a51b5285091d928c8481d02796ea72
Author: youzhongyang <youzhong@gmail.com>
Date: Tue Nov 1 15:08:37 2022 -0400
ZTS: rsend_009_pos.ksh is destructive on zfs-on-root system
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Closes #14113
commit dcce0dc5f009e8a3ec6dc48f5fc99abc4d74200f
Author: Richard Yao <richard.yao@alumni.stonybrook.edu>
Date: Mon Oct 31 13:01:04 2022 -0400
Fix oversights from 4170ae4e
4170ae4ea600fea6ac9daa8b145960c9de3915fc was intended to tackle TOCTOU
race conditions reported by CodeQL, but as an oversight, a file
descriptor was not closed and some comments were not updated.
Interestingly, CodeQL did not complain about the file descriptor leak,
so there is room for improvement in how we configure it to try to detect
this issue so that we get early warning about this.
In addition, an optimization opportunity was missed by mistake in
lib/libshare/os/linux/smb.c, which prevented us from truly closing the
TOCTOU race. This was also caught by Coverity.
Reported-by: Coverity (CID 1524424)
Reported-by: Coverity (CID 1526804)
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14109
commit b37d495e04ed6fc0012b2eccfff80af9e8887422
Author: Allan Jude <allan@klarasystems.com>
Date: Sat Oct 29 16:08:54 2022 -0400
Avoid null pointer dereference in dsl_fs_ss_limit_check()
Check for cr == NULL before dereferencing it in
dsl_enforce_ds_ss_limits() to lookup the zone/jail ID.
Reported-by: Coverity (CID 1210459)
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes #14103
commit 97143b9d314d54409244f3995576d8cc8c1ebf0a
Author: Richard Yao <richard.yao@alumni.stonybrook.edu>
Date: Thu Oct 27 14:16:04 2022 -0400
Introduce kmem_scnprintf()
`snprintf()` is meant to protect against buffer overflows, but operating
on the buffer using its return value, possibly by calling it again, can
cause a buffer overflow, because it will return how many characters it
would have written if it had enough space even when it did not. In a
number of places, we repeatedly call snprintf() by successively
incrementing a buffer offset and decrementing a buffer length, by its
return value. This is a potentially unsafe usage of `snprintf()`
whenever the buffer length is reached. CodeQL complained about this.
To fix this, we introduce `kmem_scnprintf()`, which will return 0 when
the buffer length is zero or the number of written characters, minus 1 to
exclude the NUL character, when the buffer was too small. In all other
cases, it behaves like snprintf(). The name is inspired by the Linux and
XNU kernels' `scnprintf()`. The implementation was written before I
thought to look at `scnprintf()` and had a good name for it, but it
turned out to have identical semantics to the Linux kernel version.
That led to the name, `kmem_scnprintf()`.
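The semantics described above can be sketched in userspace C as follows. This is a hypothetical stand-in written for illustration, not the OpenZFS implementation; the name `sketch_scnprintf()` and the choice to return 0 on a formatting error are assumptions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the behavior described in the text.
 * Returns the number of characters actually stored in buf, excluding
 * the terminating NUL, so the result is always < size when size > 0.
 */
int
sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return (0);

	assert(buf != NULL);
	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	if (n < 0)
		return (0);	/* assumption: treat errors as "nothing written" */
	if ((size_t)n >= size)
		n = (int)(size - 1);	/* truncated: report what was stored */
	return (n);
}

/* Appends two fragments; the offset can never move past the buffer end. */
size_t
sketch_append_twice(char *buf, size_t size)
{
	size_t off = 0;

	off += sketch_scnprintf(buf + off, size - off, "%s", "hello ");
	off += sketch_scnprintf(buf + off, size - off, "%s", "world");
	return (off);
}
```

Because the return value never exceeds the space that was available, the offset-and-remaining-length pattern stays in bounds even when a fragment is truncated.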
CodeQL only catches this issue in loops, so repeated use of snprintf()
outside of a loop was not caught. As a result, a thorough audit of the
codebase was done to examine all instances of `snprintf()` usage for
potential problems and a few were caught. Fixes for them are included in
this patch.
Unfortunately, ZED is one of the places where `snprintf()` is
potentially used incorrectly. Since using `kmem_scnprintf()` in it would
require changing how it is linked, we modify its usage to make it safe,
no matter what buffer length is used. In addition, there was a bug in
the use of the return value where the NULL format character was not
being written by pwrite(). That has been fixed.
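The return-value semantics described for `kmem_scnprintf()` can be illustrated with a minimal userland sketch. It assumes a hypothetical `sketch_scnprintf()` wrapper around `vsnprintf()` and a hypothetical `append_twice()` caller; neither is the code from this patch, and the handling of a negative `vsnprintf()` result is a choice made only for this sketch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userland stand-in for the semantics described above;
 * this is not the kmem_scnprintf() implementation from the patch.
 * Returns the number of characters actually stored in buf, excluding
 * the terminating NUL, so the result is always safe to add to an offset.
 */
static int
sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return (0);

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);

	if (n < 0)
		return (0);	/* sketch-only choice: report nothing written */
	if ((size_t)n >= size)
		n = (int)(size - 1);	/* output was truncated */
	return (n);
}

/*
 * Appends two fragments using the offset/length pattern from the commit
 * message. The returned offset never exceeds size - 1.
 */
static size_t
append_twice(char *buf, size_t size, const char *a, const char *b)
{
	size_t off = 0;

	assert(size > 0);
	off += sketch_scnprintf(buf + off, size - off, "%s", a);
	off += sketch_scnprintf(buf + off, size - off, "%s", b);
	return (off);
}
```

With plain `snprintf()` the second call could receive an offset past the end of the buffer once the first call was truncated; with the clamped return value the offset stays inside the buffer.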
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
commit 2e08df84d8649439e5e9ed39ea13d4b755ee97c9
Author: Richard Yao <richard.yao@alumni.stonybrook.edu>
Date: Thu Oct 27 15:41:39 2022 -0400
Cleanup dump_bookmarks()
Assertions are meant to check assumptions, but the way that this
assertion is written does not check an assumption, since it is provably
always true. Removing the assertion will cause a compiler warning (made
into an error by -Werror) about printing up to 512 bytes to a 256-byte
buffer, so instead, we change the assertion to verify the assumption
that we never do a snprintf() that is truncated to avoid overrunning the
256-byte buffer.
This was caught by an audit of the codebase to look for misuse of
`snprintf()` after CodeQL reported that we had misused `snprintf()`. An
explanation of how snprintf() can be misused is here:
https://www.redhat.com/en/blog/trouble-snprintf
This particular instance did not misuse `snprintf()`, but it was caught
by the audit anyway.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
commit d71d69326116756e69b2d7bee4582f00de27ec72
Author: Richard Yao <richard.yao@alumni.stonybrook.edu>
Date: Thu Oct 27 12:45:26 2022 -0400
Fix too few arguments to formatting function
CodeQL reported that when the VERIFY3U condition is false, we do not
pass enough arguments to `spl_panic()`. This is because the format
string from `snprintf()` was concatenated into the format string for
`spl_panic()`, which causes us to have an unexpected format specifier.
A CodeQL developer suggested fixing the macro to have a `%s` format
string that takes a stringified RIGHT argument, which would fix this.
However, upon inspection, the VERIFY3U check was never necessary in the
first place, so we remove it in favor of just calling `snprintf()`.
Lastly, it is interesting that every other static analyzer run on the
codebase did not catch this, including some that made an effort to catch
such things. Presumably, all of them relied on header annotations, which
we have not yet done on `spl_panic()`. CodeQL apparently is able to
track the flow of arguments on their way to annotated functions, which
allowed it to catch this when others did not. A future patch that I have
in development should annotate `spl_panic()`, so the others will catch
this too.
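The `%s` approach that the CodeQL developer suggested can be sketched in userland C as follows. The macro and function names here are hypothetical and this is not the `VERIFY3U` definition from the codebase; the sketch only shows why passing stringified text as an argument is safer than pasting it into the format string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical macro, not the VERIFY3U definition from the codebase.
 * A fragile variant would paste the stringified expression into the
 * format itself, e.g. "check failed: " #expr, so any '%' inside expr
 * would be parsed as a conversion with no matching argument.
 * Passing the text as a "%s" argument keeps the format string constant.
 */
#define	SKETCH_DESCRIBE_FAILURE(buf, size, expr) \
	snprintf((buf), (size), "check failed: %s", #expr)

/* Formats a description of a sample expression whose text contains "%d". */
static int
describe_sample(char *buf, size_t size)
{
	assert(buf != NULL && size > 0);
	return (SKETCH_DESCRIBE_FAILURE(buf, size, strlen("%d") == 3));
}
```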
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
commit 4170ae4ea600fea6ac9daa8b145960c9de3915fc
Author: Richard Yao <richard.yao@alumni.stonybrook.edu>
Date: Thu Oct 27 11:03:48 2022 -0400
Fix TOCTOU race conditions reported by CodeQL and Coverity
CodeQL and Coverity both complained about:
* lib/libshare/os/linux/smb.c
* tests/zfs-tests/cmd/mmapwrite.c
* twice
* tests/zfs-tests/tests/functional/tmpfile/tmpfile_002_pos.c
* tests/zfs-tests/tests/functional/tmpfile/tmpfile_stat_mode.c
* Coverity had a second complaint that CodeQL did not have
* tests/zfs-tests/cmd/suid_write_to_file.c
* Coverity had two complaints and CodeQL had one complaint, both
differed. The CodeQL complaint is about the main point of the
test, so it is not fixable without a hack involving `fork()`.
The issues reported by CodeQL are fixed, with the exception of the last
one, which is deemed to be a false positive that is too much trouble to
work around. The issues reported by Coverity were only fixed if CodeQL
complained about them.
There were issues reported by Coverity in a number of other files that
were not reported by CodeQL, but fixing the CodeQL complaints is
considered a priority since we want to integrate it into a GitHub
workflow, so the remaining Coverity complaints are left for future work.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes #14098
commit 82ad2a06ac4e379fa67ff69901a1a70c86fd8f01
Author: Brian Behlendorf <behlendorf1@llnl.gov>
Date: Fri Oct 28 13:25:37 2022 -0700
Revert "Cleanup: Delete dead code from send_merge_thread()"
This reverts commit fb823de9f due to a regression. It is in fact possible
for the range->eos_marker to be false on error.
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #14042
Closes #14104
commit 5f0a48c7c95d938e4cb0ae3ee864241b324853b7
Author: Rob N ★ <robn@despairlabs.com>
Date: Sat Oct 29 05:46:44 2022 +1100
debug: fix output from VERIFY0 assertion
The previous version reported all the right info, but the VERIFY3 name
made it a little more confusing when looking for the matching location in
the source code.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Rob N ★ <robn@despairlabs.com>
Closes #14099
commit 8af08a69cda63e6d7983fc2f32f9fed4155b95be
Author: Mariusz Zaborski <oshogbo@vexillium.org>
Date: Fri Oct 28 20:44:18 2022 +0200
quota: extend quota for dataset
This patch relaxes the quota limitation for a dataset by around 3%.
This means that a user can write more data than the quota is
set to. However, in exchange we get more stable bandwidth in
the case where we are overwriting data in place and not consuming any
additional space.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
Sponsored-by: Zededa Inc.
Sponsored-by: Klara Inc.
Closes #13839
commit dc56c673e3b0d206f1d3fca66fdf5f6a46dbc4b2
Author: shodanshok <g.danti@assyoma.it>
Date: Fri Oct 28 19:21:54 2022 +0200
Fix ARC target collapse when zfs_arc_meta_limit_percent=100
Reclaim metadata when arc_available_memory < 0 even if
meta_used is not bigger than arc_meta_limit.
As described in https://github.com/openzfs/zfs/issues/14054 if
zfs_arc_meta_limit_percent=100 then ARC target can collapse to
arc_min due to arc_prune not freeing any metadata.
This patch lets arc_prune do its work when arc_available_memory
is negative even if meta_used is not bigger than arc_meta_limit,
avoiding ARC target collapse.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
Closes #14054
Closes #14093
commit 7822b50f548e6ca73faa6f0d2de029e981be1d73
Author: vaclavskala <33496485+vaclavskala@users.noreply.github.com>
Date: Fri Oct 28 19:16:31 2022 +0200
Propagate extent_bytes change to autotrim thread
The autotrim thread reads the zfs_trim_extent_bytes_min and
zfs_trim_extent_bytes_max variables only on thread start. We
should check for parameter changes during thread execution to
allow parameter changes to take effect without needing to disable
and then restart autotrim.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Václav Skála <skala@vshosting.cz>
Closes #14077
commit dbf6108b4df92341eea40d0b41792ac16eabc514
Author: Aleksa Sarai <cyphar@cyphar.com>
Date: Sat Jun 22 10:35:11 2019 +1000
zfs_rename: support RENAME_* flags
Implement support for Linux's RENAME_* flags (for renameat2). Aside from
being quite useful for userspace (providing race-free ways to exchange
paths and implement mv --no-clobber), they are used by overlayfs and are
thus required in order to use overlayfs-on-ZFS.
In order for us to represent the new renameat2(2) flags in the ZIL, we
create two new transaction types for the two flags which need
transactional-level support (RENAME_EXCHANGE and RENAME_WHITEOUT).
RENAME_NOREPLACE does not need any ZIL support because we know that if
the operation succeeded before creating the ZIL entry, there was no file
to be clobbered and thus it can be treated as a regular TX_RENAME.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #12209
Closes #14070
commit e015d6cc0b60d4675c9b6d2433eed2c8ef0863e8
Author: Aleksa Sarai <cyphar@cyphar.com>
Date: Fri Apr 26 23:23:07 2019 +1000
zfs_rename: restructure to have cleaner fallbacks
This is in preparation for RENAME_EXCHANGE and RENAME_WHITEOUT support
for ZoL, but the changes here allow for far nicer fallbacks than the
previous implementation (the source and target are re-linked in case of
the final link failing).
In addition, a small cleanup was done for the "target exists but is a
different type" codepath so that it's more understandable.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #12209
Closes #14070
commit 7b3ba296543724611c12c52c18e85a1028f8f19e
Author: Aleksa Sarai <cyphar@cyphar.com>
Date: Wed May 18 20:29:33 2022 +1000
debug: add VERIFY_{IMPLY,EQUIV} variants
This allows for much cleaner VERIFY-level assertions.
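The logical meaning of the two variants can be sketched in userland C as below. The `SKETCH_` macros, `imply_holds()`, `equiv_holds()`, and `sketch_panic()` are hypothetical names for this illustration and are not the definitions added by the commit; the sketch only assumes that implication means "not A, or B" and that equivalence compares truth values.

```c
#include <stdio.h>
#include <stdlib.h>

/* A implies B: only the combination "A true, B false" violates it. */
static int
imply_holds(int a, int b)
{
	return (!a || b);
}

/* A is equivalent to B: both true or both false. */
static int
equiv_holds(int a, int b)
{
	return (!!a == !!b);
}

/* Hypothetical failure handler: report the location and abort. */
static void
sketch_panic(const char *file, int line, const char *expr)
{
	fprintf(stderr, "%s:%d: verify failed: %s\n", file, line, expr);
	abort();
}

/* Always-on checks, in the spirit of VERIFY as opposed to ASSERT. */
#define	SKETCH_VERIFY_IMPLY(A, B) \
	((void)(imply_holds(!!(A), !!(B)) || \
	    (sketch_panic(__FILE__, __LINE__, \
	    "(" #A ") implies (" #B ")"), 0)))

#define	SKETCH_VERIFY_EQUIV(A, B) \
	((void)(equiv_holds(!!(A), !!(B)) || \
	    (sketch_panic(__FILE__, __LINE__, \
	    "(" #A ") is equivalent to (" #B ")"), 0)))
```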
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Closes #14070
commit 86db35c447aa3f4cc848497d78d54ec9c985d1ed
Author: Pavel Snajdr <snajpa@snajpa.net>
Date: Thu Dec 5 01:52:27 2019 +0100
Remove zpl_revalidate: fix snapshot rollback
Open files that aren't present in the snapshot being
rolled back to need to disappear from the visible VFS image of
the dataset.
The kernel provides the d_drop function to drop an invalid entry from
the dcache, but an inode can be referenced by multiple dentries.
The introduced zpl_d_drop_aliases function walks and invalidates
all aliases of an inode.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #9600
Closes #14070
New Features
- Block cloning (#13392)
- Linux container support (#14070, #14097, #12263)
- Scrub error log (#12812, #12355)
- BLAKE3 checksums (#12918)
- Corrective "zfs receive"
- Vdev and zpool user properties
Performance
- Fully adaptive ARC (#14359)
- SHA2 checksums (#13741)
- Edon-R checksums (#13618)
- Zstd early abort (#13244)
- Prefetch improvements (#14603, #14516, #14402, #14243, #13452)
- General optimization (#14121, #14123, #14039, #13680, #13613, #13606, #13576, #13553, #12789, #14925, #14948)
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
New features:
- Fully adaptive ARC eviction (openzfs#14359)
- Block cloning (openzfs#13392)
- Scrub error log (openzfs#12812, openzfs#12355)
- Linux container support (openzfs#14070, openzfs#14097, openzfs#12263)
- BLAKE3 Checksums (openzfs#12918)
- Corrective "zfs receive" (openzfs#9372)
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Youzhong Yang yyang@mathworks.com
Motivation and Context
fs.idmapped.v5.17 enables "the idmapping infrastructure to support idmapped mounts of filesystems mounted with an idmapping". We need to improve the idmapped mount support in ZFS too.
Description
This pull request contains the following changes:
How Has This Been Tested?
Full test suite run in an Ubuntu 22.04 VM; manual run of xfstests idmapped mount test cases.