-
Notifications
You must be signed in to change notification settings - Fork 176
Fix race between getf() and areleasef() #492
Conversation
If a vnode is released asynchronously through areleasef(), it is possible for the user process to reuse the file descriptor before areleasef is called. When this happens, getf() will return a stale reference, any operations in the kernel on that file descriptor will fail (as it is closed) and the operations meant for that fd will never occur from userspace's perspective. We correct this by detecting this condition in getf(), doing a putf on the old file handle, updating the file descriptor and proceeding as if everything was fine. When the areleasef() is done, it will harmlessly decrement the reference counter on the Illumos file handle. Signed-off-by: Richard Yao <ryao@gentoo.org>
I don't understand why we even need this. Why don't we just fget(fd) every time instead of maintaining a separate file list. |
@tuxoko That would introduce another issue in semantics. Specifically, userspace could call I do not think it is possible to close the underlying race completely due to a difference in |
@ryao Edit: |
@tuxoko This patch ensures that you will get the correct file handle in the case that Also, the file list can both grow and shrink. Once |
But that's because you call fget(fd), not becasue of the file list. I'm not arguing your patch is wrong. I'm arguing we don't need the file list. |
@tuxoko There is no way to tell if anything leaked when the module is unloaded without the list. It also potentially allows us to save memory by avoiding duplicate SPL handles for the same Linux file handle. |
@ryao |
@tuxoko We do this for the kmem cache code as well. I suspect that this is done because the buildbot does not use kmemleak. |
Actually I had something a little more drastic in mind. I'd like to remove all of the vnode and kobj code from the SPL and add a simplified version to the ZFS source tree. It's served it's purpose well, but frankly it's a bit of a mess and burden. @ryao @tuxoko I've opened the following pull requests with what I consider a rough first draft of what I'm proposing. It builds but hasn't seen any real testing. If you have time to provide feedback that would be great! As for this specific bug I happened to address it in the context of these changes. What do you think of openzfs/zfs@25e56b4aa7c95c6. Basically I just dumped this terrible confusing compatibility code. |
@behlendorf |
As for the buildbot it just so happens that I've largely finished updating it and migrating it to ec2. The updated buildbot is a lot more flexible and I think will be far more useful to us. The entire ZoL buildbot configuration is available at documented at https://github.com/zfsonlinux/zfs-buildbot/. This way anyone can help improve it and exactly how it works is clearly visible. |
@behlendorf It only affects the refresh I am preparing to do on openzfs/zfs#3907, which is where this bug occurred. That pull request is the only consumer of I am working on refreshing that pull request right now as I believe it is in a state where nearly all of the outstanding comments have been addressed. |
OK, then let's get your fix applied to the SPL. That will make it easier to get any upstream changes applied. Then we'll revisit updating the code in such a way that it can be supported easily on both Illumos and Linux. @ryao if your happy with this as the final version of this patch let me know and I'll merge it. |
@behlendorf I am happy with it. |
Merged as: 1683e75 Fix race between getf() and areleasef() |
If a vnode is released asynchronously through areleasef(), it is
possible for the user process to reuse the file descriptor before
areleasef is called. When this happens, getf() will return a stale
reference, any operations in the kernel on that file descriptor will
fail (as it is closed) and the operations meant for that fd will
never occur from userspace's perspective.
We correct this by detecting this condition in getf(), doing a putf on the old
file handle, updating the file descriptor and proceeding as if everything was
fine. When the areleasef() is done, it will harmlessly decrement the reference
counter on the Illumos file handle.
Signed-off-by: Richard Yao ryao@gentoo.org