cephfs: wait for FUSE to exit after unmount #233

Merged
rootfs merged 2 commits into ceph:csi-v1.0 on Feb 27, 2019

Contributor

@gman0 gman0 commented Feb 26, 2019

The symptoms are shown in this comment. Once ceph-fuse mounts a volume, it forks a daemon process that keeps running until the volume is unmounted. Because that daemon is a child of the CSI plugin, it becomes a zombie process once it exits and needs to be Wait()-ed for.

This PR reads the PID of the FUSE daemon from the output of the initial ceph-fuse exec and stores it in a {volume ID}->{ceph-fuse PID} map for later use. unmountVolume() then looks the PID up: once the FUSE volume is unmounted, it waits until the FUSE daemon (which is now a zombie) exits and is finally cleaned from the kernel's process table.
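The bookkeeping described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the names fusePIDs, store, take and reap are hypothetical, and it assumes the daemon is a direct child of the calling process, because os.Process.Wait only works on children.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// fusePIDs is a hypothetical tracker that maps a volume ID to the PID of
// the ceph-fuse daemon serving that volume.
type fusePIDs struct {
	mu   sync.Mutex
	pids map[string]int
}

func newFusePIDs() *fusePIDs {
	return &fusePIDs{pids: make(map[string]int)}
}

// store remembers the daemon PID for a freshly mounted volume.
func (f *fusePIDs) store(volID string, pid int) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.pids[volID] = pid
}

// take returns the PID recorded for volID and forgets it, so that a
// daemon is only ever waited for once.
func (f *fusePIDs) take(volID string) (int, bool) {
	f.mu.Lock()
	defer f.mu.Unlock()
	pid, ok := f.pids[volID]
	delete(f.pids, volID)
	return pid, ok
}

// reap blocks until the child process with the given PID has exited and
// collects its exit status, which removes the zombie entry from the
// kernel's process table.
// Assumption: pid belongs to a child of the calling process; Wait
// returns an error otherwise.
func reap(pid int) error {
	p, err := os.FindProcess(pid)
	if err != nil {
		return err
	}
	_, err = p.Wait()
	return err
}

func main() {
	tracker := newFusePIDs()
	tracker.store("vol-1", 274) // PID as parsed from the ceph-fuse output

	// After the volume has been unmounted, look up the daemon PID.
	// In the plugin, this is the point where reap(pid) would be called.
	if pid, ok := tracker.take("vol-1"); ok {
		fmt.Println("would wait for ceph-fuse PID", pid)
	}
}
```

Removing the entry in take keeps the map from growing and avoids a second Wait on a PID that the kernel may already have reused.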

Contributor Author

gman0 commented Feb 26, 2019

PTAL @rootfs
@Madhu-1 this one fixes those <defunct> ceph-fuse processes you reported in #191. There's an even more serious issue with ceph-fuse crashing, but this PR at least takes care of those zombies.

@@ -116,10 +124,36 @@ func mountFuse(mountPoint string, cr *credentials, volOptions *volumeOptions, vo
return err
}

if !bytes.Contains(stderr, []byte("starting fuse")) {
// Parse the output:
Member


Can you provide sample output? It would help in understanding the parsing below. Thanks

Contributor Author


Unfortunately this doesn't output anything, but I can sketch it out.

This is the stderr output from ceph-fuse:

2019-02-26 21:30:52.556 7f0fb5589c00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2019-02-26 21:30:52.568 7f0fb5589c00 -1 init, newargv = 0x1339e30 newargc=9
ceph-fuse[274]: starting ceph client
2019-02-26 21:30:52.568 7f0fb5589c00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
ceph-fuse[274]: starting fuse
         ^   ^  ^
         |   |  |
----------------+ idx points to 's'
         |   |
-------------+ pidEnd points to ']'
         |
---------+ pidStart points to '['

First, we need to search for the "starting fuse" string to make sure the mount is ok. The PID of the daemon is delimited by [ and ]. Since we already have the starting position of that string in idx, and we know what the output should look like, we can search for the right bracket backwards from idx (as opposed to starting the search at the beginning of the output, which would be less "optimal"). Once we have the position of the right bracket in pidEnd, the same thing is repeated for the left bracket [, whose position is stored in pidStart. The PID should be in between pidStart and pidEnd, i.e. stderr[pidStart+1 : pidEnd].
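The search described here can be sketched as follows. This is an illustration under assumptions rather than the code in the diff: the function name parseFusePID and its error messages are made up, while idx, pidStart and pidEnd follow the explanation above. The sample input is a shortened version of the stderr output quoted earlier.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strconv"
)

// parseFusePID (hypothetical name) extracts the daemon PID from the
// stderr output of ceph-fuse.
func parseFusePID(stderr []byte) (int, error) {
	// idx points to the 's' of "starting fuse"; if the string is
	// missing, the mount did not succeed.
	idx := bytes.Index(stderr, []byte("starting fuse"))
	if idx < 0 {
		return 0, errors.New("ceph-fuse did not report \"starting fuse\"")
	}

	// Search backwards from idx for the closing bracket...
	pidEnd := bytes.LastIndexByte(stderr[:idx], ']')
	if pidEnd < 0 {
		return 0, errors.New("no ']' before \"starting fuse\"")
	}

	// ...and backwards from there for the opening bracket.
	pidStart := bytes.LastIndexByte(stderr[:pidEnd], '[')
	if pidStart < 0 {
		return 0, errors.New("no '[' before ']'")
	}

	// The PID sits between the two brackets.
	return strconv.Atoi(string(stderr[pidStart+1 : pidEnd]))
}

func main() {
	out := []byte("ceph-fuse[274]: starting ceph client\n" +
		"ceph-fuse[274]: starting fuse\n")

	pid, err := parseFusePID(out)
	fmt.Println(pid, err) // prints: 274 <nil>
}
```

Searching backwards from idx means only the few bytes directly before "starting fuse" are scanned, and brackets that appear in earlier log lines are never considered.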

Member

@rootfs rootfs Feb 27, 2019

Contributor Author


Uff... this is a nice case of over-complicating things, isn't it? :P I'll send a fix

Contributor Author

gman0 commented Feb 27, 2019

PTAL @rootfs

@rootfs rootfs merged commit d938944 into ceph:csi-v1.0 Feb 27, 2019
wilmardo pushed a commit to wilmardo/ceph-csi that referenced this pull request Jul 29, 2019
cephfs: wait for FUSE to exit after unmount
nixpanic pushed a commit to nixpanic/ceph-csi that referenced this pull request Mar 4, 2024
Syncing latest changes from upstream devel for ceph-csi