
Fix beegfs-client stop stuck when management server is down #72

@jteng2127

Is your feature request related to a problem? Please describe.

When the management server is down, running umount ${mnt} hangs indefinitely, and systemctl restart beegfs-client then incorrectly reports success.

Here is the process to reproduce it:

  1. systemctl start beegfs-mgmtd (start the remote management server)
  2. systemctl start beegfs-client (which calls /opt/beegfs/sbin/beegfs-client start)
  3. systemctl stop beegfs-mgmtd (stop the remote management server)
  4. systemctl restart beegfs-client (which calls /opt/beegfs/sbin/beegfs-client stop and then /opt/beegfs/sbin/beegfs-client start)
    /opt/beegfs/sbin/beegfs-client stop gets stuck and is killed when the stop timeout expires. It hangs at this line:

    Here is the systemd log when restarting the client:
    Aug 21 10:30:11 pc systemd[1]: Stopping Start BeeGFS Client...
    Aug 21 10:30:11 pc beegfs-client[58947]: Shutting down BeeGFS Client:
    Aug 21 10:30:11 pc beegfs-client[58947]: - Unmounting directories from /etc/beegfs/beegfs-mounts.conf
    Aug 21 10:31:41 pc systemd[1]: beegfs-client.service: Stopping timed out. Terminating.
    Aug 21 10:31:41 pc systemd[1]: beegfs-client.service: Control process exited, code=killed, status=15/TERM
    Aug 21 10:31:41 pc systemd[1]: beegfs-client.service: Failed with result 'timeout'.
    Aug 21 10:31:41 pc systemd[1]: Stopped Start BeeGFS Client.
    Aug 21 10:31:41 pc systemd[1]: Starting Start BeeGFS Client...
    Aug 21 10:31:41 pc beegfs-client[58986]: Starting BeeGFS Client:
    Aug 21 10:31:41 pc beegfs-client[58986]: - Loading BeeGFS modules
    Aug 21 10:31:41 pc beegfs-client[58986]: - Mounting directories from /etc/beegfs/beegfs-mounts.conf
    Aug 21 10:31:41 pc systemd[1]: Finished Start BeeGFS Client.
    
  5. After that, systemctl status beegfs-client reports SUCCESS, because stop never unmounted the directory and start skips any mount that already exists:
    mount -t beegfs | grep "${mnt} " >/dev/null 2>&1
    if [ $? -eq 0 ]; then
    # already mounted
    continue
    fi
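
The "already mounted" check above is why a failed stop goes unnoticed. As a rough sketch only, the same test could be run after stop to turn a surviving mount into an error. The names is_beegfs_mounted, verify_unmounted and MOUNT_TABLE are hypothetical and not part of the BeeGFS script. The sketch assumes the mount shows up in /proc/mounts with filesystem type beegfs.

```shell
#!/bin/sh
# MOUNT_TABLE is a hypothetical override so the check can be tried on a sample file;
# by default it reads the kernel's mount list.
MOUNT_TABLE="${MOUNT_TABLE:-/proc/mounts}"

# Return 0 if a beegfs filesystem is mounted at exactly $1, otherwise 1.
# Fields in the mount table: device, mount point, filesystem type, options, ...
is_beegfs_mounted() {
    awk -v mnt="$1" '$2 == mnt && $3 == "beegfs" { found = 1 } END { exit !found }' "${MOUNT_TABLE}"
}

# Sketch of a post-stop check: fail loudly if the mount point survived the stop.
verify_unmounted() {
    if is_beegfs_mounted "$1"; then
        echo "still mounted after stop: $1" >&2
        return 1
    fi
    return 0
}
```

Calling verify_unmounted "${mnt}" at the end of the stop path would let the script exit non-zero, so the restart no longer looks successful when the old mount is still in place.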

Describe the solution you'd like

Changing this line (the umount ${mnt} call) to use umount -l (lazy unmount) should work, but I'm not sure whether it would break anything.
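
One possible shape for this, as a minimal sketch rather than a tested patch: try the regular unmount with a deadline first, and fall back to umount -l only if that fails or times out. It uses timeout from GNU coreutils. The names unmount_with_fallback, UMOUNT_BIN and UMOUNT_TIMEOUT are hypothetical. It assumes the hung umount reacts to SIGTERM, as the control process in the log above did.

```shell
#!/bin/sh
# Hypothetical knobs, not part of the BeeGFS script.
UMOUNT_BIN="${UMOUNT_BIN:-umount}"        # command used to unmount
UMOUNT_TIMEOUT="${UMOUNT_TIMEOUT:-10}"    # seconds to wait for a regular unmount

unmount_with_fallback() {
    mnt="$1"
    # timeout(1) sends SIGTERM to the command once the deadline passes
    # and then exits with a non-zero status (124).
    if timeout "${UMOUNT_TIMEOUT}" "${UMOUNT_BIN}" "${mnt}"; then
        return 0
    fi
    echo "regular unmount of ${mnt} failed or timed out, trying lazy unmount" >&2
    # umount -l detaches the mount point now and cleans up once it is no longer busy.
    "${UMOUNT_BIN}" -l "${mnt}"
}
```

Keeping the regular unmount as the first attempt means the lazy path only runs in the failure case, which limits the risk of changing behaviour for healthy shutdowns.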

Describe alternatives you've considered

  • umount -f can also force the unmount, but it might crash other processes.
  • Use a longer TimeoutStopSec for the client service.
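
For the TimeoutStopSec alternative, a drop-in file is one way to raise the limit without editing the shipped unit. The log above shows the stop being terminated after 90 seconds. This is only a sketch: write_stop_timeout_dropin, DROPIN_DIR, STOP_TIMEOUT and the file name stop-timeout.conf are hypothetical, and the default path assumes systemd's usual drop-in directory for a unit named beegfs-client.service. A longer timeout only helps if the umount eventually returns.

```shell
#!/bin/sh
# Hypothetical settings; the default path follows systemd's <unit>.d drop-in convention.
DROPIN_DIR="${DROPIN_DIR:-/etc/systemd/system/beegfs-client.service.d}"
STOP_TIMEOUT="${STOP_TIMEOUT:-300}"   # seconds

# Write a drop-in that overrides only the stop timeout of the service.
write_stop_timeout_dropin() {
    mkdir -p "${DROPIN_DIR}" || return 1
    cat > "${DROPIN_DIR}/stop-timeout.conf" <<EOF
[Service]
TimeoutStopSec=${STOP_TIMEOUT}
EOF
}

# After calling write_stop_timeout_dropin as root, reload systemd so it picks up the file:
#   systemctl daemon-reload
```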

Labels: enhancement (New feature or request), new (Issues that haven't been triaged yet)