Fixed umount error during shutdown #480

Merged 3 commits on May 23, 2023

@chrisho chrisho commented Apr 20, 2023

Problem:
harvester/harvester#1876

Solution:
Create an rke2-shutdown.service that runs rke2-kill-containers.sh to stop all running pods (containers) before the system shuts down or reboots.

The script stops the container units through systemd (stop with --kill-who=all) in parallel. systemd first sends SIGTERM to the processes; if they do not exit within the timeout, it sends SIGKILL.
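
To illustrate the SIGTERM-then-SIGKILL escalation described here, the following is a minimal Python sketch, not the contents of rke2-kill-containers.sh. It assumes plain child processes rather than systemd container units, and the helper names `stop_process` and `stop_all` are hypothetical. Each process gets SIGTERM first and SIGKILL only if it is still alive after the timeout, and all processes are stopped in parallel.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL.

    Hypothetical helper for illustration; returns the process exit code
    (negative signal number on POSIX when the process died from a signal).
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


def stop_all(procs, timeout: float = 5.0):
    """Stop every process in parallel; return exit codes in input order."""
    procs = list(procs)
    if not procs:
        return []
    # One worker per process so a slow process does not delay the others.
    with ThreadPoolExecutor(max_workers=len(procs)) as pool:
        return list(pool.map(lambda p: stop_process(p, timeout), procs))
```

Running the stops in parallel bounds the total shutdown delay by roughly one timeout instead of one timeout per container, which matters when the whole node is waiting to power off.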

In testing, ext4 errors during container unmounting were significantly reduced, but some ext4 errors still occur that are not fundamentally related to the container processes. The following ext4 log was recorded during a reboot:

Apr 19 09:40:40 localhost dracut-initqueue[871]: The superblock could not be read or does not describe a valid ext2/ext3/ext4
Apr 19 09:40:40 localhost dracut-initqueue[871]: filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
Apr 19 09:40:41 localhost kernel: EXT4-fs (loop0): mounting ext2 file system using the ext4 subsystem
Apr 19 09:52:49 node1 kernel: EXT4-fs warning (device sda): ext4_end_bio:348: I/O error 7 writing to inode 2621451 starting block 1740004)
Apr 19 09:52:49 node1 kernel: EXT4-fs warning (device sda): ext4_end_bio:348: I/O error 7 writing to inode 2621451 starting block 1740008)
Apr 19 09:52:49 node1 kernel: EXT4-fs error (device sda) in ext4_reserve_inode_write:5834: Journal has aborted
Apr 19 09:52:49 node1 kernel: EXT4-fs error (device sda): ext4_convert_unwritten_extents:4825: inode #2621451: comm kworker/u24:3: mark_inode_dirty error
Apr 19 09:52:49 node1 kernel: EXT4-fs error (device sda) in ext4_convert_unwritten_io_end_vec:4864: Journal has aborted
Apr 19 09:52:49 node1 kernel: EXT4-fs error (device sdb) in ext4_orphan_add:3211: Journal has aborted
Apr 19 09:52:49 node1 kernel: EXT4-fs error (device sdb) in ext4_create:2822: Journal has aborted
Apr 19 09:52:49 node1 kernel: EXT4-fs error (device sdb): ext4_journal_check_start:83: comm alertmanager: Detected aborted journal
Apr 19 09:52:55 node1 kernel: EXT4-fs error (device sdc): ext4_put_super:1187: comm umount: Couldn't clean up the journal

Related Issue:
harvester/harvester#1876
rancher/rke2#2411 (comment)

Test plan:

  1. Build a new image and install it.
  2. Shut down or reboot the Harvester node.
  3. Check the console logs for EXT4-fs errors.
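
For the log check in the last step, a small helper can make before/after comparisons easier. The following is a rough sketch, assuming the console or kernel log has been saved as text lines in the format shown in the description; the function name `summarize_ext4` and the regular expression are illustrative assumptions, not part of this PR.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "EXT4-fs error (device sda) in ext4_create:2822: Journal has aborted"
#   "EXT4-fs warning (device sda): ext4_end_bio:348: I/O error 7 ..."
EXT4_RE = re.compile(r"EXT4-fs (error|warning) \(device ([^)]+)\)")


def summarize_ext4(lines):
    """Count EXT4-fs errors and warnings per device.

    Returns a Counter keyed by (severity, device), for example
    ("error", "sda"). Lines without a severity and device are ignored.
    """
    counts = Counter()
    for line in lines:
        match = EXT4_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

An empty result after a shutdown or reboot would be consistent with the fix working; any remaining entries show which devices are still affected.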

@guangbochen guangbochen left a comment

This works for me; I have not seen the EXT4-fs error so far.

Broadcast message from root@harv-new on pts/0 (Thu 2023-05-18 03:58:04 UTC):

The system is going down for poweroff at Thu 2023-05-18 03:59:04 UTC!

May 18 03:58:04 harv-new sudo[29882]:     root : TTY=pts/0 ; PWD=/root ; USER=root ; COMMAND=/usr/sbin/shutdown
May 18 03:58:04 harv-new sudo[29882]: pam_unix(sudo:session): session opened for user root by rancher(uid=0)
May 18 03:58:04 harv-new systemd-logind[1628]: Creating /run/nologin, blocking further logins...
May 18 03:58:04 harv-new sudo[29882]: pam_unix(sudo:session): session closed for user root
May 18 03:58:24 harv-new rancher-system-agent[26512]: time="2023-05-18T03:58:24Z" level=info msg="[Applyinator] No image provided, creating empty working directory /var/lib/rancher/agent/work/20230518-035824/7524ce40115e7a1a9eab054d6bdfbb5ec7b2d37e242a42a6358037f110a6a3a7_0"

Broadcast message from root@harv-new on pts/0 (Thu 2023-05-18 03:59:04 UTC):

The system is going down for poweroff NOW!


Session terminated, killing shell... ...killed.
Terminated

@futuretea futuretea left a comment

LGTM, Thanks

@Vicente-Cheng Vicente-Cheng left a comment

LGTM, thanks!

@Vicente-Cheng Vicente-Cheng merged commit 746783b into harvester:master May 23, 2023
@bk201 bk201 mentioned this pull request Apr 23, 2024