Fixed umount error during shutdown #480

chrisho · 2023-04-20T07:07:31Z

Solution:
Add an rke2-shutdown.service unit that runs rke2-kill-containers.sh to stop all running pods (containers) before the system shuts down or reboots.

The script stops the container units in parallel with systemd stop (--kill-who=all). Systemd first sends SIGTERM to the process; if the process has not exited within the timeout, systemd sends SIGKILL.
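The parallel stop described above could look roughly like the sketch below. This is not the contents of rke2-kill-containers.sh: the function name `stop_units_in_parallel` and the `STOP_CMD` override are hypothetical, introduced only so the pattern is easy to read and to dry-run, and the unit names are passed in by the caller rather than discovered.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual rke2-kill-containers.sh.
# STOP_CMD is an assumption for illustration; it defaults to "systemctl stop"
# and can be replaced with a harmless command for a dry run.
STOP_CMD="${STOP_CMD:-systemctl stop}"

stop_units_in_parallel() {
    for unit in "$@"; do
        # Each stop request runs in the background. For a stop job, systemd
        # sends SIGTERM first and escalates to SIGKILL if the unit has not
        # exited within its stop timeout.
        $STOP_CMD "$unit" &
    done
    # Block until every background stop request has returned.
    wait
}
```

A dry run such as `STOP_CMD="echo would stop"` followed by `stop_units_in_parallel a.scope b.scope` prints one line per unit without touching systemd. How the real script selects the container units is not shown here.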

In testing, this significantly reduced the ext4 errors seen while unmounting containers. Some ext4 errors still occur, but they are not fundamentally related to the container processes. The following ext4 log was recorded during a reboot:

Apr 19 09:40:40 localhost dracut-initqueue[871]: The superblock could not be read or does not describe a valid ext2/ext3/ext4
Apr 19 09:40:40 localhost dracut-initqueue[871]: filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
Apr 19 09:40:41 localhost kernel: EXT4-fs (loop0): mounting ext2 file system using the ext4 subsystem
Apr 19 09:52:49 node1 kernel: EXT4-fs warning (device sda): ext4_end_bio:348: I/O error 7 writing to inode 2621451 starting block 1740004)
Apr 19 09:52:49 node1 kernel: EXT4-fs warning (device sda): ext4_end_bio:348: I/O error 7 writing to inode 2621451 starting block 1740008)
Apr 19 09:52:49 node1 kernel: EXT4-fs error (device sda) in ext4_reserve_inode_write:5834: Journal has aborted
Apr 19 09:52:49 node1 kernel: EXT4-fs error (device sda): ext4_convert_unwritten_extents:4825: inode #2621451: comm kworker/u24:3: mark_inode_dirty error
Apr 19 09:52:49 node1 kernel: EXT4-fs error (device sda) in ext4_convert_unwritten_io_end_vec:4864: Journal has aborted
Apr 19 09:52:49 node1 kernel: EXT4-fs error (device sdb) in ext4_orphan_add:3211: Journal has aborted
Apr 19 09:52:49 node1 kernel: EXT4-fs error (device sdb) in ext4_create:2822: Journal has aborted
Apr 19 09:52:49 node1 kernel: EXT4-fs error (device sdb): ext4_journal_check_start:83: comm alertmanager: Detected aborted journal
Apr 19 09:52:55 node1 kernel: EXT4-fs error (device sdc): ext4_put_super:1187: comm umount: Couldn't clean up the journal

Related Issue:
harvester/harvester#1876
rancher/rke2#2411 (comment)

Test plan:

1. Build a new image and install it.
2. Shut down or reboot the Harvester node.
3. Check the console logs.
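For the log check in the test plan, a small helper along these lines may be convenient. It is only a sketch: the function name `count_ext4_errors` is hypothetical, and it simply counts lines on standard input that contain an `EXT4-fs error` or `EXT4-fs warning` message like the ones quoted in the description.

```shell
#!/bin/sh
# Hypothetical helper: count ext4 error/warning lines read from stdin.
count_ext4_errors() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function's status at 0.
    grep -c -E 'EXT4-fs (error|warning)' || true
}

# Possible usage, assuming the journal is persistent so that the kernel log
# of the previous boot is still available:
#   journalctl -k -b -1 | count_ext4_errors
```

A result of 0 after a shutdown or reboot would match the outcome reported in the review comments; informational lines such as `EXT4-fs (loop0): mounting ext2 file system using the ext4 subsystem` are not counted.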

guangbochen

This works for me; I have not seen the EXT4-fs error so far.

Broadcast message from root@harv-new on pts/0 (Thu 2023-05-18 03:58:04 UTC):

The system is going down for poweroff at Thu 2023-05-18 03:59:04 UTC!

May 18 03:58:04 harv-new sudo[29882]:     root : TTY=pts/0 ; PWD=/root ; USER=root ; COMMAND=/usr/sbin/shutdown
May 18 03:58:04 harv-new sudo[29882]: pam_unix(sudo:session): session opened for user root by rancher(uid=0)
May 18 03:58:04 harv-new systemd-logind[1628]: Creating /run/nologin, blocking further logins...
May 18 03:58:04 harv-new sudo[29882]: pam_unix(sudo:session): session closed for user root
May 18 03:58:24 harv-new rancher-system-agent[26512]: time="2023-05-18T03:58:24Z" level=info msg="[Applyinator] No image provided, creating empty working directory /var/lib/rancher/agent/work/20230518-035824/7524ce40115e7a1a9eab054d6bdfbb5ec7b2d37e242a42a6358037f110a6a3a7_0"

Broadcast message from root@harv-new on pts/0 (Thu 2023-05-18 03:59:04 UTC):

The system is going down for poweroff NOW!


Session terminated, killing shell... ...killed.
Terminated

package/harvester-os/files/etc/systemd/system/rke2-shutdown.service

package/harvester-os/files/usr/sbin/rke2-kill-containers.sh

futuretea

LGTM, Thanks

package/harvester-os/files/usr/sbin/rke2-kill-containers.sh

Vicente-Cheng

LGTM, thanks!

fixed umount error during shutdown

4215ab7

chrisho requested review from bk201 and guangbochen April 20, 2023 07:08

chrisho mentioned this pull request Apr 20, 2023

[BUG] Shutdown sequence is unexpected. Umount overlayfs before stop containers harvester/harvester#1876

Closed

chrisho marked this pull request as ready for review April 20, 2023 08:23

chrisho requested a review from futuretea April 20, 2023 08:24

guangbochen requested review from Vicente-Cheng and removed request for futuretea April 24, 2023 06:15

guangbochen requested a review from futuretea May 10, 2023 07:28

guangbochen reviewed May 18, 2023


package/harvester-os/files/etc/systemd/system/rke2-shutdown.service (outdated)

futuretea reviewed May 18, 2023


package/harvester-os/files/usr/sbin/rke2-kill-containers.sh (outdated)

update code

2d56fc7

chrisho force-pushed the kill-rke2-containers branch from 6627884 to 2d56fc7 Compare May 19, 2023 10:41

chrisho requested review from guangbochen and futuretea May 19, 2023 10:45

futuretea approved these changes May 19, 2023


guangbochen approved these changes May 19, 2023


Vicente-Cheng reviewed May 23, 2023


package/harvester-os/files/usr/sbin/rke2-kill-containers.sh (outdated)

remove --kill-who flags

708630c

chrisho requested a review from Vicente-Cheng May 23, 2023 07:06

bk201 approved these changes May 23, 2023


Vicente-Cheng approved these changes May 23, 2023


Vicente-Cheng merged commit 746783b into harvester:master May 23, 2023

bk201 mentioned this pull request Apr 23, 2024

Run "rke2-killall.sh" before shutdown #270

Closed

1 task
