This issue was moved to a discussion.

systemd-shutdown hangs on containerd-shim when k3s-agent running #2400

Closed
sourcedelica opened this issue Oct 16, 2020 · 26 comments
Comments

@sourcedelica

sourcedelica commented Oct 16, 2020

Environmental Info:
K3s Version: k3s version v1.18.6+k3s1 (6f56fa1)

Node(s) CPU architecture, OS, and Version: x86_64 Ubuntu 20.04.1
Linux nuc-linux3 5.4.0-48-generic #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: 1 master 2 workers

Describe the bug:
When shutting down or rebooting the node, the shutdown hangs for approximately 90 seconds. The console message is

systemd-shutdown: waiting for process: containerd-shim

When researching the problem I landed on this issue: ddev/ddev#2538 (comment) where they said when they uninstalled k3s the problem went away. I disabled and stopped k3s-agent.service and rebooted and the problem also went away for me.

I also tried re-enabling and starting k3s-agent.service and removing the docker.io package and running apt autoremove to remove containerd, runc, etc. but it still hangs on reboot at the same place.

@sourcedelica
Author

Following containerd/containerd#386 (comment) I changed the service configuration for k3s.service and k3s-agent.service to KillMode=mixed and that fixed the problem. This is in the standard Docker configuration.

However, I also found #1965 where it looks like this behavior is as intended. Is there a way to allow for upgrading k3s without disrupting workloads but at the same time not hang shutdowns/reboots for 90s?

@sourcedelica
Author

I was thinking one way to do it is to use KillMode=mixed or KillMode=control-group by default for k3s{-agent}.service and when doing an upgrade, add a drop-in in /run/systemd/system/k3s{-agent}.service.d that temporarily sets KillMode=process before stopping the service, then removes the drop-in after the upgrade.
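
A minimal sketch of the drop-in idea above. The file name `10-upgrade-killmode.conf` and the helper names `write_killmode_dropin` and `remove_killmode_dropin` are hypothetical, not part of k3s. The directory argument defaults to /run/systemd/system so the override does not survive a reboot, and systemd only picks up the change after `systemctl daemon-reload`.

```shell
#!/bin/sh
# Sketch only: helper and file names are hypothetical, not part of k3s.

DROPIN_NAME="10-upgrade-killmode.conf"

# write_killmode_dropin UNIT [BASEDIR]
# Creates BASEDIR/UNIT.d/$DROPIN_NAME, overriding KillMode to "process".
write_killmode_dropin() {
    unit=$1
    basedir=${2:-/run/systemd/system}
    mkdir -p "$basedir/$unit.d" || return 1
    printf '[Service]\nKillMode=process\n' > "$basedir/$unit.d/$DROPIN_NAME"
}

# remove_killmode_dropin UNIT [BASEDIR]
# Deletes the override again; a missing file is not an error.
remove_killmode_dropin() {
    unit=$1
    basedir=${2:-/run/systemd/system}
    rm -f "$basedir/$unit.d/$DROPIN_NAME"
}

# Intended use during an upgrade (run as root):
#   write_killmode_dropin k3s-agent.service && systemctl daemon-reload
#   systemctl stop k3s-agent.service    # containers are expected to keep running
#   ... replace the k3s binary and start the service again ...
#   remove_killmode_dropin k3s-agent.service && systemctl daemon-reload
```

Passing a scratch directory as the second argument lets you inspect the generated file without touching the running system.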

@dontlaugh

systemd has an explicit pre-shutdown hook, so perhaps you could invoke special logic with that. See:

/usr/lib/systemd/system/shutdown.target.wants

@stale

stale bot commented Sep 21, 2021

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

@stale stale bot added the status/stale label Sep 21, 2021
@unixfox

unixfox commented Sep 21, 2021

Bump still relevant

@stale stale bot removed the status/stale label Sep 21, 2021
@senpaiSubby

bump same issue

@andrewchen5678

I noticed the issue on a Raspberry Pi with the display on.

@jraby

jraby commented Jan 16, 2022

I faced this issue yesterday and ended up with the following solution.

/etc/systemd/system/cgroup-kill-on-shutdown@.service :

[Unit]
Description=Kill cgroup procs on shutdown for %i
DefaultDependencies=false
Before=shutdown.target umount.target
[Service]
# Instanced units are not part of system.slice for some reason
# without this, the service isn't started at shutdown
Slice=system.slice
ExecStart=/bin/bash -c 'pids=$(cat /sys/fs/cgroup/unified/system.slice/%i/cgroup.procs); echo $pids | xargs -r kill;'
ExecStart=/bin/sleep 5
ExecStart=/bin/bash -c 'pids=$(cat /sys/fs/cgroup/unified/system.slice/%i/cgroup.procs); echo $pids | xargs -r kill -9;'
Type=oneshot
[Install]
WantedBy=shutdown.target

Enable the "service" for k3s-agent.service (this also works for k3s on the master):

sudo systemctl enable cgroup-kill-on-shutdown@k3s-agent.service.service

# or, on the master:  sudo systemctl enable cgroup-kill-on-shutdown@k3s.service.service

I've written a long-winded explanation here, but in brief: since KillMode=process is used, all the container processes stay alive when k3s is brought down, which is a good thing ™️

However, during shutdown, systemd signals all remaining processes and waits DefaultTimeoutStopSec for them to die.
This is always 90s during the last shutdown phase with systemd v245.
It is a bug in the systemd v245 shipped with Ubuntu 20.04, and it was fixed in September 2020.

What I used to do was set DefaultTimeoutStopSec=5s in /etc/systemd/system.conf, which worked fine, but on Ubuntu 20.04 it doesn't.

Since there's little chance this fix will make it back into 20.04, the above "service" performs a round of SIGTERM, waits 5s, then proceeds with SIGKILL to finish k3s's process cleanup during shutdown.
The sleep can be tweaked to suit your services' needs (something matching terminationGracePeriodSeconds perhaps).

Hope it helps.

@sourcedelica
Author

Awesome research!

@miraculixx

miraculixx commented Jan 21, 2022

@jraby your solution helped me resolve the issue; however, I ended up using k3s-killall.sh, as described in the k3s docs. With this there is no shutdown delay on my system.

Caution - this may not be what you want

The killall script cleans up containers, K3s directories, and networking components while also removing the iptables chain with all the associated rules. The cluster data will not be deleted.

I'm using this /etc/systemd/system/cgroup-kill-on-shutdown@.service

# source https://github.com/k3s-io/k3s/issues/2400#issuecomment-1013798094
# $ sudo systemctl enable cgroup-kill-on-shutdown@k3s-agent.service.service
[Unit]
Description=Kill cgroup procs on shutdown for %i
DefaultDependencies=false
Before=shutdown.target umount.target
[Service]
# Instanced units are not part of system.slice for some reason
# without this, the service isn't started at shutdown
Slice=system.slice
ExecStart=/bin/bash -c "/usr/local/bin/k3s-killall.sh"
Type=oneshot
[Install]
WantedBy=shutdown.target

This is on

Linux Mint 20.2
5.4.0-91-generic

@horihel

horihel commented Feb 15, 2022

The same problem exists on RKE2 (no surprise, given its roots are in K3s).

@brandond
Member

Yes, this is by design. Stopping the K3s (or RKE2) service does not stop running containers. This is to allow for nondisruptive upgrades of the main K3s/RKE2 components by simply replacing the binary and restarting the service.

@horihel

horihel commented Feb 16, 2022

Would you accept a feature request to add a systemd unit like #2400 (comment) that only triggers on shutdown?
This would both preserve the intended behaviour of k3s/rke2 (seamless updates/restarts) and allow for a shutdown/reboot that's even quicker than RKE1.

Here's my non-instanced version of that (for rke2):

[Unit]
Description=Kill containerd-shims on shutdown
DefaultDependencies=false
Before=shutdown.target umount.target

[Service]
ExecStart=/bin/bash -c "/usr/local/bin/rke2-killall.sh"
Type=oneshot

[Install]
WantedBy=shutdown.target

@brandond
Member

That might be a good thing to add to the documentation, for folks that want it?

@ciacon

ciacon commented Feb 22, 2022

Confirming this behaviour to be present with:

root@k3s:~# k3s --version
k3s version v1.23.3+k3s1 (5fb370e5)
go version go1.17.5
root@k3s:~# uname -a
Linux k3s 5.4.0-100-generic #113-Ubuntu SMP Thu Feb 3 18:43:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

@brandond
Member

brandond commented Feb 22, 2022

@ciacon this is not version-specific behavior. As described at #2400 (comment) by design, pods are not stopped when the k3s process exits.

@hlacikd

hlacikd commented Aug 30, 2022

In releases prior to 1.23.7 it was enough to add KillMode=mixed to /etc/systemd/system/k3s.service; when a system shutdown was executed, k3s killed the containers and the computer turned off immediately.

For some reason unknown to me, from 1.23.8 up to the current 1.24.4, shutting down a system with k3s takes 90s again (the default systemd stop timeout, TimeoutStopUSec=1min 30s), so KillMode=mixed appears to be ignored and the containers are only killed once the timeout has passed.

What has changed?

[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=mixed

@brandond
Member

Probably something related to the containerd version change? I'm not sure, since changing the KillMode isn't something we test or support. I would recommend adding another unit that runs on shutdown, as described above.

@hlacikd

hlacikd commented Sep 6, 2022

Thanks, I implemented the shutdown unit as described by @horihel several days ago, and so far it works great.

May I vote for adding this to the official documentation, @brandond? I believe it is a pretty common scenario, since k3s is ideal for edge deployments, and edge devices usually see many more shutdowns than servers do.

@MountComb

Here is a k3s version of #2400 (comment):

[Unit]
Description=Kill containerd-shims on shutdown
DefaultDependencies=false
Before=shutdown.target umount.target

[Service]
ExecStart=/usr/local/bin/k3s-killall.sh
Type=oneshot

[Install]
WantedBy=shutdown.target

Put the file at /etc/systemd/system/shutdown-k3s.service and then enable the service using

systemctl enable shutdown-k3s.service

Also note that the service name shutdown-k3s must not start with k3s-; otherwise the k3s-killall.sh script would try to stop it and cause problems.
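
To illustrate why the name matters: k3s-killall.sh stops every unit file matching /etc/systemd/system/k3s*.service, so a shutdown unit whose file name starts with k3s would be stopped by the very script it runs. A small check, using a hypothetical function name, mirrors that glob:

```shell
# is_stopped_by_killall UNIT_FILE_NAME
# Succeeds if the name matches the k3s*.service glob that k3s-killall.sh
# uses when stopping services.
is_stopped_by_killall() {
    case "$1" in
        k3s*.service) return 0 ;;
        *) return 1 ;;
    esac
}

for name in shutdown-k3s.service k3s-shutdown.service; do
    if is_stopped_by_killall "$name"; then
        echo "$name: would be stopped by k3s-killall.sh, pick another name"
    else
        echo "$name: safe"
    fi
done
```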

@samip5

samip5 commented Dec 5, 2022

@jraby's cgroup-kill-on-shutdown@.service approach above only works when the cgroup v2 hierarchy is mounted at /sys/fs/cgroup/unified, though; for example, I don't have /sys/fs/cgroup/unified/system.slice/ to begin with. :(

$ mount | grep group
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
none on /run/cilium/cgroupv2 type cgroup2 (rw,relatime)

Ubuntu 22.04
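
The path difference can be handled by probing both layouts. This is a sketch under assumptions: the unit lives in system.slice, as in the unit file above, and the function name `cgroup_procs_path` and its optional root argument (useful for trying it on a scratch directory) are hypothetical. On a hybrid layout the cgroup v2 tree sits under /sys/fs/cgroup/unified, while on a host that mounts cgroup2 directly on /sys/fs/cgroup, as in the mount output above, the slice directories are at the top level.

```shell
# cgroup_procs_path UNIT [SYSFS_CGROUP_ROOT]
# Prints the cgroup.procs path for UNIT, trying the hybrid layout first and
# then the pure cgroup v2 layout. Returns 1 if neither file exists.
cgroup_procs_path() {
    unit=$1
    root=${2:-/sys/fs/cgroup}
    for base in "$root/unified" "$root"; do
        candidate="$base/system.slice/$unit/cgroup.procs"
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: procs=$(cgroup_procs_path k3s-agent.service) && cat "$procs"
```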

@damonmaria

damonmaria commented Mar 9, 2023

I had trouble getting a shutdown service to behave, but it turns out that was because I had changed the [Install] section of the service, and a systemctl daemon-reload is not enough to apply that change. You actually need to disable and enable the service to get systemd to update the symlinks to the new target.
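
Whether the symlink ended up under the right target can be checked directly after re-enabling. This sketch assumes the unit is named shutdown-k3s.service, as in the earlier comment; the function name `wanted_by_shutdown` and its optional directory argument are hypothetical.

```shell
# wanted_by_shutdown UNIT [SYSTEMD_DIR]
# Succeeds if UNIT is symlinked into shutdown.target.wants, which is where
# `systemctl enable` links units that declare WantedBy=shutdown.target.
wanted_by_shutdown() {
    unit=$1
    dir=${2:-/etc/systemd/system}
    [ -L "$dir/shutdown.target.wants/$unit" ]
}

# Typical use (as root) after editing the [Install] section:
#   systemctl disable shutdown-k3s.service
#   systemctl enable shutdown-k3s.service
#   wanted_by_shutdown shutdown-k3s.service && echo "symlink in place"
```

`systemctl reenable shutdown-k3s.service` combines the disable and enable steps into one command.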

@brandond
Member

brandond commented Mar 9, 2023

Yes, good catch. You will need to adapt the example for agent nodes. The server and agent use different service names.

@samip5

samip5 commented Mar 10, 2023

Before needs to change to k3s-agent.service on agent nodes.

Unless of course one uses k3s ansible role which names them both as k3s.service. :)

@kub3let

kub3let commented Apr 13, 2023

Can confirm that the shutdown-k3s.service unit above also works if you get the message "A stop job is running for libcontainer..."

Make sure to drain the node before shutdown, otherwise there will be data loss.

If you use the k3s ansible role, you need to extract k3s-killall.sh from k3s/install.sh (lines 666 to 743 in d9f40d4):

#!/bin/sh
[ $(id -u) -eq 0 ] || exec sudo $0 $@
for bin in /var/lib/rancher/k3s/data/**/bin/; do
    [ -d $bin ] && export PATH=$PATH:$bin:$bin/aux
done
set -x
for service in /etc/systemd/system/k3s*.service; do
    [ -s $service ] && systemctl stop $(basename $service)
done
for service in /etc/init.d/k3s*; do
    [ -x $service ] && $service stop
done
pschildren() {
    ps -e -o ppid= -o pid= | \
    sed -e 's/^\s*//g; s/\s\s*/\t/g;' | \
    grep -w "^$1" | \
    cut -f2
}
pstree() {
    for pid in $@; do
        echo $pid
        for child in $(pschildren $pid); do
            pstree $child
        done
    done
}
killtree() {
    kill -9 $(
        { set +x; } 2>/dev/null;
        pstree $@;
        set -x;
    ) 2>/dev/null
}
getshims() {
    ps -e -o pid= -o args= | sed -e 's/^ *//; s/\s\s*/\t/;' | grep -w 'k3s/data/[^/]*/bin/containerd-shim' | cut -f1
}
killtree $({ set +x; } 2>/dev/null; getshims; set -x)
do_unmount_and_remove() {
    set +x
    while read -r _ path _; do
        case "$path" in $1*) echo "$path" ;; esac
    done < /proc/self/mounts | sort -r | xargs -r -t -n 1 sh -c 'umount "$0" && rm -rf "$0"'
    set -x
}
do_unmount_and_remove '/run/k3s'
do_unmount_and_remove '/var/lib/rancher/k3s'
do_unmount_and_remove '/var/lib/kubelet/pods'
do_unmount_and_remove '/var/lib/kubelet/plugins'
do_unmount_and_remove '/run/netns/cni-'
# Remove CNI namespaces
ip netns show 2>/dev/null | grep cni- | xargs -r -t -n 1 ip netns delete
# Delete network interface(s) that match 'master cni0'
ip link show 2>/dev/null | grep 'master cni0' | while read ignore iface ignore; do
    iface=${iface%%@*}
    [ -z "$iface" ] || ip link delete $iface
done
ip link delete cni0
ip link delete flannel.1
ip link delete flannel-v6.1
ip link delete kube-ipvs0
ip link delete flannel-wg
ip link delete flannel-wg-v6
rm -rf /var/lib/cni/
iptables-save | grep -v KUBE- | grep -v CNI- | grep -iv flannel | iptables-restore
ip6tables-save | grep -v KUBE- | grep -v CNI- | grep -iv flannel | ip6tables-restore

@caroline-suse-rancher
Contributor

Converting this issue into a discussion as this behavior is by design.

@k3s-io k3s-io locked and limited conversation to collaborators Apr 26, 2023
@caroline-suse-rancher caroline-suse-rancher converted this issue into discussion #7362 Apr 26, 2023
