systemd-shutdown hangs on containerd-shim when k3s-agent running #7362
Replies: 26 comments
-
Following containerd/containerd#386 (comment) I changed the service configuration for However, I also found #1965 where it looks like this behavior is as intended. Is there a way to allow for upgrading k3s without disrupting workloads but at the same time not hang shutdowns/reboots for 90s?
-
I was thinking one way to do it is to use
-
systemd has an explicit pre-shutdown hook, so perhaps you could invoke special logic with that. See:
-
This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.
-
Bump still relevant
-
bump same issue
-
I noticed this issue on a Raspberry Pi with the display on.
-
I faced this issue yesterday and ended up with the following solution.
Enable the "service" for
I've written a long-winded explanation here, but in brief, what happens is that since However, during shutdown, systemd will signal all remaining processes and wait for What I used to do was to set Since there's little chance this fix will make it back into 20.04, the above "service" will perform a round of SIGTERM, wait 5s, then proceed with SIGKILL to finish k3s's process cleanup during shutdown. Hope it helps.
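A minimal sketch of the two-round escalation described in this comment (SIGTERM, a grace period, then SIGKILL). This is not the author's actual unit. The function name term_then_kill and its arguments are hypothetical, and how the PIDs are collected (for example, the leftover containerd-shim processes) is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper: term_then_kill GRACE_SECONDS PID...
# Round 1 sends SIGTERM, then polls for up to GRACE_SECONDS;
# round 2 sends SIGKILL to whatever is still around.
term_then_kill() {
    grace=$1
    shift

    # Round 1: ask every process to exit cleanly.
    for pid in "$@"; do
        kill -TERM "$pid" 2>/dev/null || true
    done

    # Poll once per second until all PIDs are gone or the grace period ends.
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        alive=0
        for pid in "$@"; do
            if kill -0 "$pid" 2>/dev/null; then
                alive=1
            fi
        done
        if [ "$alive" -eq 0 ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # Round 2: force-kill anything that is still present.
    for pid in "$@"; do
        kill -KILL "$pid" 2>/dev/null || true
    done
    return 0
}
```

With a 5 second grace period, as in the comment, a call would look like term_then_kill 5 followed by the PIDs to clean up.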
-
Awesome research!
-
@jraby your solution helped me to resolve the issue, however I ended up using the Caution - this may not be what you want
I'm using this
This is on Linux Mint 20.2
-
The same problem exists on rke2 (no surprise, given its roots are in k3s).
-
Yes, this is by design. Stopping the K3s (or RKE2) service does not stop running containers. This is to allow for nondisruptive upgrades of the main K3s/RKE2 components by simply replacing the binary and restarting the service.
-
Would you accept a feature request to add a systemd unit like #2400 (comment) which only triggers on shutdown? Here's my non-instanced version of that (for rke2):
-
That might be a good thing to add to the documentation, for folks that want it?
-
Confirming this behaviour to be present with:
-
@ciacon this is not version-specific behavior. As described at #2400 (comment), pods are by design not stopped when the k3s process exits.
-
In releases prior to 1.23.7 it was enough to add KillMode=mixed to /etc/systemd/system/k3s.service: on system shutdown, k3s killed the containers and the computer powered off immediately. For some reason unknown to me, from 1.23.8 up to the current 1.24.4, shutting down a system with k3s takes 90s again (the default systemd stop timeout, TimeoutStopUSec=1min 30s). KillMode=mixed appears to be ignored, and the containers are only killed once the timeout has passed. What has changed?
-
Probably something related to the containerd version change? I'm not sure, since changing the KillMode isn't something we test or support. I would recommend adding another unit that runs on shutdown, as described above.
-
Thanks. I implemented the shutdown unit described by @horihel several days ago and so far it works great. May I vote for adding this to the official documentation, @brandond? I believe it is a pretty common scenario, since k3s is ideal for edge deployments, and edge devices usually get shut down much more often than servers do.
-
Here is a k3s version of #2400 (comment):
Put the file to
Also note that this service name
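For readers following along, here is a minimal sketch of a shutdown-only unit of the kind discussed in this thread, written out by a small shell helper. It is not the unit from this comment. The helper name write_shutdown_unit and the unit name k3s-shutdown.service are hypothetical, and the script path assumes k3s-killall.sh sits at /usr/local/bin, so adjust it if your install differs.

```shell
#!/bin/sh
# Sketch only: write a shutdown-only oneshot unit into the directory given as $1.
# The unit name "k3s-shutdown.service" is hypothetical; the ExecStart path
# assumes k3s-killall.sh is installed at /usr/local/bin.
write_shutdown_unit() {
    unit_dir=$1
    cat > "$unit_dir/k3s-shutdown.service" <<'EOF'
[Unit]
Description=Stop remaining k3s containers on shutdown (sketch)
DefaultDependencies=no
Before=shutdown.target umount.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/k3s-killall.sh

[Install]
WantedBy=shutdown.target
EOF
}

# Example (as root):
#   write_shutdown_unit /etc/systemd/system
#   systemctl daemon-reload
#   systemctl enable k3s-shutdown.service
```

Because the unit is wanted only by shutdown.target, it should not run when the k3s service is merely restarted for an upgrade, which keeps the nondisruptive-upgrade behavior described earlier in the thread.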
-
This will only work with unified cgroups though as for example I don't have
Ubuntu 22.04
-
I had trouble getting a shutdown service to behave, but it turns out that was because I changed the
-
Yes, good catch. You will need to adapt the example for agent nodes. The server and agent use different service names.
-
Unless of course one uses k3s ansible role which names them both as k3s.service. :)
-
Can confirm this also works if you get the message Make sure to drain the node before shutdown, otherwise there will be data loss. If you use the k3s ansible role you need to extract k3s-killall.sh from Lines 666 to 743 in d9f40d4
-
Converting this issue into a discussion as this behavior is by design.
-
Environmental Info:
K3s Version: k3s version v1.18.6+k3s1 (6f56fa1)
Node(s) CPU architecture, OS, and Version: x86_64 Ubuntu 20.04.1
Linux nuc-linux3 5.4.0-48-generic #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration: 1 master 2 workers
Describe the bug:
When shutting down or rebooting the node, the shutdown hangs for approximately 90 seconds. The console message is
When researching the problem I landed on this issue: ddev/ddev#2538 (comment), where they said that when they uninstalled k3s the problem went away. I disabled and stopped k3s-agent.service and rebooted, and the problem also went away for me.
I also tried re-enabling and starting k3s-agent.service, removing the docker.io package, and running apt autoremove to remove containerd, runc, etc., but it still hangs on reboot at the same place.