When systemd is busy, runc init hangs and cannot exit, eventually ending up in D state #3904
Comments
OTOH it makes sense to pass a context with a timeout. Can you check if something like this fixes your issue? The patch is against the runc main branch.

diff --git a/libcontainer/cgroups/systemd/common.go b/libcontainer/cgroups/systemd/common.go
index c577a22b..c970e941 100644
--- a/libcontainer/cgroups/systemd/common.go
+++ b/libcontainer/cgroups/systemd/common.go
@@ -130,7 +130,9 @@ func startUnit(cm *dbusConnManager, unitName string, properties []systemdDbus.Pr
 retry:
 	err := cm.retryOnDisconnect(func(c *systemdDbus.Conn) error {
-		_, err := c.StartTransientUnitContext(context.TODO(), unitName, "replace", properties, statusChan)
+		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+		defer cancel()
+		_, err := c.StartTransientUnitContext(ctx, unitName, "replace", properties, statusChan)
 		return err
 	})
 	if err != nil {
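Note that placing context.WithTimeout and the deferred cancel inside the retryOnDisconnect closure means each attempt gets its own 10-second budget, and the timer is released as soon as the call returns. Presumably common.go would also need to import time for this patch to build, if it does not already.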
@kolyshkin Regarding libcontainer/cgroups/systemd/v2.go and libcontainer/init_linux.go: in the signalAllProcesses method, it's unclear why freeze and thaw operations are needed. In most scenarios, the calls to m.GetAllPids() and p.Signal(s) don't require a freeze and thaw. So why does this particular method demand a freeze operation?
@113xiaoji I'm sorry, but we were discussing the runc hang caused by a non-responding systemd in this issue, and now it's about freezing. Maybe you added the comment to the wrong issue and in fact mean #3803? Please clarify. As for signalAllProcesses, its logic (and the logic of its users) was changed in #3825.
@kolyshkin Thank you for your response. In this issue we are focusing on the runc hang caused by systemd not responding. I believe that all APIs that interact with systemd, such as startUnit, stopUnit, resetFailedUnit, getUnitTypeProperty, and setUnitProperties, should have a timeout mechanism added; otherwise, when systemd is busy, we can't predict at which step it will hang. In our scenario, a large number of Pods are being deployed, and since systemd needs to listen for changes in /proc/self/mountinfo and update synchronously, a large number of mount points makes systemd extremely busy. A related issue is #2532.
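For illustration, here is a minimal sketch of what a shared timeout wrapper around these systemd D-Bus calls could look like. The helper names (withDbusTimeout, dbusCallTimeout, stopUnitWithTimeout) and the 30-second value are hypothetical and not part of runc; only the go-systemd StopUnitContext call is a real API.

// Hypothetical sketch: bound every systemd D-Bus request with a timeout so a
// busy systemd cannot block runc indefinitely. Helper names are illustrative.
package systemd

import (
	"context"
	"time"

	systemdDbus "github.com/coreos/go-systemd/v22/dbus"
)

// dbusCallTimeout bounds every request sent to systemd (value is an example).
const dbusCallTimeout = 30 * time.Second

// withDbusTimeout runs fn with a context that expires after dbusCallTimeout,
// so callers such as startUnit, stopUnit, resetFailedUnit, getUnitTypeProperty
// and setUnitProperties would fail fast instead of hanging forever.
func withDbusTimeout(fn func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), dbusCallTimeout)
	defer cancel()
	return fn(ctx)
}

// Example: stopping a unit through the wrapper (sketch only).
func stopUnitWithTimeout(c *systemdDbus.Conn, unitName string, ch chan string) error {
	return withDbusTimeout(func(ctx context.Context) error {
		_, err := c.StopUnitContext(ctx, unitName, "replace", ch)
		return err
	})
}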
I will discuss the issue of runc init becoming an orphan process in #3803 and #3825. |
I wouldn't mind adding timeouts, but it should be noted that this will not result in containers starting without issue -- it will result in containers failing to start in cases where we need to talk to systemd to start our containers. In the case of Kubernetes, I would expect Kubernetes to try to start the container again, which would probably not help with the system load issue. It should also be noted that the original issue that caused the systemd busy-ness problem has been fixed (#2532).
Description
In a scenario where a single node concurrently deploys more than 100 containers, the call chain is kubelet -> containerd -> containerd-shim-runc-v2 -> runc create -> runc init -> dbus -> systemd -> cgroup. However, systemd is single-threaded, and when it is busy it continuously occupies one core. This leaves the runc main process unable to get a response in a timely manner and, with no timeout mechanism, it hangs indefinitely. The related flame graph is as follows:
As shown, the main goroutine is blocked waiting for a message from a channel, while another goroutine, blocked in ReadMsgUnix, never receives a response from systemd, causing the main goroutine to wait indefinitely.
The primary issue is that runc init, as a D-Bus client, has no timeout mechanism. A potential solution is to introduce one, enabling runc to actively disconnect in abnormal scenarios such as systemd being overly busy. We could modify the context used by the startUnit function in libcontainer/cgroups/systemd/common.go to include a timeout. Currently startUnit uses context.TODO(), which never expires; we could instead create the context with context.WithTimeout(context.Background(), 30*time.Second), ensuring that runc does not wait indefinitely on an unresponsive component (see the sketch below).
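For reference, a minimal sketch of the proposed change, assuming it lives inside startUnit or an equivalent helper (the function name below is hypothetical). The cancel function returned by context.WithTimeout should be kept and deferred rather than discarded with _, so the timer is released as soon as the call returns.

// Sketch only: wrap StartTransientUnitContext in a bounded context.
// Assumes the same imports as the wrapper sketch earlier in this thread
// (context, time, systemdDbus "github.com/coreos/go-systemd/v22/dbus").
func startTransientUnitWithTimeout(c *systemdDbus.Conn, unitName string,
	properties []systemdDbus.Property, statusChan chan string) error {
	// Keep the cancel function and defer it; discarding it would leak the
	// timer until the 30-second deadline fires.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	_, err := c.StartTransientUnitContext(ctx, unitName, "replace", properties, statusChan)
	return err
}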
Steps to reproduce the issue
Describe the results you received and expected
What version of runc are you using?
1.1.2
Host OS information
x86
Host kernel information
Linux master1 5.10.0-60.18.0.50.h665.eulerosv2r11.x86_64 #1 SMP Fri Dec 23 16:12:27 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux