runc init blocked opening exec.fifo #2828
Comments
/cc @kk-src |
I think there was a report of a similar issue some time ago -- the main issue is that we can't add a timeout because it's reasonable for there to be an arbitrarily long split between |
Hmm, yes indeed. I wonder if we can open a pipe to wait for an exit signal, and part of cleaning up runc would be to ensure |
@cpuguy83 I am not sure what you are proposing. We have
[kir@kir-rhat runc-tst]$ sudo ../runc/runc.my create xx345
[kir@kir-rhat runc-tst]$ sudo ../runc/runc.my list
ID PID STATUS BUNDLE CREATED OWNER
xx345 157215 created /home/kir/go/src/github.com/opencontainers/runc-tst 2021-03-04T00:39:41.707090433Z root
[kir@kir-rhat runc-tst]$ sudo strace -p 157215
strace: Process 157215 attached
openat(AT_FDCWD, "/proc/self/fd/5", O_WRONLY|O_CLOEXEC^Cstrace: Process 157732 detached
<detached ...>
[kir@kir-rhat runc-tst]$ sudo ../runc/runc.my delete xx345
[sudo] password for kir:
[kir@kir-rhat runc-tst]$ sudo strace -p 157215
strace: attach: ptrace(PTRACE_SEIZE, 157732): No such process
Or do you mean that this
Frankly I do not like either way. A wrapper is an unnecessary complication for a very rare case. A cron script needs a definition of "stale"; let's say we define it as "started more than 1 hour ago and still there". In this case the init will still be there for at least an hour, and it still may break some workloads. Overall it looks like the problem needs to be solved at the upper level -- i.e. whoever calls |
If this is a problem of "runc delete" not being called indeed that is a simple fix. |
@cpuguy83 can we close it, or do you still see the bug? |
Using Docker 19.03.15 with containerd 1.4.4-3, runc init gets stuck. It happens when running 200+ containers, with no memory pressure.
|
Attaching strace to the stuck runc init process shows:
|
We saw a similar issue. We raised /proc/sys/fs/pipe-user-pages-soft to 65535 and that seemed to solve it. We haven't figured out why, but we got the idea from these Chinese blogs (will require Google Translating the page) - https://www.igoodtv.com/p/1921069.html |
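To compare the limit before and after such a change, a small Go sketch like the following could read the current value. It only reads the sysctl file named above; `parsePages` and `pipeSoftLimitPath` are hypothetical names, and whether this setting is actually related to the hang is not established in this thread.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Path taken from the comment above.
const pipeSoftLimitPath = "/proc/sys/fs/pipe-user-pages-soft"

// parsePages converts the raw file contents (a decimal number followed by
// a newline) into a page count. Hypothetical helper for this sketch.
func parsePages(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
}

func main() {
	raw, err := os.ReadFile(pipeSoftLimitPath)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read limit:", err)
		return
	}
	pages, err := parsePages(string(raw))
	if err != nil {
		fmt.Fprintln(os.Stderr, "unexpected contents:", err)
		return
	}
	fmt.Printf("pipe-user-pages-soft = %d pages\n", pages)
}
```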
To resolve I'd recommend downgrading to rc92 if you can. |
I tried rc92 and it works. |
This might be fixed by #2871, which was just merged. @dannylee- @wu0407 @cpuguy83, can you please test the runc tip and report back if the bug is fixed? |
@kolyshkin I don't think this issue is related. In the case fixed by #2871 runc was blocked writing to the pipe for libseccomp. In this issue something happens on the system (like OOM?) and runc is blocked on opening the init fifo b/c the main runc is not running. |
Right, sorry I mixed these up. As I said earlier, a subsequent |
We can close for now, I haven't received a reliable way to reproduce the issue. |
I'm not sure if it's the same issue, but you can reproduce this by doing the following:
If you run this on runc 1.0.0-rc93, the docker exec call will hang and there will be a hanging "runc init" process. Also, if the container has a healthcheck, it will start accumulating hanging "runc init" processes. If you run this on runc 1.0.0-rc92, everything works fine. |
@dannylee- Different issue, and that should be fixed on HEAD. Can you try with that? |
@dannylee- this is #2871 |
We have a case where runc init is blocked trying to open exec.fifo (specifically /proc/self/fd/5, which points to exec.fifo in the state dir).
This happened on a machine with memory pressure and runc exiting with SIGSEGV.
I'm assuming this is happening here, but all I have to debug this is strace:
runc/libcontainer/standard_init_linux.go
Lines 188 to 191 in 4d4d19c