runc init hanging on openat() #3448

Closed
alam0rt opened this issue Apr 4, 2022 · 5 comments

@alam0rt

alam0rt commented Apr 4, 2022

Hi, we recently experienced an issue where many of our nodes were failing to create pods (CreateContainerConfigError).

We noticed that containerd had spawned many runc init processes, which I gather is normal, except that they never reached execve and were instead hanging in openat() on the exec fifo (exec.fifo).

# lsof -p 129677 -w
COMMAND      PID   USER   FD      TYPE DEVICE SIZE/OFF     NODE NAME
runc:[2:I 129677 nobody  cwd       DIR 0,2119     4096 12563672 /app
runc:[2:I 129677 nobody  rtd       DIR 0,2119     4096  8206649 /
runc:[2:I 129677 nobody  txt       REG  259,1 11049264     9946 /
runc:[2:I 129677 nobody  mem       REG  259,1  2030928     2237 /lib/x86_64-linux-gnu/libc-2.27.so
runc:[2:I 129677 nobody  mem       REG  259,1   129312     2209 /lib/x86_64-linux-gnu/libseccomp.so.2.5.1
runc:[2:I 129677 nobody  mem       REG  259,1   144976     2263 /lib/x86_64-linux-gnu/libpthread-2.27.so
runc:[2:I 129677 nobody  mem       REG  259,1   179152     2233 /lib/x86_64-linux-gnu/ld-2.27.so
runc:[2:I 129677 nobody    0u      CHR    1,3      0t0        7 /dev/null
runc:[2:I 129677 nobody    1w     FIFO   0,13      0t0  2416908 pipe
runc:[2:I 129677 nobody    2w     FIFO   0,13      0t0  2416909 pipe
runc:[2:I 129677 nobody    5u     FIFO   0,25      0t0     4889 /run/containerd/runc/k8s.io/d0136625f29d1ab1b14c4180fe69816c90d2641100caaa19a1d515d05a78f408/exec.fifo
runc:[2:I 129677 nobody    7u  a_inode   0,14        0    10761 [eventpoll]
runc:[2:I 129677 nobody    8r     FIFO   0,13      0t0  2412940 pipe
runc:[2:I 129677 nobody    9w     FIFO   0,13      0t0  2412940 pipe

# strace -p 129377
strace: Process 129677 attached
openat(AT_FDCWD, "/proc/self/fd/5", O_WRONLY|O_CLOEXEC

The logs are full of entries like the one below, but there is no smoking gun.

Apr 03 00:44:52 $host containerd[77726]: {"error":"failed to set removing state for container \"e312392e9d198e3db585bfe80473765291d471f86017cb04c4a24c6475250852\": container is in starting state, can't be removed","level":"error","msg":"RemoveContainer for \"e312392e9d198e3db585bfe80473765291d471f86017cb04c4a24c6475250852\" failed","time":"2022-04-03T00:44:52.821862269Z"}

The issue seems very similar to #2828, except that we are on 1.0.3. Also, I am able to attach strace to the runc process repeatedly, and detaching does not cause it to exit.

uname: 5.4.0-1071-aws #76~18.04.1-Ubuntu

Any help would be greatly appreciated.

@kolyshkin
Contributor

Container start can be split into two phases -- runc create and runc start.

  • runc create creates a container and starts a bare runc init process in it. This runc init then waits for the exec fifo to be opened on the other side, as a synchronization mechanism. Once it is opened, runc init writes a 0 byte to it and proceeds to execute the real container init process.
  • runc start actually starts that container (by opening the exec fifo and reading the data from it), basically signalling the runc init that it should proceed.
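The handshake described in the two bullets above can be imitated with an ordinary named pipe. This is only a rough sketch, not runc's code: `run_handshake`, `fake_init`, and the temporary path are hypothetical names, a thread stands in for the runc init process, and the single byte written is a placeholder. It relies on the standard FIFO behaviour that opening for writing blocks until a reader opens the other end.

```python
import os
import tempfile
import threading
import time


def run_handshake(delay_before_start=0.2):
    """Simulate the create/start handshake over a FIFO (illustrative only)."""
    events = []
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "exec.fifo")
    os.mkfifo(fifo_path)

    def fake_init():
        # Stand-in for `runc init`: a write-only open of a FIFO blocks
        # until something opens the same FIFO for reading.
        events.append("init: waiting on fifo")
        fd = os.open(fifo_path, os.O_WRONLY)
        os.write(fd, b"0")  # placeholder byte for this sketch
        os.close(fd)
        events.append("init: released, would exec the container process")

    init_thread = threading.Thread(target=fake_init)
    init_thread.start()

    # This window corresponds to "after runc create, before runc start".
    time.sleep(delay_before_start)
    still_blocked = init_thread.is_alive()

    # Stand-in for `runc start`: opening the read side releases the writer.
    events.append("start: opening fifo")
    with open(fifo_path, "rb") as reader:
        data = reader.read()

    init_thread.join()
    os.remove(fifo_path)
    os.rmdir(workdir)
    return still_blocked, data, events
```

If the read side is never opened, the `fake_init` thread stays parked in `os.open` indefinitely, which matches the `openat(... O_WRONLY|O_CLOEXEC` line in the strace output above.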

So, in between runc create and runc start we have a runc init which waits on the exec fifo. This is normal; it is how things are designed to work. The presence of a runc init waiting on the exec fifo means that someone (most probably containerd, in your case) has called runc create but hasn't called runc start yet.
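To see how many containers are sitting in this in-between state on a node, one option is to scan /proc for processes that hold an open descriptor on an exec.fifo, similar to what the lsof output above shows. The helper below is a hypothetical sketch (`find_exec_fifo_holders` is not part of runc or containerd). It only reports that a descriptor is open, not that the process is actually blocked, and it normally needs root to read other users' fd directories.

```python
import os


def find_exec_fifo_holders(proc_root="/proc", fifo_name="exec.fifo"):
    """Return {pid: fifo_path} for processes with an open fd on an exec fifo."""
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd closed while we were scanning
            if os.path.basename(target) == fifo_name:
                holders[int(entry)] = target
                break
    return holders


if __name__ == "__main__":
    for pid, path in sorted(find_exec_fifo_holders().items()):
        print(pid, path)
```

A steadily growing list here would suggest that whatever calls runc create is not following up with runc start or runc delete.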

@kolyshkin
Contributor

To add to the previous comment -- a container in a starting state (as described in the previous comment) can be deleted by calling runc delete. It seems that something in containerd's logic prevents that.
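For manual inspection, the runc state directory and container ID can be read off the exec.fifo path. The sketch below assumes the layout seen in the lsof output above (`<root>/<container id>/exec.fifo`). `runc_commands_for_fifo` is a hypothetical helper that only builds the `runc state` and `runc delete` command lines and runs nothing. Deleting a container behind containerd's back may leave containerd's own bookkeeping inconsistent, so treat the delete command as a last resort.

```python
import os


def runc_commands_for_fifo(fifo_path):
    """Build (but do not run) runc command lines for the container owning fifo_path.

    Assumes the layout <root>/<container id>/exec.fifo.
    """
    container_dir = os.path.dirname(fifo_path)
    container_id = os.path.basename(container_dir)
    root = os.path.dirname(container_dir)
    base = ["runc", "--root", root]
    return {
        "state": base + ["state", container_id],
        "delete": base + ["delete", container_id],
    }


if __name__ == "__main__":
    example = (
        "/run/containerd/runc/k8s.io/"
        "d0136625f29d1ab1b14c4180fe69816c90d2641100caaa19a1d515d05a78f408/exec.fifo"
    )
    for name, argv in runc_commands_for_fifo(example).items():
        print(name + ":", " ".join(argv))
```

Checking the output of the `state` command first is the safer step, since it shows what runc itself believes about the container before anything is removed.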

@kolyshkin
Contributor

To sum up, it seems that runc works as expected; I'd ask the containerd folks about this.

@alam0rt
Author

alam0rt commented Apr 4, 2022

Ah, noted. Thanks for the quick reply. I'll raise this with containerd.

@hemangjoshi37a

Has this issue been solved? I am having it on my PC, and it is quite annoying to have to hard reset the PC every couple of hours. If anyone has a solution, please let me know. Thanks.
