-
Notifications
You must be signed in to change notification settings - Fork 2.2k
retry unix.EINTR for container init process #3045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8e4cf81 to
419ae14
Compare
cyphar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the lint warnings (most notably you aren't actually using maxEINTRRetryCount in your loop -- also please rename this to maxExecvRetries).
oh, bummer. will do. |
c01fc83 to
3ab53d2
Compare
|
@cyphar, done. please, take another look. |
|
@msscotb, @kevpar, @katiewasnothere, @dcantah, @ambarve FYI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By definition, EINTR is a temporary error, so no "max retry counter" is needed.
I would do something like this instead:
diff --git a/libcontainer/standard_init_linux.go b/libcontainer/standard_init_linux.go
index ce9117fa..7cfcc8ca 100644
--- a/libcontainer/standard_init_linux.go
+++ b/libcontainer/standard_init_linux.go
@@ -226,7 +226,7 @@ func (l *linuxStandardInit) Init() error {
return err
}
- if err := unix.Exec(name, l.config.Args[0:], os.Environ()); err != nil {
+ if err := system.Exec(name, l.config.Args[0:], os.Environ()); err != nil {
return newSystemErrorWithCause(err, "exec user process")
}
return nil
diff --git a/libcontainer/system/linux.go b/libcontainer/system/linux.go
index 4379a207..fa1b442c 100644
--- a/libcontainer/system/linux.go
+++ b/libcontainer/system/linux.go
@@ -35,7 +35,16 @@ func Execv(cmd string, args []string, env []string) error {
return err
}
- return unix.Exec(name, args, env)
+ return Exec(name, args, env)
+}
+
+func Exec(cmd string, args []string, env []string) error {
+ for {
+ err := unix.Exec(cmd, args, env)
+ if err != unix.EINTR { //nolint:errorlint // unix errors are bare
+ return err
+ }
+ }
}
func Prlimit(pid, resource int, limit unix.Rlimit) error {
thanks, sounds good to me. the reason I added the max retry counter is that during testing, sometimes we'd end up in an infinite loop, where syscall never succeeded. |
3ab53d2 to
9087833
Compare
This is interesting; can you provide more details? |
9087833 to
e7b9a28
Compare
nothing special here, I replaced the runc binary inside a VM, started a container, inside that container I mounted a cifs share (e.g. at |
It was possibly a process stuck in a syscall. Usually the kernel prints a backtrace if a process is stuck for more than 2 minutes. If you will be able to repro, I suggest waiting 2 minutes and checking the kernel log ( In any case, I don't think this is EINTR. |
kolyshkin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
kolyshkin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the loop
e7b9a28 to
b2629a5
Compare
kolyshkin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks for suggestions, will try this next time. |
|
I did some background research. It looks like neither golang nor glibc wrap Although it's totally fine to wrap a call in a retry-on-eintr loop (as if it never happens it's a no-op), I have to ask @anmaxvl -- did you really see |
|
I did a quick research, and it looks like
The last two are sort of expected as EINTR is poorly documented in general. Now, it is perfectly fine to wrap a syscall into a retry-on-eintr loop. Yet, I will appreciate if you @anmaxvl can reproduce it and show us the logs (or show the logs that you saw earlier), to make sure it is definitely EINTR returned from execve(2). I believe the error (for |
Yes, that's what I was seeing. Here's an old exec command that I had saved. I just logged an error to stdout when the error was returned. In this case it was only 1 retry, but I've observed cases with multiple retries as well. |
|
@anmaxvl can you please also share the patch you made to show the error? |
When running a script from an azure file share interrupted syscall occurs quite frequently, to remedy this add retries around execve syscall, when EINTR is returned. Signed-off-by: Maksim An <maksiman@microsoft.com>
here's the patch anmaxvl@e07095d, which was applied on top of this PR |
kolyshkin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
When running a script from an azure file share interrupted syscall
occurs quite frequently, to remedy this add retries around execve
syscall, when EINTR is returned.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1917662
1.0 backport: #3074