
Conversation

Contributor

@anmaxvl anmaxvl commented Jun 28, 2021

When running a script from an Azure file share, an interrupted syscall (EINTR)
occurs quite frequently. To remedy this, add retries around the execve
syscall when EINTR is returned.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1917662

1.0 backport: #3074

@anmaxvl anmaxvl force-pushed the retry-unix-eintr branch from 8e4cf81 to 419ae14 Compare June 28, 2021 01:48
AkihiroSuda
AkihiroSuda previously approved these changes Jun 28, 2021
cyphar
cyphar previously requested changes Jun 28, 2021
Member

@cyphar cyphar left a comment


Please fix the lint warnings (most notably you aren't actually using maxEINTRRetryCount in your loop -- also please rename this to maxExecvRetries).

Contributor Author

anmaxvl commented Jun 28, 2021

Please fix the lint warnings (most notably you aren't actually using maxEINTRRetryCount in your loop -- also please rename this to maxExecvRetries).

oh, bummer. will do.

@anmaxvl anmaxvl force-pushed the retry-unix-eintr branch 2 times, most recently from c01fc83 to 3ab53d2 Compare June 28, 2021 05:45
Contributor Author

anmaxvl commented Jun 28, 2021

@cyphar, done. Please take another look.


Contributor

@kolyshkin kolyshkin left a comment


By definition, EINTR is a temporary error, so no "max retry counter" is needed.

I would do something like this instead:

diff --git a/libcontainer/standard_init_linux.go b/libcontainer/standard_init_linux.go
index ce9117fa..7cfcc8ca 100644
--- a/libcontainer/standard_init_linux.go
+++ b/libcontainer/standard_init_linux.go
@@ -226,7 +226,7 @@ func (l *linuxStandardInit) Init() error {
                return err
        }
 
-       if err := unix.Exec(name, l.config.Args[0:], os.Environ()); err != nil {
+       if err := system.Exec(name, l.config.Args[0:], os.Environ()); err != nil {
                return newSystemErrorWithCause(err, "exec user process")
        }
        return nil
diff --git a/libcontainer/system/linux.go b/libcontainer/system/linux.go
index 4379a207..fa1b442c 100644
--- a/libcontainer/system/linux.go
+++ b/libcontainer/system/linux.go
@@ -35,7 +35,16 @@ func Execv(cmd string, args []string, env []string) error {
                return err
        }
 
-       return unix.Exec(name, args, env)
+       return Exec(name, args, env)
+}
+
+func Exec(cmd string, args []string, env []string) error {
+       for {
+               err := unix.Exec(cmd, args, env)
+               if err != unix.EINTR { //nolint:errorlint // unix errors are bare
+                       return err
+               }
+       }
 }
 
 func Prlimit(pid, resource int, limit unix.Rlimit) error {

@kolyshkin kolyshkin added the backport/1.0-todo A PR in main branch which needs to be backported to release-1.0 label Jun 28, 2021
Contributor Author

anmaxvl commented Jun 28, 2021

By definition, EINTR is a temporary error, so no "max retry counter" is needed.

I would do something like this instead:


Thanks, sounds good to me. The reason I added the max retry counter is that during testing we would sometimes end up in an infinite loop where the syscall never succeeded.

@anmaxvl anmaxvl force-pushed the retry-unix-eintr branch from 3ab53d2 to 9087833 Compare June 28, 2021 21:28
Contributor

The reason I added the max retry counter is that during testing we would sometimes end up in an infinite loop where the syscall never succeeded.

This is interesting; can you provide more details?

@anmaxvl anmaxvl force-pushed the retry-unix-eintr branch from 9087833 to e7b9a28 Compare June 28, 2021 21:45
Contributor Author

anmaxvl commented Jun 28, 2021

The reason I added the max retry counter is that during testing we would sometimes end up in an infinite loop where the syscall never succeeded.

This is interesting; can you provide more details?

Nothing special here: I replaced the runc binary inside a VM, started a container, mounted a CIFS share inside that container (e.g. at /mnt/share), and then ran runc exec <id> /mnt/share/script.sh. This would hang until the process was killed. Not sure if this was due to some network lag or whatnot, but I wasn't able to repro it afterwards.

Contributor

Nothing special here: I replaced the runc binary inside a VM, started a container, mounted a CIFS share inside that container (e.g. at /mnt/share), and then ran runc exec <id> /mnt/share/script.sh. This would hang until the process was killed. Not sure if this was due to some network lag or whatnot, but I wasn't able to repro it afterwards.

It was possibly a process stuck in a syscall. Usually the kernel prints a backtrace if a process is stuck for more than 2 minutes. If you are able to repro, I suggest waiting 2 minutes and checking the kernel log (dmesg), and also checking /proc/$PID/wchan. If it's not a stuck syscall, strace should show what is going on.

In any case, I don't think this is EINTR.

Contributor

@kolyshkin kolyshkin left a comment


LGTM

dcantah
dcantah previously approved these changes Jun 28, 2021
Contributor

@kolyshkin kolyshkin left a comment


Please fix the loop

kolyshkin
kolyshkin previously approved these changes Jun 29, 2021
Contributor

@kolyshkin kolyshkin left a comment


LGTM

@kolyshkin kolyshkin requested a review from cyphar June 29, 2021 02:32
Contributor Author

anmaxvl commented Jun 29, 2021

Nothing special here: I replaced the runc binary inside a VM, started a container, mounted a CIFS share inside that container (e.g. at /mnt/share), and then ran runc exec <id> /mnt/share/script.sh. This would hang until the process was killed. Not sure if this was due to some network lag or whatnot, but I wasn't able to repro it afterwards.

It was possibly a process stuck in a syscall. Usually the kernel prints a backtrace if a process is stuck for more than 2 minutes. If you are able to repro, I suggest waiting 2 minutes and checking the kernel log (dmesg), and also checking /proc/$PID/wchan. If it's not a stuck syscall, strace should show what is going on.

In any case, I don't think this is EINTR.

Thanks for the suggestions, will try this next time.

@kolyshkin kolyshkin requested a review from AkihiroSuda June 29, 2021 19:12
Contributor

I did some background research. It looks like neither Go nor glibc wraps execve() in a retry-on-EINTR loop.

Although it's totally fine to wrap a call in a retry-on-EINTR loop (if EINTR never happens, it's a no-op), I have to ask @anmaxvl -- did you really see EINTR from unix.Execve? Can you reproduce it and share the logs here?

Contributor

I did some quick research, and it looks like:

  • neither glibc nor the Go stdlib wraps the execve(2) call in a retry-on-EINTR loop;
  • the Linux execve(2) man page does not mention EINTR;
  • the Linux signal(7) man page (which describes EINTR in more detail) does not mention execve(2).

The last two are sort of expected, as EINTR is poorly documented in general.

Now, it is perfectly fine to wrap a syscall in a retry-on-EINTR loop. Still, I would appreciate it if you (@anmaxvl) could reproduce it and show us the logs (or share the logs you saw earlier), to make sure it is definitely EINTR returned from execve(2).

I believe the error (for runc exec) should be something like standard_init_linux.go:230: exec user process caused: interrupted system call. Is this what you see?

Contributor Author

anmaxvl commented Jun 30, 2021

I believe the error (for runc exec) should be something like standard_init_linux.go:230: exec user process caused: interrupted system call. Is this what you see?

Yes, that's what I was seeing. Here's an old exec command that I had saved. I just logged the error to stdout when it was returned. In this case it was only 1 retry, but I've observed cases with multiple retries as well.

/ # runc --debug exec aff3fe4002189a710944dac93dbedb6382daaf4361fdcf451f09b67777a02df8 /mnt/share/run.sh
DEBU[0000]github.com/opencontainers/runc/libcontainer/cgroups/file.go:89 github.com/opencontainers/runc/libcontainer/cgroups.prepareOpenat2.func1() openat2 not available, falling back to securejoin
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec[886]: => nsexec container setup
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec[886]: set process as non-dumpable
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-0[886]: ~> nsexec stage-0
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-0[886]: spawn stage-1
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-0[886]: -> stage-1 synchronisation loop
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-1[890]: ~> nsexec stage-1
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-1[890]: setns(0x8000000) into ipc namespace (with path /proc/148/ns/ipc)
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-1[890]: setns(0x4000000) into uts namespace (with path /proc/148/ns/uts)
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-1[890]: setns(0x40000000) into net namespace (with path /proc/148/ns/net)
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-1[890]: setns(0x20000000) into pid namespace (with path /proc/148/ns/pid)
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-1[890]: setns(0x20000) into mnt namespace (with path /proc/148/ns/mnt)
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-1[890]: unshare remaining namespace (except cgroupns)
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-1[890]: spawn stage-2
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-1[890]: request stage-0 to forward stage-2 pid (891)
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-2[591]: ~> nsexec stage-2
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-0[886]: stage-1 requested pid to be forwarded
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-0[886]: forward stage-1 (890) and stage-2 (891) pids to runc
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-1[890]: signal completion to stage-0
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-1[890]: <~ nsexec stage-1
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-0[886]: stage-1 complete
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-0[886]: <- stage-1 synchronisation loop
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-0[886]: -> stage-2 synchronisation loop
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-0[886]: signalling stage-2 to run
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-2[591]: signal completion to stage-0
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-0[886]: stage-2 complete
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-0[886]: <- stage-2 synchronisation loop
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-0[886]: <~ nsexec stage-0
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-2[591]: <= nsexec container setup
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() nsexec-2[591]: booting up go runtime ...
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() child process in init()
DEBU[0000]github.com/opencontainers/runc/libcontainer/logs/logs.go:69 github.com/opencontainers/runc/libcontainer/logs.processEntry() setns_init: about to exec
system.Exec error: interrupted system call
system.Exec error: interrupted system call
DEBU[0000]github.com/opencontainers/runc/signals.go:104 main.(*signalHandler).forward() sending signal to process urgent I/O condition
Hello, world
DEBU[0000]github.com/opencontainers/runc/signals.go:94 main.(*signalHandler).forward() process exited

Contributor

@anmaxvl can you please also share the patch you made to show the error?

When running a script from an Azure file share, an interrupted syscall (EINTR)
occurs quite frequently. To remedy this, add retries around the execve
syscall when EINTR is returned.

Signed-off-by: Maksim An <maksiman@microsoft.com>
Contributor Author

anmaxvl commented Jul 1, 2021

@anmaxvl can you please also share the patch you made to show the error?

Here's the patch: anmaxvl@e07095d, which was applied on top of this PR.

Contributor

@kolyshkin kolyshkin left a comment


LGTM

@kolyshkin kolyshkin merged commit ccf6a94 into opencontainers:master Jul 8, 2021
@anmaxvl anmaxvl deleted the retry-unix-eintr branch August 2, 2021 07:16
@kolyshkin kolyshkin added backport/1.0-done A PR in main branch which has been backported to release-1.0 and removed backport/1.0-todo A PR in main branch which needs to be backported to release-1.0 labels Nov 30, 2021