Skip to content

Conversation

djs55
Copy link
Collaborator

@djs55 djs55 commented May 6, 2021

By default Go sets O_CLOEXEC on file descriptors it creates. This ensures that they aren't accidentally inherited over fork+exec.

When binding a port on the host we first try net.Listen. On modern macOS this works for

  • unprivileged ports (> 1024)
  • privileged ports on INADDR_ANY (all IPs)

Unfortunately it doesn't work for privileged ports on specific IPs, such as 127.0.0.1:80. For these cases we contact a helper process which runs as root, which binds the port and then uses sendmsg to transmit it to us. Unfortunately when we recvmsg a file descriptor, the resulting descriptor does not have O_CLOEXEC set, so it is captured by future subprocesses. Even if the parent process calls close(), the port is still in use by the subprocesses and can't be re-bound. This caused the failure seen in docker/for-mac#5649

The solution is to ensure all file descriptors have O_CLOEXEC set. On Linux there's a sendmsg flag to set the flag atomically on received fds, but this is missing on Darwin. The best we can do is to call fcntl to set the flag ourselves after receiving the fd. Unfortunately this means there's a small race window where a fork would inherit the fd, but this shouldn't happen very often in practice.

Go will call fcntl for us if we "adopt" the file descriptor using os.NewFile and then use net.FileListener and net.FilePacketConn. This allows us to remove a lot of complicated wrapping code and allows us to remove a workaround for Close hanging when blocked in Accept, caused by the state of the file descriptor not being correct. This required some minor changes to types (net.Listener rather than *net.TCPListener) but it's worth it.

It's possible to work around this problem in other ways in the client code, for example:

  1. always speculatively closing file descriptors after fork but before exec. This can be expensive with high ulimit values, if there is no API to enumerate open fds. I'm not sure it can be done easily in Go where the exec.Command API is high-level.
  2. ban running long-lived subprocesses in your program. Restructure the code so that the long-lived subprocesses are managed by a separate process which doesn't bind ports.

Signed-off-by: David Scott dave@recoil.org

djs55 added 7 commits May 6, 2021 14:53
By default Go sets O_CLOEXEC on file descriptors it creates. This
ensures that they aren't accidentally inherited over fork+exec.

Unfortunately when we `recvmsg` a file descriptor, there is no
flag on Darwin to set the close on exec flag atomically. We must
do it ourselves and tolerate the small race window where the fd
could be inherited.

Signed-off-by: David Scott <dave@recoil.org>
This avoids having to create lots of wrapper structs.

Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
We use `Close()` directly everywhere.

Signed-off-by: David Scott <dave@recoil.org>
We now rely on Go to "adopt" the file descriptors via `net.FileListner`
and `net.FilePacketConn`. Go sets flags like `O_CLOEXEC` and takes care
of blocking modes.

Signed-off-by: David Scott <dave@recoil.org>
@djs55 djs55 force-pushed the close-on-exec branch from 7be54cb to ab3d74e Compare May 7, 2021 08:14
if controlErr != nil {
return
}
// Go normally opens file descriptors with O_CLOEXEC. This prevents races where
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt syscall.Dup2 would set the O_CLOEXEC on the new fd. Do you have a reference for this comment?
Have you considered syscall.Dup3 ? Though I have no idea how it behaves on mac.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry I should have squashed this commit -- the code is deleted in a later patch. I had a look on Mac for Dup3 but unfortunately they don't have it :(

@djs55 djs55 merged commit bbf54da into moby:master May 10, 2021
@djs55 djs55 deleted the close-on-exec branch May 10, 2021 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants