Skip to content

SSH Forwarding Doesn't Release File Descriptors #1146

Closed

Description

Environment:

macOS Mojave 10.14.6

Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89
 Built:             Thu Jul 25 21:18:17 2019
 OS/Arch:           darwin/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:17:52 2019
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

It appears the SSH forwarding system fails to release file descriptors as SSH agent connections are created and dropped.

Running a command, such as docker buildx build --ssh=default=$SSH_AUTH_SOCK --progress=plain --tag=outdoorsy/scotty . to build a Dockerfile, in which the command RUN --mount=type=ssh go mod download is run, results in the SSH agent on the host machine locking up, as file descriptors are never released.

There appears to be two issues.

Firstly, we appear to be leaking connections on this line https://github.com/moby/buildkit/blob/master/session/sshforward/copy.go#L20. This should probably be changed to a defer conn.Close().

Secondly, https://github.com/moby/buildkit/blob/master/session/sshforward/copy.go#L42 this line will hang forever, even after the ssh client has terminated. I can't figure out why. I would have assumed that, as the SSH client terminated, it would close its connection, resulting in an EOF on this read, which would then return from the function. In combination with the conn.Close change, this would result in everything shutting down nicely.

To replicate this issue, you need to run a command that will generate a huge number of git requests in a single RUN.

To see the issue with the conn.Read never returning EOF, you can run

RUN --mount=type=ssh mkdir -p -m 0600 ~/.ssh && \
  ssh-keyscan github.com >> ~/.ssh/known_hosts && \
  ssh -T git@github.com ; sleep 10 ; exit 0

This spins up an SSH client, which will use the SSH_AUTH_SOCK, calls exit 0 so the build won't die, then sleeps for 10 seconds to give the FD a chance to die. It never does. I put in a bunch of fmt.Println calls in the Copy routine to see what it was doing. It will not exit until the RUN is done, at which point the first loop receives a context cancelled error and returns.

So, to summarize, it appears the Copy routine does not return until the RUN command finishes (triggering a context cancelled error). Because of this, the SSH FDs are held open, and, with an operation that uses SSH a lot, causes the host ssh-agent to lock up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions