Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check context builder endpoint #1129

Merged
merged 1 commit into from
Jul 29, 2022
Merged

Conversation

crazy-max
Copy link
Member

@crazy-max crazy-max commented May 14, 2022

while making some tests with @tonistiigi about an issue with rm --all-inactive that doesn't effectively remove some inactive nodes we saw that there is an underlying issue with docker context being detected as a docker-container builder:

$ docker context create --docker host=ssh://mini remote-buildkit
remote-buildkit

$ docker context ls
NAME                TYPE                DESCRIPTION                               DOCKER ENDPOINT                               KUBERNETES ENDPOINT   ORCHESTRATOR
default *           moby                Current DOCKER_HOST based configuration   unix:///var/run/docker.sock                                         swarm
desktop-linux       moby                                                          npipe:////./pipe/dockerDesktopLinuxEngine
desktop-windows     moby                                                          npipe:////./pipe/dockerDesktopWindowsEngine
remote-buildkit     moby                                                          ssh://mini

$ docker buildx ls
NAME/NODE         DRIVER/ENDPOINT                        STATUS                 BUILDKIT PLATFORMS
builder           docker-container
  builder0        unix:///var/run/docker.sock            running                394ccf8  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
  mac-mini-m1     tcp://mac-mini-m1.home.foobar.com:2376 running                394ccf8  linux/arm64*, linux/amd64, linux/amd64/v2, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
  sifive          tcp://sifive.home.foobar.com:2375      running                394ccf8  linux/riscv64*
builder2 *        docker-container
  builder20       unix:///var/run/docker.sock            running                394ccf8  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
fervent_raman     docker-container
  fervent_raman0  unix:///var/run/docker.sock            running                v0.10.3  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
desktop-linux                                            protocol not available
desktop-windows                                          protocol not available
remote-buildkit   docker-container
  remote-buildkit remote-buildkit                        error during connect: Get "http://docker.example.com/v1.24/info": command [ssh -- mini docker system dial-stdio] has exited with exit status 255, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=ssh: Could not resolve hostname mini: Name or service not known
default   docker
  default default running 20.10.14 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6

here remote-buildkit should not be a docker-container builder. that's because we don't check this endpoint while initializing the docker client so it assumes the underlying node is a docker-container when returned from:

return dockerclient.NewClientWithOpts(clientOpts...)

this PR adds a basic Ping check when docker client is initialized so we can find out early that this builder is faulty:

$ docker buildx ls
NAME/NODE        DRIVER/ENDPOINT                        STATUS                 BUILDKIT PLATFORMS
builder          docker-container
  builder0       unix:///var/run/docker.sock            running                394ccf8  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
  mac-mini-m1    tcp://mac-mini-m1.home.foobar.com:2376 running                394ccf8  linux/arm64*, linux/amd64, linux/amd64/v2, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
  sifive         tcp://sifive.home.foobar.com:2375      running                394ccf8  linux/riscv64*
builder2 *       docker-container
  builder20      unix:///var/run/docker.sock            running                394ccf8  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
fervent_raman    docker-container
  fervent_raman0 unix:///var/run/docker.sock            running                v0.10.3  linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
desktop-linux                                           protocol not available
desktop-windows                                         protocol not available
remote-buildkit                                         error during connect: Get "http://docker.example.com/_ping": command [ssh -- mini docker system dial-stdio] has exited with exit status 255, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=ssh: Could not resolve hostname mini: Name or service not known
default          docker
  default        default                                running 20.10.14 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6

Signed-off-by: CrazyMax crazy-max@users.noreply.github.com

@crazy-max crazy-max requested a review from tonistiigi May 15, 2022 08:21
@crazy-max crazy-max force-pushed the fix-docker-context branch 2 times, most recently from 7f89e02 to 3b2a288 Compare May 20, 2022 05:22
@tonistiigi
Copy link
Member

I don't really understand this.

If the driver is wrong this does not depend on whether API can be pinged or not. It is either wrong or correct and if it is running doesn't have anything to do with it.

clientForEndpoint should not ping because that is not the contract. If ping is needed then it should be on the caller if they want to check if the endpoint is active at the current time.

The API struct should not store context. If the context is needed it should be passed by the caller who wants to initialize the client.

@crazy-max crazy-max changed the title erroneous driver returned for docker context check context builder endpoint May 24, 2022
@crazy-max
Copy link
Member Author

clientForEndpoint should not ping because that is not the contract. If ping is needed then it should be on the caller if they want to check if the endpoint is active at the current time.

Indeed it should be checked after client initialization if a context builder is being used. Pushed some changes. Let me know if it lgty.

@crazy-max crazy-max added this to the v0.9.0 milestone Jun 7, 2022
@tonistiigi
Copy link
Member

I still don't understand what does builder being inactive or not have to do with if it can be deleted or not.

@crazy-max crazy-max modified the milestone: v0.9.0 Jul 22, 2022
@crazy-max
Copy link
Member Author

crazy-max commented Jul 26, 2022

Something @rumpl found out and might be linked to this. If a docker context is created with an healthy endpoint but the GRPC connection is broken, when he runs docker buildx build, buildx will mistakenly implies this is a docker-container builder and bootstraps a container that is not referenced in buildx store and therefore creates an "orphan" container.

I still don't understand what does builder being inactive or not have to do with if it can be deleted or not.

Here is a repro. Create an unreachable docker context:

$ docker context create --docker host=ssh://notexist mycontext
mycontext
Successfully created context "mycontext"
$ docker buildx ls
NAME/NODE     DRIVER/ENDPOINT                        STATUS   BUILDKIT PLATFORMS
default       docker
  default     default                                running  20.10.17 linux/amd64, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/arm/v7, linux/arm/v6
mycontext     docker-container
  mycontext   mycontext                              error

Failed to get status for mycontext (mycontext): error during connect: Get "http://docker.example.com/v1.24/info": command [ssh -- notexist docker system dial-stdio] has exited with exit status 255, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=ssh: Could not resolve hostname notexist: Name or service not known

As you can see:

mycontext     docker-container
  mycontext   mycontext                              error

is wrong and I think should be

mycontext                                            error

@jedevc
Copy link
Collaborator

jedevc commented Jul 26, 2022

No reproducible test case here, but can confirm I've seen something like this before running Docker Desktop alongside the docker daemon on Linux.

Occasionally, for mysterious reasons, it would appear like the default builder had turned into a docker-container driver, which sounds almost identical to @crazy-max's case above.

@crazy-max crazy-max force-pushed the fix-docker-context branch 2 times, most recently from 1aed151 to f1ecb3b Compare July 28, 2022 20:26
@crazy-max
Copy link
Member Author

@tonistiigi As discussed, added a comment explaining the situation with fallback to docker-container driver.

commands/util.go Outdated Show resolved Hide resolved
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants