Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 13 additions & 2 deletions cli/command/cli.go
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,8 @@ import (
notaryclient "github.com/theupdateframework/notary/client"
)

const defaultInitTimeout = 2 * time.Second

// Streams is an interface which exposes the standard input and output streams
type Streams interface {
In() *streams.In
Expand Down Expand Up @@ -77,6 +79,7 @@ type DockerCli struct {
currentContext string
dockerEndpoint docker.Endpoint
contextStoreConfig store.Config
initTimeout time.Duration
}

// DefaultVersion returns api.defaultVersion.
Expand Down Expand Up @@ -313,13 +316,21 @@ func resolveDefaultDockerEndpoint(opts *cliflags.CommonOptions) (docker.Endpoint
}, nil
}

func (cli *DockerCli) getInitTimeout() time.Duration {
if cli.initTimeout != 0 {
return cli.initTimeout
}
return defaultInitTimeout
}

func (cli *DockerCli) initializeFromClient() {
ctx := context.Background()
if strings.HasPrefix(cli.DockerEndpoint().Host, "tcp://") {
host := cli.DockerEndpoint().Host
if !strings.HasPrefix(host, "ssh://") {
// @FIXME context.WithTimeout doesn't work with connhelper / ssh connections
// time="2020-04-10T10:16:26Z" level=warning msg="commandConn.CloseWrite: commandconn: failed to wait: signal: killed"
var cancel func()
ctx, cancel = context.WithTimeout(ctx, 2*time.Second)
ctx, cancel = context.WithTimeout(ctx, cli.getInitTimeout())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminds me that I still wanted to look into this TODO as well;

cli/cli/command/cli.go

Lines 135 to 138 in d0df532

func (cli *DockerCli) ServerInfo() ServerInfo {
// TODO(thaJeztah) make ServerInfo() lazily load the info (ping only when needed)
return cli.serverInfo
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious though; #3652 describes that the CLI hangs indefinitely; but there's already a 2 second timeout. I understand that this PR makes that timeout configurable, but wouldn't #3652 already timeout after 2 seconds? Or is the unix:// addition the actual fix here?

Copy link
Contributor Author

@nicks nicks Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the unix:// addition is the fix. The configurability wasn't in the original pr, it's something @vvoland asked for in review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for those details (I probably should've read back the whole discussion above)

So some thoughts;

First of all, "the fix"; I think this was an incorrect assumption on my side. I added this check in b397391 / #2424, wich the assumption that a local socket would not really need a timeout (unless something was horribly wrong), and of course "horribly wrong" now appears to not be theoretical (per your ticket) 😂

I assume the same could also happen on a named pipe (npipe://) on Windows or fd:// (default on Linux for systemd created socket)? If so, we could be explicit;

if !strings.HasPrefix(cli.DockerEndpoint().Host, "ssh://") {

We should verify though if cli.DockerEndpoint().Host is guaranteed to always have a scheme (I know there's some "fuzzy" / "relaxed" handling of some values in our code-base, and we should verify that the values is at least normalized (scheme added if missing) early on). At a quick glance, I think that's the case though; calls getServerHost(), which does normalising (lol, we need to get rid of some abstraction here and there);

host, err := getServerHost(opts.Hosts, opts.TLSOptions)

As to making the timeout configurable; in general, I'm not against adding configurable options; that said, once added, it's pretty hard to (deprecate and) remove an option, so I also try to be conservative where possible. I realize the 2 seconds was just some arbitrary value ("2 seconds should be enough?"); for your use-case, are you planning on adjusting that default, or would the "fix" be enough (only needed to prevent a full hang, but default is "fine" otherwise)?

What I'm thinking is, that we could; split the fix from the feature; the fix (one line change, and appears to be low risk) could be candidate for backporting to 20.10 if we think that's good. For the feature (make the timeout configurable), we could keep that in draft (for now), and still decide to take it if it turns out there's good use-cases for overriding. Ultimately (also see my other comment above), I think it would be great if we didn't bother with pinging at all, unless we need it (in most cases that would be "the moment we need to connect to the API anyway").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

I backed out the WithInitializationTimeout setting. I also changed it to check "not ssh://".

ya, i'm pretty confident that this codepath always normalizes the Host to have a scheme:

func parseDockerDaemonHost(addr string) (string, error) {

speaking to some of the more high-level comments:

I still don't know how the user's Docker engine got into the state where it was hanging on the ping, but I think it's a good engineering value to be paranoid.

I really like the general Go convention of splitting "Connection" from "Client", where the Connection handles pings/healthchecks and the Client handles API types. grpc or kubernetes/client-go both separate them out this way. i think the Ping() here is a symptom of that more high-level problem of the client trying to do both in a single object. but appreciate that's a larger change.

defer cancel()
}

Expand Down
53 changes: 53 additions & 0 deletions cli/command/cli_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,15 @@ import (
"context"
"fmt"
"io"
"net"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"runtime"
"strings"
"testing"
"time"

"github.com/docker/cli/cli/config"
"github.com/docker/cli/cli/config/configfile"
Expand Down Expand Up @@ -165,6 +168,56 @@ func TestInitializeFromClient(t *testing.T) {
}
}

// Makes sure we don't hang forever on the initial connection.
// https://github.com/docker/cli/issues/3652
func TestInitializeFromClientHangs(t *testing.T) {
dir := t.TempDir()
socket := filepath.Join(dir, "my.sock")
l, err := net.Listen("unix", socket)
assert.NilError(t, err)

receiveReqCh := make(chan bool)
timeoutCtx, cancel := context.WithTimeout(context.Background(), time.Second)
defer cancel()

// Simulate a server that hangs on connections.
ts := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
select {
case <-timeoutCtx.Done():
case receiveReqCh <- true: // Blocks until someone receives on the channel.
}
_, _ = w.Write([]byte("OK"))
}))
ts.Listener = l
ts.Start()
defer ts.Close()

opts := &flags.CommonOptions{Hosts: []string{fmt.Sprintf("unix://%s", socket)}}
configFile := &configfile.ConfigFile{}
apiClient, err := NewAPIClientFromFlags(opts, configFile)
assert.NilError(t, err)

initializedCh := make(chan bool)

go func() {
cli := &DockerCli{client: apiClient, initTimeout: time.Millisecond}
cli.Initialize(flags.NewClientOptions())
close(initializedCh)
}()

select {
case <-timeoutCtx.Done():
t.Fatal("timeout waiting for initialization to complete")
case <-initializedCh:
}

select {
case <-timeoutCtx.Done():
t.Fatal("server never received an init request")
case <-receiveReqCh:
}
}

// The CLI no longer disables/hides experimental CLI features, however, we need
// to verify that existing configuration files do not break
func TestExperimentalCLI(t *testing.T) {
Expand Down