
Error message discoverability for docker image push #3396

Open
alexsapps opened this issue Dec 23, 2021 · 2 comments

Comments

@alexsapps

I ran docker image push ... and the output showed repeated retry attempts for four layers:

c8c6a5b96e52: Retrying in 9 seconds
9cf14d3426ca: Retrying in 10 seconds
cb8245e98621: Retrying in 10 seconds
9321ff862abb: Retrying in 10 seconds

I wish it would show an error message while retrying so I would not need to guess what the problem is. There was nothing in the docker daemon logs. If I recall correctly, even when it stops retrying, no specific error is printed.

Version:

Client:
Cloud integration: 1.0.14
Version: 20.10.6
API version: 1.41
Go version: go1.16.3
Git commit: 370c289
Built: Fri Apr 9 22:49:36 2021
OS/Arch: windows/amd64
Context: default
Experimental: true

@markliederbach

markliederbach commented Jan 12, 2022

Similar situation, particularly with ECR, which returns 404s for non-existent repositories:

$ docker image push 1234567890.dkr.ecr.us-west-2.amazonaws.com/foo:v3
The push refers to repository [1234567890.dkr.ecr.us-west-2.amazonaws.com/foo:v3]
4348ebd6ec2b: Retrying in 1 second
2a504e8356f5: Retrying in 1 second
710bdd2dc5d3: Retrying in 1 second
598e84f56c9f: Retrying in 1 second
43b64fe45445: Retrying in 1 second
d4a1b4262e12: Waiting
d7258d427362: Waiting
2913df593731: Waiting
32f366d666a5: Waiting
EOF

It would be nice if the CLI gave a clearer error message about what was happening to the request upstream.

@thaJeztah
Member

Thanks for opening this ticket; I agree that the information is a bit sparse here and could use more detail (what was the failure?).

First things first; the progress output that's shown when doing a docker push or docker pull is generated by the docker daemon. The docker CLI makes an API call to the daemon to initiate the push, and the daemon returns a progress stream containing the messages that are printed by the CLI. This stream currently doesn't include error messages in the retry case, so to add more detail to those messages, changes will be needed on the daemon side (which is maintained in the https://github.com/moby/moby repository). The code producing that message is here; https://github.com/moby/moby/blob/7b9275c0da707b030e62c96b679a976f31f929d3/distribution/xfer/upload.go#L153-L156

for {
    progress.Updatef(progressOutput, descriptor.ID(), "Retrying in %d second%s", delay, (map[bool]string{true: "s"})[delay != 1])
    ....
}
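
Purely to illustrate what "more detail" could look like there, the retry message could carry the underlying error. This is only a sketch; it assumes the error from the failed attempt (err below) is still in scope at that point in the loop, which needs to be checked against the actual code:

for {
    // sketch: include the underlying error alongside the countdown, e.g.
    // "Retrying in 9 seconds: received unexpected HTTP status: 404 Not Found"
    progress.Updatef(progressOutput, descriptor.ID(),
        "Retrying in %d second%s: %v",
        delay, (map[bool]string{true: "s"})[delay != 1], err)
    // ... rest of the loop unchanged
}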

> There was nothing in the docker daemon logs. If I recall correctly, even when it stops retrying, no specific error is printed.

Interesting; I would have to try to reproduce the situation, but I think an error should be both returned to the CLI and logged in the daemon logs if this happens; https://github.com/moby/moby/blob/7b9275c0da707b030e62c96b679a976f31f929d3/distribution/xfer/upload.go#L142-L146

retries++
if _, isDNR := err.(DoNotRetry); isDNR || retries == maxUploadAttempts {
    logrus.Errorf("Upload failed: %v", err)
    u.err = err
    return
}

The logic in that part of the code is quite involved though, so I'd have to take a closer look (does it actually update the output stream in that case?)
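
If it turns out the final error is only logged and never written to the progress stream, one possible shape for a fix would be to emit it through the same progress helpers as the retry message. Again only a sketch; whether progressOutput and descriptor are actually in scope there, and whether the CLI renders such a message sensibly, is to be verified:

retries++
if _, isDNR := err.(DoNotRetry); isDNR || retries == maxUploadAttempts {
    logrus.Errorf("Upload failed: %v", err)
    // sketch: also report the final error on the progress stream so the
    // CLI can print it, instead of only logging it in the daemon
    progress.Updatef(progressOutput, descriptor.ID(), "Upload failed: %v", err)
    u.err = err
    return
}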

I wanted to look into the code before replying, and did some initial digging. This comment was also interesting;

> Similar situation, particularly with ECR, which returns 404s for non-existent repositories.

I'm wondering if the continuous "retrying" could be caused by ECR returning a different format for the error response. Looking at the "distribution spec" (the API specification for OCI registries):

The push endpoints can return a 2xx status (success), a 400 status (generic error), or a 404. The spec allows the response body to either contain a structured JSON response with error details, or an unstructured "plain text" response: https://github.com/opencontainers/distribution-spec/blob/v1.0.1/spec.md#error-codes

For the ECR case, I would expect the response to contain a NAME_UNKNOWN error code if it returns a JSON response, but the question is: does it? Or does it return a 404 status with an unstructured response?
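
One way to check would be to capture the raw response body of a failing push and see whether it parses as the spec's error envelope. A self-contained sketch of that distinction (hypothetical helper code, not taken from moby; the sample bodies are made up):

package main

import (
	"encoding/json"
	"fmt"
)

// specError mirrors the error envelope defined by the distribution spec:
// {"errors":[{"code":"NAME_UNKNOWN","message":"...","detail":...}]}
type specError struct {
	Errors []struct {
		Code    string          `json:"code"`
		Message string          `json:"message"`
		Detail  json.RawMessage `json:"detail"`
	} `json:"errors"`
}

// describeBody reports whether a registry error body is structured JSON
// (per the spec) or an unstructured plain-text response.
func describeBody(body []byte) string {
	var e specError
	if err := json.Unmarshal(body, &e); err == nil && len(e.Errors) > 0 {
		return fmt.Sprintf("structured error, code=%s: %s", e.Errors[0].Code, e.Errors[0].Message)
	}
	return fmt.Sprintf("unstructured response: %q", string(body))
}

func main() {
	structured := []byte(`{"errors":[{"code":"NAME_UNKNOWN","message":"repository name not known to registry"}]}`)
	plain := []byte("404 page not found")
	fmt.Println(describeBody(structured)) // structured error, code=NAME_UNKNOWN: ...
	fmt.Println(describeBody(plain))      // unstructured response: "404 page not found"
}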

The "retry loop" on the daemon side uses a retryOnError() function to determine if an error returned by the registry is a "fatal" error, or if the error is a temporary error (in which case the request should be retried)
https://github.com/moby/moby/blob/7b9275c0da707b030e62c96b679a976f31f929d3/distribution/errors.go#L152

In that function, I see there are cases for some of the error codes that are defined by the distribution spec; https://github.com/moby/moby/blob/7b9275c0da707b030e62c96b679a976f31f929d3/distribution/errors.go#L158-L162

	case errcode.Error:
		switch v.Code {
		case errcode.ErrorCodeUnauthorized, errcode.ErrorCodeUnsupported, errcode.ErrorCodeDenied, errcode.ErrorCodeTooManyRequests, v2.ErrorCodeNameUnknown:
			return xfer.DoNotRetry{Err: err}
		}

That branch handles UNAUTHORIZED, UNSUPPORTED, DENIED, TOOMANYREQUESTS, and NAME_UNKNOWN (an interesting observation is that it combines error codes from two packages, github.com/docker/distribution/registry/api/errcode/register.go and github.com/docker/distribution/registry/api/v2/errors.go; I'm not sure how or why they're split).

There's also some handling for authentication issues, "unexpected" (5xx) errors, and syscall.ENOSPC (to handle out-of-space errors?), as well as some handling for cases where multiple errors are returned (to be looked into: which cases would return multiple errors? possibly the result of "wrapping" errors); https://github.com/moby/moby/blob/7b9275c0da707b030e62c96b679a976f31f929d3/distribution/errors.go#L154-L157

If none of the above applies, then "other" errors are returned as-is; https://github.com/moby/moby/blob/7b9275c0da707b030e62c96b679a976f31f929d3/distribution/errors.go#L179-L183

// let's be nice and fallback if the error is a completely
// unexpected one.
// If new errors have to be handled in some way, please
// add them to the switch above.
return err

In the above case (return the error as-is), the default looks to be to retry, because the retry is only skipped if an xfer.DoNotRetry error is returned.

So... sorry for the lengthy reply / rambling. To summarize:

  1. the default for any error that occurs during push/pull is to "retry" the request
  2. there's logic to treat some specific, known errors as "do not retry"
  3. but: that handling is a bit of a mishmash of errors, and seems to (mostly) account for error types defined by the distribution spec: no handling seems to be in place for registries that do not return a structured (JSON) error response (which MAY be the case for ECR; to be looked into)
  4. the daemon does not return any information about errors it encountered if it does a retry, but will return errors for the "do not retry" case, and MAY (to be verified) return the error once it stops retrying.

So what could be done?

  • For 1., perhaps the daemon could include the error in the stream response;
    • what to return if the registry's response is not structured JSON? (the response body could be anything: can we present that in a useful way?)
    • to be looked at: should it be combined with the "retrying" message, or should we first show the error and then (after X seconds) replace the output with "retrying"?
  • For 2. and 3.: that code definitely needs some looking into; at the very least make sure we handle non-JSON responses (there's a rough sketch of that idea after this list), but also consider whether "retry" should remain the default or whether the reverse should be done. Changing the default may be dangerous though, as it would mean we won't retry if a "valid" (but not handled by the code) temporary error is returned.
  • For 4.: it needs to be looked into / verified whether the "final" error is actually logged (and returned to the client).
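
For the non-JSON handling mentioned in the second bullet, the rough idea would be to classify by HTTP status when the body can't be parsed as a spec error, instead of falling through to the retry default. A standalone, hypothetical sketch (doNotRetry and statusError are made-up stand-ins, not the actual types used by retryOnError()):

package main

import (
	"errors"
	"fmt"
)

// doNotRetry marks an error as fatal, mirroring the role of xfer.DoNotRetry.
type doNotRetry struct{ err error }

func (e doNotRetry) Error() string { return e.err.Error() }

// statusError is a made-up stand-in for an error carrying the HTTP status
// and raw body of an unstructured registry response.
type statusError struct {
	code int
	body string
}

func (e statusError) Error() string {
	return fmt.Sprintf("registry returned %d: %s", e.code, e.body)
}

// classify decides whether an error should stop the retry loop. Unlike the
// current default, an unstructured 4xx response is treated as fatal so the
// caller can surface it immediately instead of retrying indefinitely.
func classify(err error) error {
	var se statusError
	if errors.As(err, &se) && se.code >= 400 && se.code < 500 {
		return doNotRetry{err: err}
	}
	return err // everything else keeps the existing "retry by default" behaviour
}

func main() {
	err := statusError{code: 404, body: "404 page not found"}
	fmt.Printf("%T\n", classify(err)) // main.doNotRetry
}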
