ContentLength mismatch in blob upload #1004
Hmm, I notice the failure to PUT the index occurred after pushing lots of other sub-manifests. It seems to have carried over the auth from one request to the next for the others, but not for this one, hence the 401. In the retry, it first does an index at …
The reauth theory is pretty good, actually. If we hit an auth challenge, we perform the token exchange and retry the request in here. I'm guessing that if there's a request body, the body gets consumed or closed during the token handshake, and the subsequent retry fails. We should probably stash the body somewhere before retrying; I'm not sure how to do this most idiomatically (a rough sketch of the idea below).
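A minimal sketch of the stash-and-replay idea, assuming nothing about the actual transport code; `doWithReauth` and the elided token exchange are hypothetical:

```go
package transport

import (
	"bytes"
	"io"
	"net/http"
)

// doWithReauth issues req once; on a 401 it would perform the token
// exchange (elided) and retry with a fresh copy of the stashed body.
func doWithReauth(client *http.Client, req *http.Request) (*http.Response, error) {
	var stash []byte
	if req.Body != nil {
		var err error
		stash, err = io.ReadAll(req.Body) // consume the body exactly once
		req.Body.Close()
		if err != nil {
			return nil, err
		}
		req.Body = io.NopCloser(bytes.NewReader(stash))
	}

	resp, err := client.Do(req)
	if err != nil || resp.StatusCode != http.StatusUnauthorized {
		return resp, err
	}
	resp.Body.Close()

	// ... token exchange and Authorization header update elided ...

	// Rewind: the retry gets a fresh reader over the same bytes, so the
	// body still matches the Content-Length the server expects.
	retry := req.Clone(req.Context())
	retry.Body = io.NopCloser(bytes.NewReader(stash))
	return client.Do(retry)
}
```

The obvious drawback is buffering entire blobs in memory, which is part of why the GetBody approach discussed later in the thread is more attractive.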
I can't remember if I even tried to fix this. Do you ever still hit this?
This issue is stale because it has been open for 90 days with no activity.
Coming back to this, as I keep coming across it. Often when I retry it goes fine, but sometimes not. Could this be a retry issue? I looked at the Moby source code, which just does 5 retries on errors (see here). I am not quite sure how our retries work; for comparison, a sketch of the Moby-style approach follows.
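For reference, the Moby-style retry could look roughly like this (`pushWithRetries` and `pushOnce` are made-up names; Moby's actual code differs):

```go
// pushWithRetries retries the whole push up to five times on any error,
// in the spirit of Moby's approach, rather than replaying an individual
// HTTP request.
func pushWithRetries(pushOnce func() error) error {
	const maxAttempts = 5
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = pushOnce(); err == nil {
			return nil
		}
		// Each attempt rebuilds its requests from scratch, so a body
		// consumed by a failed attempt cannot poison the next one.
	}
	return err
}
```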
I believe the problem is that we often retry failed requests without considering whether the body of the request has been consumed. For blob uploads, we actually wrap the whole blob upload in some retry logic, so we probably mask that issue by just restarting the upload from scratch. For manifest uploads, this was added somewhat recently: #1041. What version are you using? It may also be the case that whatever error we're hitting because of the consumed request body doesn't pass our "shouldRetry" test. My theory is: we hit an auth challenge, the token exchange consumes (or closes) the request body, and the retried request then goes out with a body that no longer matches its Content-Length header, producing the ContentLength mismatch.
Looking at the net/http godoc, there are some interesting things to tug on: https://pkg.go.dev/net/http#Transport

We could use similar logic to the stdlib and start setting GetBody on our requests. (This would have the bonus effect of getting the stdlib http transport to help with retrying requests.) We can do something similar to …
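A hedged sketch of what setting GetBody could look like; `newManifestPut` and its parameters are hypothetical, but the net/http behavior described in the comments is real:

```go
package transport

import (
	"bytes"
	"io"
	"net/http"
)

// newManifestPut builds a PUT whose body can be rewound. http.NewRequest
// already sets GetBody for *bytes.Reader, *bytes.Buffer, and
// *strings.Reader bodies; for any other body type we would need to set
// it ourselves, as shown explicitly here.
func newManifestPut(url string, payload []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.GetBody = func() (io.ReadCloser, error) {
		return io.NopCloser(bytes.NewReader(payload)), nil
	}
	return req, nil
}
```

Because GetBody returns a fresh reader each time, the transport (or our own retry logic) can replay the body without the caller re-buffering it.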
Actually, the majority of the errors I see (anecdotally) are around manifests/indexes, not blobs. Everything is wrapped up in …
I like that theory. The messages when it fails (not all of the time, but most of the time), with the ContentLength mismatch error, fit pretty well with the theory.
I spent more time than I probably should have digging into https://github.com/golang/go/blob/6178d25fc0b28724b1b5aec2b1b74fc06d9294c7/src/net/http/transfer.go#L388-L391

I cannot actually reproduce this failure in any meaningful way with a unit test to confirm my theory, but I did learn that golang will automagically set GetBody for certain request body types (which we fall into) so that the body can be rewound: https://github.com/golang/go/blob/6178d25fc0b28724b1b5aec2b1b74fc06d9294c7/src/net/http/request.go#L890-L912

Unfortunately, it won't replay requests that aren't "safe" unless you set a magic header: https://github.com/golang/go/blob/4d8db00641cc9ff4f44de7df9b8c4f4a4f9416ee/src/net/http/request.go#L1415-L1429

I'm going to do this for now for our PUT requests, which might help a bit?
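Concretely, the header check in the linked request.go means something like this would opt a PUT into stdlib replays (`markReplayable` is a hypothetical helper, not library code):

```go
package transport

import "net/http"

// markReplayable sets the "magic header". net/http only replays a
// request with a body after a transport-level error if the method is
// GET/HEAD/OPTIONS/TRACE or if a non-empty Idempotency-Key (or
// X-Idempotency-Key) header is present, and only when GetBody is set so
// the body can be rewound.
func markReplayable(req *http.Request) {
	req.Header.Set("Idempotency-Key", "1") // any non-empty value passes the check
}
```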
Hey @jonjohnsonjr, did you end up implementing what you suggested above (adding the header to the PUT request)?
I did not! I think wrapping the PUT in a retry mostly solved this: see go-containerregistry/pkg/v1/remote/write.go, line 689 at 4a0e0af.
It is probably worth adding the header if you're still seeing those failures. Do you have any way to repro, or do you have logs?
Hey, …
Thanks -- what version of go-containerregistry is that with? |
Sorry I dropped the ball on this. We were fairly up to date, so I think it was 0.13.0.
I am having an intermittent issue with a ContentLength mismatch. It may be related to the registry side; I cannot reproduce it reliably, but if I run a lot of pushes, it shows up with some regularity.
I ran with debug logging enabled and captured the output while pushing an image from linuxkit, with some of the irrelevant data filtered out.
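For context, a sketch of how debug logging is typically enabled when a tool uses go-containerregistry directly, assuming the tool wires up the library's pkg/logs package:

```go
package main

import (
	"os"

	"github.com/google/go-containerregistry/pkg/logs"
)

func init() {
	// The library's debug logger discards output by default; pointing it
	// at stderr surfaces the HTTP request/response trace.
	logs.Debug.SetOutput(os.Stderr)
	logs.Warn.SetOutput(os.Stderr)
}
```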
It tries to put a manifest, gets a 401, goes to authorize, goes back with the token, and gets an error. It doesn't look like an HTTP error, but I cannot tell from the logs. Interestingly, that error line looks like it comes from here, but based on that code, it should not error out. Yet in this case it definitely does, with a non-zero exit code (which I wish I had thought to capture).
When I reran, it went just fine.