Can't pull modestly sized image from Quay.io #14789
Comments
As far as I know, this timeout (60 seconds) comes from the container runtime, not from minikube, and not even from Kubernetes anymore... Like https://github.com/Mirantis/cri-dockerd/blob/v0.2.5/config/options.go#L48 In the future, there might be a config file where these values can be set as well. Loading the image from a tarball in the cache (or from a local registry) would be the workaround... See |
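A minimal sketch of that tarball workaround, assuming the Docker CLI is available on the host; the quay.io/example/bigimage name is only a placeholder:

```sh
# Pull the image on the host, outside the cluster, where no CRI timeout applies
docker pull quay.io/example/bigimage:latest
# Save it to a tarball
docker save -o bigimage.tar quay.io/example/bigimage:latest
# Load the tarball into the minikube node, bypassing the in-cluster pull
minikube image load bigimage.tar
```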
Hmm, wonder if this is some kind of regression. It is supposed to trigger on missing progress updates:

    // If no pulling progress is made before imagePullProgressDeadline, the image pulling will be cancelled.
    // Docker reports image progress for every 512kB block, so normally there shouldn't be too long interval
    // between progress updates.
    imagePullProgressDeadline time.Duration
|
Thank you @afbjorklund. If it is from the container runtime why does I've installed minikube in a very standard way |
Also, I can see the whole 600MB image being pulled in my network monitoring. It's more than a trickle, it's a firehose :) |
I mean it is in the CRI, which is the murky gray zone (DMZ) between them. Also, the code comments say a "10 seconds" timeout rather than "1 minute"? Wonder if there is another layer involved, like a timeout from the gRPC in CRI, etc. EDIT: The 10 seconds is how often it updates the log; the timeout comes after 60. |
Thanks for your help on this, especially on a Sunday. I guess I will look at methods of getting images into minikube from a local registry, and/or look at swapping to the podman runtime. I've got a colleague at another organization who has confirmed the behaviour independently. It feels almost absurd that the docker runtime would have this as a default behaviour, since it just makes it silently unusable.
It's swapping to the "context deadline exceeded" response after one minute. By this point the network traffic from the image pull has ended. |
It would be interesting to know if 1.23 also has this problem, or if it is a regression due to the dockershim removal in 1.24? |
Ahh I didn't realize something major like that had happened recently. |
We do see this behaviour in other places, when loading a large image such as the KIC base into a new cluster. Then it gives progress for the download, but then there is a big old freeze at 99% when it is loading... To put it in old world terms, you get progress output from the "wget" but nothing from the "tar". It is especially bad if you already have it loaded, since then the download progress is empty as well. |
I'm not sure transferring a component to another power is a major event? https://www.mirantis.com/blog/the-future-of-dockershim-is-cri-dockerd/ But sure, both CRI and CNI have been snoozed on long enough (in Docker). |
Is there a way to see pull progress during the pull for a config apply, or only when doing minikube image pull? Currently I'm just sitting in the blind until it swaps to the error after a minute. |
Hmm, as far as I know it uses the docker client which calls the REST API - so you are at the mercy of those. But it is supposed to output everything in the log, at the "info" level. So it's hidden somewhere in journald
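If you want to watch that log from the host, something along these lines should work; the cri-docker unit name is an assumption about how the service is named inside the minikube node:

```sh
# Follow the container runtime logs inside the minikube node via journald
minikube ssh -- sudo journalctl -f -u cri-docker -u docker
```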
Nice: EDIT: I take my earlier comments back, it does seem to be outputting progress for both download and extract:
|
From earlier pulling my images as part of the deployment...
After running
|
My image has a "Stopping pulling image" line part way through grabbing some of the layers. |
It might be a regression (from dockershim to cri-dockerd). The first hunk below, in Info(), keeps a timeout as before; the second, in ImagePull(), replaces the cancelable context with the same default timeout:

     }
     func (d *kubeDockerClient) Info() (*dockertypes.Info, error) {
    -    ctx, cancel := d.getTimeoutContext()
    +    ctx, cancel := context.WithTimeout(context.Background(), d.timeout)
         defer cancel()
         resp, err := d.client.Info(ctx)
         if ctxErr := contextError(ctx); ctxErr != nil {
     }

         opts.RegistryAuth = base64Auth
    -    ctx, cancel := d.getCancelableContext()
    +    ctx, cancel := context.WithTimeout(context.Background(), d.timeout)
         defer cancel()
         resp, err := d.client.ImagePull(ctx, image, opts)
         if err != nil {

So it looks like copy/paste (needs to be reported to Mirantis). For reference, the helper functions that were being replaced:

    // getCancelableContext returns a new cancelable context. For long running requests without timeout, we use cancelable
    // context to avoid potential resource leak, although the current implementation shouldn't leak resource.
    func (d *kubeDockerClient) getCancelableContext() (context.Context, context.CancelFunc) {
        return context.WithCancel(context.Background())
    }

    // getTimeoutContext returns a new context with default request timeout
    func (d *kubeDockerClient) getTimeoutContext() (context.Context, context.CancelFunc) {
        return context.WithTimeout(context.Background(), d.timeout)
    }
|
Note that it also looks totally unrelated to the Then again, the code says that short operations are two minutes (not the one observed) |
No worries. I wouldn't have had a clue where to look without your help. |
Short term, moving to a different runtime seems to be the solution. |
It's definitely closer to 1 minute than 2. I saw the 2 minute default the other day but we ruled it out at the time because we were definitely seeing it fail sooner. |
That was supposed to be the long-term solution... :-) But, as the saying goes, now you have two problems? Once we can sort out the legacy docker socket issues, we too might move over to containerd as the default runtime. |
Thanks again for all your help. Enjoy your sunday :) |
It would be nice to confirm two things:
If so, then we know it is the linked issue. |
I turned my network off/on halfway through, and it seemed to be ~2 minutes (with a granularity of 10 seconds). After that you get the update in the log ("Stop pulling image", but with progress still ongoing) and:
You still get the abort (after 1 minute) with 1.23.9 too, but only when the network is completely off. |
Fixed: Mirantis/cri-dockerd@460da8e. Should be in cri-dockerd 0.2.6. |
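To check which cri-dockerd build a node is actually running, something like this should do (assuming the binary supports the usual --version flag):

```sh
# Print the cri-dockerd version inside the minikube node
minikube ssh -- cri-dockerd --version
```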
Workaround, meanwhile:
|
Hey all, First off, this thread is a lifesaver, my team has been trying to figure out this same issue for quite a bit now. We are pulling a very large image as well (~15 GB) and are reliant for-better-or-worse on minikube for local development right now. I saw that your pull request to bump up cri-dockerd to 0.2.5 was still open/pending, so I tried installing cri-dockerd v0.2.5 from source inside the minikube VM according to the readme instructions. Unfortunately, while the installation ostensibly went fine, the timeout is still happening when I try to pull the image. I have two questions:
Thanks for the help! |
@henryrior For minikube, updating the cri-dockerd commit and rebuilding should do it. It will still be broken unless a CNI plugin is explicitly enabled (a sketch follows below). |
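A hedged sketch of what explicitly enabling a CNI plugin can look like; the choice of bridge here is only an illustration, not a recommendation from the comment above:

```sh
# Rebuild/start the cluster with an explicit CNI plugin instead of auto-detection
minikube start --cni=bridge
```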
I actually was building from latest master, so the changes should in theory be in. I will try out that flag after rebuilding again and see how that goes. Thanks again |
Turns out using a different container runtime, as Joss talked about earlier, fixed our problem. ex: |
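As an illustration, switching the runtime when creating a profile might look like this; the containerd choice is an assumption, not necessarily the command used above:

```sh
# Start a cluster with containerd instead of the Docker runtime
minikube start --container-runtime=containerd
```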
It was working with |
@afbjorklund I used Minikube version on Mac M1
Start with |
It is waiting on this PR: |
Any updates on this? |
Not really; it is fixed in cri-dockerd 0.2.6. A workaround would be to upgrade it. |
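A rough sketch of upgrading cri-dockerd inside the node; the release URL, archive layout, and service name are assumptions and should be checked against the Mirantis/cri-dockerd releases page:

```sh
# Run inside the minikube node (minikube ssh).
# The download URL and archive layout below are assumptions; verify them on the releases page first.
curl -LO https://github.com/Mirantis/cri-dockerd/releases/download/v0.2.6/cri-dockerd-0.2.6.amd64.tgz
tar xzf cri-dockerd-0.2.6.amd64.tgz
sudo systemctl stop cri-docker
sudo install -m 0755 cri-dockerd/cri-dockerd /usr/bin/cri-dockerd
sudo systemctl start cri-docker
```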
What Happened?
Pulling a 600MB image and larger from Quay.io fails with "context deadline exceeded". The image is there, I can pull it with docker / podman / singularity.
I tagged the ubuntu:20.04 base image and pushed it to our Quay.io repo. This image is 27MB and minikube is able to successfully pull and run the image. But it fails to pull the 600MB image from the same repo.
This is not an authentication issue with the private repo because the 27MB image works.
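For context, the tag-and-push step described above looks roughly like this; the repository names are placeholders:

```sh
# Re-tag the small ubuntu base image and push it to the private Quay repo used for testing
docker pull ubuntu:20.04
docker tag ubuntu:20.04 quay.io/example-org/example-repo:ubuntu-20.04
docker push quay.io/example-org/example-repo:ubuntu-20.04
```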
On my network monitoring I can see the traffic from the 600MB image being pulled in, and the full image does come down. On Quay.io I can see the pull logs for the image and it is being pulled, but it always fails with "context deadline exceeded". This is not a network issue. I have a stable and extremely fast connection.
I am at my wits' end here. This is only a 600MB image. Larger images also fail. What is happening?
Attach the log file
I cannot give you anything more than these relevant lines from minikube logs, redacted for hopefully obvious reasons.
Operating System
Ubuntu
Driver
Docker