kubelet cannot pull images when using ECR containerProxy asset repository #16762
Comments
Another similar issue - #13377. And a Slack thread on #kops-users.
@elliotdobson When SSH'ed into a problematic node can you run this command? Substituting the image for one that you expect to work. I'm curious if the response contains valid credentials or not.
@rifelpet it works fine. I can pass those credentials into
and the node has all the required images...
The node has also successfully joined the cluster and kOps is validating OK... 🤔 The same pull image error message still appears in the kubelet log (for the first 10 minutes of the node's life), but it eventually gets past that, pulls the images and starts the containers successfully. So I guess that is a red herring. I'll roll the rest of the cluster and report back if it's working.
I rolled the second control-plane node in the cluster and it failed to join the cluster within the default kOps validation timeout (15 mins). When I SSH into the second control-plane node it has no container images present, kubelet logs are filled with pull image error messages (same as original post). However if I pull the
So definitely some issue around the AWS credential provider,
Out of curiosity I tried rolling a worker node and it successfully joined the cluster but it cannot pull the
What is the error message when pulling the pause image?
The same as reported in the original post:
And similar access denied in the containerd logs too (presumably kubelet just supersets containerd logs)
What about
Works fine, I get valid credentials back that I can then use to pull images from private ECR.
Not that I know of. Do you have any that you suggest? In the Slack thread I found @olemarkus had a few comments that seem to point to the issue: Specifically when containerd tries to pull the
The issue is indeed the same.
The easiest workaround is to call the credential helper directly, pass the credentials to crictl, and pull the image, as suggested here: containerd/containerd#6637 (comment). Bottlerocket does the same: bottlerocket-os/bottlerocket#382.
You might be able to do this with additionalUserData, but you need to ensure it runs after containerd is running, so it may take adding a systemd service that depends on containerd.
A long term solution would be for kops' nodeup to pull the
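A rough sketch of that workaround, for illustration only. It assumes the provider binary accepts a kubelet `CredentialProviderRequest` on stdin via its `get-credentials` argument and prints a compact JSON response; `PROVIDER_BIN`, the helper function names and the `sed`-based parsing are hypothetical choices, not anything taken from kOps or this thread.

```shell
#!/bin/sh
# Hypothetical path: substitute the directory that kubelet's
# --image-credential-provider-bin-dir flag points at on your node.
PROVIDER_BIN="${PROVIDER_BIN:-/path/to/ecr-credential-provider}"

# Build the CredentialProviderRequest that kubelet would send on stdin.
build_request() {
  printf '{"apiVersion":"credentialprovider.kubelet.k8s.io/v1","kind":"CredentialProviderRequest","image":"%s"}\n' "$1"
}

# Extract one string field from the provider's JSON response on stdin.
# Assumes the value contains no embedded quotes; prefer jq where available.
json_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Ask the provider for credentials, then pull the image through the CRI.
pull_with_provider_creds() {
  image="$1"
  response="$(build_request "$image" | "$PROVIDER_BIN" get-credentials)" || return 1
  user="$(printf '%s\n' "$response" | json_field username)"
  pass="$(printf '%s\n' "$response" | json_field password)"
  if [ -z "$user" ] || [ -z "$pass" ]; then
    echo "no credentials returned for $image" >&2
    return 1
  fi
  crictl pull --creds "${user}:${pass}" "$image"
}

# Only act when an image reference is given as the first argument.
if [ "$#" -gt 0 ]; then
  pull_with_provider_creds "$1"
fi
```

Run it as root on the node with the sandbox image reference as the only argument. If it is wired into a systemd unit, that unit would need to be ordered after containerd so that crictl has a runtime to talk to.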
Ok so the root issue is that containerd pulls the sandbox image anonymously, and so the ultimate fix for this would need to come in containerd to enable that. (I have commented on the containerd issue that you linked.)
In the meantime though... I like your idea about using a long-term workaround (like you say) would be for kOps nodeup to pull the sandbox image (
Perhaps another alternative to kOps Container Image Asset Repository would be containerd Registry Mirror as per #16593. What do you think?
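To see which sandbox image containerd will try to pull on a given node, something like the following could help. It is a minimal sketch that assumes a containerd 1.x style config where the CRI plugin sets a `sandbox_image` key; the default config path and the function name are assumptions, so pass the path of the config file actually used on your node.

```shell
#!/bin/sh
# Hypothetical default; the config file used on a kOps node may live elsewhere.
CONTAINERD_CONFIG="${CONTAINERD_CONFIG:-/etc/containerd/config.toml}"

# Print the first sandbox_image value found in the given containerd config.
sandbox_image_from_config() {
  sed -n 's/^[[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Only act when a config path is given as the first argument.
if [ "$#" -gt 0 ]; then
  sandbox_image_from_config "$1"
fi
```

Comparing the printed reference against the output of `crictl images` shows whether the sandbox image is already present on the node or still has to be pulled.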
/kind bug
1. What `kops` version are you running? The command `kops version` will display this information.
Client version: 1.29.2 (git-v1.29.2)
2. What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running or provide the Kubernetes version specified as a `kops` flag.
Server Version: v1.29.7
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
We are configuring a local image asset repository; however, we are running into an issue when trying to update the cluster. We have configured all the ECR private repositories as required.
- `assets.containerProxy` in the Cluster spec
- `kops get assets --copy`
- `kops update cluster`
- `kops rolling-update`
5. What happened after the commands executed?
New node fails to join the cluster and cluster validation fails.
Upon SSH'ing into the new node and checking the logs via `journalctl -u kubelet.service` we see that kubelet is unable to pull images from ECR:
6. What did you expect to happen?
kubelet is able to pull images successfully from ECR.
7. Please provide your cluster manifest. Execute `kops get --name my.example.com -o yaml` to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the `-v 10` flag. Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else do we need to know?
The kubelet log shows that the image credential provider flags are being passed:
The ecr-credential-provider binary exists at the location passed to kubelet:
The credential provider config exists at the location passed to kubelet (and looks valid):
This seems similar to #13494; however, there was no clear resolution in that issue (and we are not using the AWS China partition).