Cannot restart Kubelet when there is no network connection #2567

Closed · vpineda1996 opened this issue Nov 9, 2022 · 4 comments · Fixed by #2587

Labels: area/kubernetes (K8s including EKS, EKS-A, and including VMW), status/needs-triage (Pending triage or re-evaluation)

@vpineda1996 commented Nov 9, 2022

Image I'm using:
bottlerocket-aws-k8s-1.21-x86_64-v1.8.0-a6233c22

What I expected to happen:
When I reboot my machine while there is no network connection, I expect kubelet to come back online, or at least to see the process fail at runtime. If there is no network connection, kubelet should come up using the locally cached pause container image.

What actually happened:
The container in which kubelet runs fails to start because BR instructs containerd to fetch the image from ECR rather than use the cached copy.

Nov 08 20:18:19 ip-10-0-3-98.us-west-2.compute.internal host-ctr[1102632]: time="2022-11-08T20:18:19Z" level=info msg="pulling with Amazon ECR Resolver" ref="ecr.aws/arn:aws:ecr:us-west-2:193646904820:repository/eks/eks-distro/kubernetes/pause:v1.21.14-eks-1-21-19"
Nov 08 20:19:49 ip-10-0-3-98.us-west-2.compute.internal systemd[1]: kubelet.service: start-pre operation timed out. Terminating.
Nov 08 20:19:49 ip-10-0-3-98.us-west-2.compute.internal systemd[1]: kubelet.service: Control process exited, code=killed, status=15/TERM
Nov 08 20:19:49 ip-10-0-3-98.us-west-2.compute.internal systemd[1]: kubelet.service: Failed with result 'timeout'.
Nov 08 20:19:49 ip-10-0-3-98.us-west-2.compute.internal systemd[1]: Failed to start Kubelet.

How to reproduce the problem:

  1. Create a CPI or worker node with BR in AWS.
  2. Start an SSM session.
  3. Remove network access to the machine by revoking all egress security group rules (one way to script this is sketched after these steps). The SSM session should continue to work at this point.
  4. Restart kubelet.
  5. kubelet is never initialized.
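
For step 3, a minimal sketch using the AWS SDK for Go v2, assuming the node's security group still carries its default allow-all egress rule; the group ID below is a placeholder:

```go
// Sketch: revoke the default allow-all egress rule from the node's security
// group so the instance loses outbound network access (step 3 above).
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := ec2.NewFromConfig(cfg)

	// Revoke the default "all protocols to 0.0.0.0/0" egress rule.
	// sg-0123456789abcdef0 is a placeholder; substitute the real group ID.
	_, err = client.RevokeSecurityGroupEgress(ctx, &ec2.RevokeSecurityGroupEgressInput{
		GroupId: aws.String("sg-0123456789abcdef0"),
		IpPermissions: []types.IpPermission{{
			IpProtocol: aws.String("-1"),
			IpRanges:   []types.IpRange{{CidrIp: aws.String("0.0.0.0/0")}},
		}},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```
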
@jpculp (Member) commented Nov 11, 2022

Hi @vpineda1996, can you expand a bit on your use case? Network access is required to pull ECR credentials, but also pulling a fresh container on reboot resets you back to an unmodified state (excluding the files under the persistent storage locations).

@bcressey (Contributor) commented

@jpculp the attempt to pull the pause container via host-ctr doesn't complete before systemd gives up, which means that kubelet never gets started:

# Pull the pause container image before starting `kubelet` so `containerd/cri` wouldn't have to

There should be a better way to deal with this in the detached network case, where there's already a cached copy of the image on disk. Especially for the pause container, reusing the local copy if it exists should be good enough.
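
A minimal sketch of that local-first behavior, written against containerd's Go client (the library host-ctr is built on). This is illustrative only, not the change that actually landed in #2587; the socket path, namespace, and image ref are placeholder assumptions, and host-ctr's real ECR resolver is omitted:

```go
// Sketch: prefer the cached image in containerd's local store and only hit
// the registry when the image is missing, so a network outage does not block
// kubelet startup when the pause image is already on disk.
package main

import (
	"context"
	"log"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/errdefs"
	"github.com/containerd/containerd/namespaces"
)

// pullIfMissing returns the image from the local content store when it is
// already cached, and falls back to a registry pull otherwise.
func pullIfMissing(ctx context.Context, client *containerd.Client, ref string) (containerd.Image, error) {
	img, err := client.GetImage(ctx, ref)
	if err == nil {
		return img, nil // cached copy exists; no network needed
	}
	if !errdefs.IsNotFound(err) {
		return nil, err
	}
	// Plain pull for illustration; host-ctr would use its ECR resolver here.
	return client.Pull(ctx, ref, containerd.WithPullUnpack)
}

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "k8s.io")
	if _, err := pullIfMissing(ctx, client, "registry.k8s.io/pause:3.8"); err != nil {
		log.Fatal(err)
	}
}
```
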

@vpineda1996 (Author) commented

Hey @jpculp, I think @bcressey has an idea of what I want to achieve. I might have phrased my requirements incorrectly, making it sound like I wanted to keep using the same instantiation of the pause container after kubelet is restarted. That is in fact NOT what I meant to say.

I want to reuse the cached pause image that was pulled and stored on the host. That means that after kubelet gets restarted, a fresh container must still be created, but instead of trying to pull the image every single time, BR should be smart enough to use the "local" image if it's present.

@vpineda1996 (Author) commented

I submitted a similar fix for the EKS AMI. awslabs/amazon-eks-ami#1090
