Description
TLDR:
Eksctl provides its own kubelet and docker daemon configs (or, in the case of GPU AMIs, edits configuration elsewhere, since the docker config file was not present on those images). The two configs must line up: both need to explicitly set the same cgroup driver, otherwise the kubelet will not start. The new GPU AMIs now ship and use the docker daemon config that was previously removed, so our edit to the old configuration location no longer has any effect. As a result, the kubelet and docker daemon come up with mismatched cgroup drivers, and the kubelet fails.
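The constraint above can be sketched as a quick check (illustrative only, not eksctl code): docker falls back to the cgroupfs driver unless `native.cgroupdriver` is set via `exec-opts` in daemon.json, so both sides must agree explicitly.

```python
# Illustrative sketch (not eksctl code) of why the two configs must agree.

def docker_cgroup_driver(daemon_json: dict) -> str:
    """Effective docker cgroup driver: 'native.cgroupdriver=<x>' from
    exec-opts if present, otherwise docker's 'cgroupfs' default."""
    for opt in daemon_json.get("exec-opts", []):
        key, _, value = opt.partition("=")
        if key == "native.cgroupdriver":
            return value
    return "cgroupfs"

def drivers_match(daemon_json: dict, kubelet_config: dict) -> bool:
    # KubeletConfiguration also defaults cgroupDriver to 'cgroupfs' when unset.
    return docker_cgroup_driver(daemon_json) == kubelet_config.get("cgroupDriver", "cgroupfs")

# The situation in this report: kubelet says systemd, daemon.json says nothing.
assert not drivers_match({}, {"cgroupDriver": "systemd"})
# Aligning both on systemd resolves it.
assert drivers_match({"exec-opts": ["native.cgroupdriver=systemd"]},
                     {"cgroupDriver": "systemd"})
```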
There are 2 issues here:
- Getting around the current problem
- Avoiding this in the future
i. The GPU AMI builder is opaque; how can we get visibility on upstream changes?
ii. Is there a "safer"/better way that eksctl can alter configuration? Should we be re-writing the configs?
Bug description
What were you trying to accomplish?
Attempting to create nodegroups with new GPU AMI (amazon-eks-gpu-node-1.xx-v20210302).
What happened?
The deployment timed out as the nodes failed to join the cluster.
How to reproduce it?
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: test-cluster-name
  region: us-west-2
nodeGroups:
  - name: ng-1
    instanceType: p2.xlarge
    ami: ami-07cb90e1bdc02f118 # <- id of amazon-eks-gpu-node-1.18-v20210302
    desiredCapacity: 1
    ssh:
      enableSsm: true # <- for debugging on instance

eksctl create (cluster|nodegroup) -f config.yaml
Logs
Error: timed out (after 25m0s) waiting for at least 1 nodes to join the cluster and become ready in "ng-3"
On the node we see the following...
The docker service is fine:
$ systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/docker.service.d
└─nvidia-docker-dropin.conf
Active: active (running) since Thu 2021-03-18 12:30:45 UTC; 6min ago
Docs: https://docs.docker.com
Main PID: 5736 (dockerd)
Tasks: 13
Memory: 132.2M
CGroup: /system.slice/docker.service
└─5736 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --add-runtime neuron=/etc/docker-runtimes.d/neuron --add-runtime nvidia=/etc/docker-runtimes.d/nvidia --default-runtime=nvidia

The kubelet service is failing:
$ systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-eksclt.al2.conf
Active: activating (auto-restart) (Result: exit-code) since Thu 2021-03-18 12:36:45 UTC; 3s ago
Docs: https://github.com/kubernetes/kubernetes
Process: 19355 ExecStart=/usr/bin/kubelet --node-ip=${NODE_IP} --node-labels=${NODE_LABELS},alpha.eksctl.io/instance-id=${INSTANCE_ID} --max-pods=${MAX_PODS} --register-node=true --register-with-taints=${NODE_TAINTS} --cloud-provider=aws --container-runtime=docker --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --pod-infra-container-image=${AWS_EKS_ECR_ACCOUNT}.dkr.ecr.${AWS_DEFAULT_REGION}.${AWS_SERVICES_DOMAIN}/eks/pause:3.3-eksbuild.1 --kubeconfig=/etc/eksctl/kubeconfig.yaml --config=/etc/eksctl/kubelet.yaml (code=exited, status=255)
Process: 19343 ExecStartPre=/sbin/iptables -P FORWARD ACCEPT -w 5 (code=exited, status=0/SUCCESS)
Main PID: 19355 (code=exited, status=255)
$ sudo journalctl -u kubelet.service
...
Mar 18 12:31:24 ip-192-168-92-30.us-west-2.compute.internal kubelet[6485]: F0318 12:31:24.739361 6485 server.go:274] failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
Mar 18 12:31:24 ip-192-168-92-30.us-west-2.compute.internal systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Mar 18 12:31:24 ip-192-168-92-30.us-west-2.compute.internal systemd[1]: Unit kubelet.service entered failed state.

Docker is using the daemon config at the default location:
$ cat /etc/docker/daemon.json
{
  "bridge": "none",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "10"
  },
  "live-restore": true,
  "max-concurrent-downloads": 10,
  "default-ulimits": {
    "memlock": {
      "Hard": -1,
      "Name": "memlock",
      "Soft": -1
    }
  }
}

The kubelet is using the config provided by eksctl:
$ cat /etc/eksctl/kubelet.yaml
address: 0.0.0.0
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/eksctl/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
cgroupDriver: systemd
clusterDNS:
- 10.100.0.10
clusterDomain: cluster.local
featureGates:
  RotateKubeletServerCertificate: true
kind: KubeletConfiguration
kubeReserved:
  cpu: 80m
  ephemeral-storage: 1Gi
  memory: 893Mi
serverTLSBootstrap: true

We can see that the kubelet explicitly uses the systemd cgroup driver, whereas docker does not set one. Docker therefore defaults to the cgroupfs driver, causing a mismatch with the kubelet, which then refuses to start.
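For comparison, a daemon.json that matched the kubelet's systemd driver would carry an `exec-opts` entry (a minimal sketch of the standard dockerd option, not the fix eksctl ships):

```json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
```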
Anything else we need to know?
eksctl provides its own kubelet.yaml and daemon.json config files. It also allows users to change anything they want in that configuration, or anything else, via preBootstrapCommands and bootstrapCommands.
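A possible interim workaround via that preBootstrapCommands hook might look like the following (an untested sketch; it assumes python3 is present on the AMI and that restarting docker before bootstrap is safe):

```yaml
nodeGroups:
  - name: ng-1
    # ... rest of the nodegroup config as above ...
    preBootstrapCommands:
      - |
        # Untested sketch: align docker with the kubelet's systemd cgroup driver.
        python3 -c 'import json; p="/etc/docker/daemon.json"; c=json.load(open(p)); c["exec-opts"]=["native.cgroupdriver=systemd"]; json.dump(c, open(p, "w"), indent=2)'
        systemctl restart docker
```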
The change to use the systemd cgroupdriver came with #2962.
Further work was done in #3007 to do the same on GPU nodes. It involved editing a different config file for docker than for non-GPU nodes since at the time GPU AMIs would remove the /etc/docker/daemon.json file altogether.
The issue here is that the new AMIs (amazon-eks-gpu-node-1.xx-v20210302) now do create and use that daemon file. So we end up with kubelet and docker trying to use different drivers.
Versions
eksctl version 0.35.0 and above.