Nodes fail to join cluster when using new v20210302 GPU AMIs  #3446

@Callisto13

Description

TLDR:

eksctl provides its own kubelet and docker daemon config (or, in the case of GPU AMIs, edits the docker settings elsewhere, since those AMIs did not ship the docker config file). These two configs need to line up: both must explicitly set the same cgroup driver, otherwise the kubelet will not start. The new GPU AMIs now ship and use the docker daemon config which was previously removed, so the old configuration option we edit no longer has any effect. The kubelet and docker daemon therefore come up with mismatched cgroup drivers, and the kubelet fails.

There are two issues here:

  1. Getting around the current problem
  2. Avoiding this in the future
    i. The GPU AMI builder is opaque; how can we get visibility into upstream changes?
    ii. Is there a "safer"/better way that eksctl can alter configuration? Should we be re-writing the configs? (See the sketch after this list.)
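
On point 2.ii, one safer approach would be to stop editing whichever file or flag a given AMI happens to use, and instead merge the keys eksctl needs into /etc/docker/daemon.json itself, creating the file if it is absent. A minimal shell sketch of that idea, assuming jq is available on the AMI (the EKS bootstrap script already depends on it):

# Ensure the file exists, then merge in the cgroup driver setting,
# rather than editing an option whose location may change upstream.
test -f /etc/docker/daemon.json || echo '{}' > /etc/docker/daemon.json
jq '. + {"exec-opts": ["native.cgroupdriver=systemd"]}' /etc/docker/daemon.json > /tmp/daemon.json \
  && mv /tmp/daemon.json /etc/docker/daemon.json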

Bug description

What were you trying to accomplish?

Attempting to create nodegroups with the new GPU AMI (amazon-eks-gpu-node-1.xx-v20210302).

What happened?

The deployment timed out as the nodes failed to join the cluster.

How to reproduce it?

---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: test-cluster-name
  region: us-west-2

nodeGroups:
  - name: ng-1
    instanceType: p2.xlarge
    ami: ami-07cb90e1bdc02f118   # <- id of amazon-eks-gpu-node-1.18-v20210302
    desiredCapacity: 1
    ssh:
      enableSsm: true            # <- for debugging on instance

Then create the cluster or nodegroup from this config:

eksctl create (cluster|nodegroup) -f config.yaml
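
With ssh.enableSsm set, you can get a shell on the stuck node via AWS Systems Manager to debug (this assumes the AWS CLI and the Session Manager plugin are installed locally; the instance ID below is a placeholder):

aws ssm start-session --target i-0123456789abcdef0   # <- replace with the node's instance ID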

Logs

Error: timed out (after 25m0s) waiting for at least 1 nodes to join the cluster and become ready in "ng-3"

On the node we see the following...

The docker service is fine:

$ systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─nvidia-docker-dropin.conf
   Active: active (running) since Thu 2021-03-18 12:30:45 UTC; 6min ago
     Docs: https://docs.docker.com
 Main PID: 5736 (dockerd)
    Tasks: 13
   Memory: 132.2M
   CGroup: /system.slice/docker.service
           └─5736 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --add-runtime neuron=/etc/docker-runtimes.d/neuron --add-runtime nvidia=/etc/docker-runtimes.d/nvidia --default-runtime=nvidia

The kubelet service is failing:

$ systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-eksclt.al2.conf
   Active: activating (auto-restart) (Result: exit-code) since Thu 2021-03-18 12:36:45 UTC; 3s ago
     Docs: https://github.com/kubernetes/kubernetes
  Process: 19355 ExecStart=/usr/bin/kubelet --node-ip=${NODE_IP} --node-labels=${NODE_LABELS},alpha.eksctl.io/instance-id=${INSTANCE_ID} --max-pods=${MAX_PODS} --register-node=true --register-with-taints=${NODE_TAINTS} --cloud-provider=aws --container-runtime=docker --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --pod-infra-container-image=${AWS_EKS_ECR_ACCOUNT}.dkr.ecr.${AWS_DEFAULT_REGION}.${AWS_SERVICES_DOMAIN}/eks/pause:3.3-eksbuild.1 --kubeconfig=/etc/eksctl/kubeconfig.yaml --config=/etc/eksctl/kubelet.yaml (code=exited, status=255)
  Process: 19343 ExecStartPre=/sbin/iptables -P FORWARD ACCEPT -w 5 (code=exited, status=0/SUCCESS)
 Main PID: 19355 (code=exited, status=255)

$ sudo journalctl -u kubelet.service
...
Mar 18 12:31:24 ip-192-168-92-30.us-west-2.compute.internal kubelet[6485]: F0318 12:31:24.739361    6485 server.go:274] failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
Mar 18 12:31:24 ip-192-168-92-30.us-west-2.compute.internal systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Mar 18 12:31:24 ip-192-168-92-30.us-west-2.compute.internal systemd[1]: Unit kubelet.service entered failed state.

Docker is using the daemon config at the default location:

$ cat /etc/docker/daemon.json
{
  "bridge": "none",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "10"
  },
  "live-restore": true,
  "max-concurrent-downloads": 10,
  "default-ulimits": {
    "memlock": {
      "Hard": -1,
      "Name": "memlock",
      "Soft": -1
    }
  }
}
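
docker info confirms which driver docker actually settled on; its CgroupDriver field matches the error in the kubelet journal above:

$ docker info --format '{{.CgroupDriver}}'
cgroupfs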

The kubelet is using the config provided by eksctl:

$ cat /etc/eksctl/kubelet.yaml
address: 0.0.0.0
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/eksctl/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
cgroupDriver: systemd
clusterDNS:
- 10.100.0.10
clusterDomain: cluster.local
featureGates:
  RotateKubeletServerCertificate: true
kind: KubeletConfiguration
kubeReserved:
  cpu: 80m
  ephemeral-storage: 1Gi
  memory: 893Mi
serverTLSBootstrap: true

We can see that the kubelet explicitly sets the systemd cgroup driver, whereas docker sets none. Docker therefore defaults to the cgroupfs driver, causing a mismatch with the kubelet, which then refuses to start.
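
For reference, pinning docker to the same driver only takes one extra key in daemon.json (a standard dockerd option, shown here to illustrate the mismatch rather than as the fix eksctl will ship):

{
  "exec-opts": ["native.cgroupdriver=systemd"]
}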

Anything else we need to know?

eksctl provides its own kubelet.yaml and daemon.json config files. It also lets users change anything they want in those configs, or anything else on the node, via preBootstrapCommands and bootstrapCommands.

The change to use the systemd cgroup driver came in #2962.

Further work was done in #3007 to do the same on GPU nodes. Because the GPU AMIs at the time removed the /etc/docker/daemon.json file altogether, that change edited a different docker config file than the one used on non-GPU nodes.

The issue here is that the new AMIs (amazon-eks-gpu-node-1.xx-v20210302) now do create and use that daemon.json file, so our edit elsewhere no longer applies, and the kubelet and docker end up trying to use different cgroup drivers.
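
Until a fixed release lands, one possible stopgap is to patch the AMI's daemon.json before bootstrap via preBootstrapCommands, along the lines of the merge sketched near the top. This is an untested sketch, not an official workaround; it assumes jq is on the AMI and that nothing later overwrites the file:

nodeGroups:
  - name: ng-1
    instanceType: p2.xlarge
    ami: ami-07cb90e1bdc02f118
    desiredCapacity: 1
    preBootstrapCommands:
      # Align docker's cgroup driver with the systemd driver set in
      # the eksctl-provided /etc/eksctl/kubelet.yaml.
      - "jq '. + {\"exec-opts\": [\"native.cgroupdriver=systemd\"]}' /etc/docker/daemon.json > /tmp/daemon.json && mv /tmp/daemon.json /etc/docker/daemon.json"
      - "systemctl restart docker"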

Versions

eksctl version 0.35.0 and above.
