Nodes fail to join cluster when using new v20210302 GPU AMIs  #3446

@Callisto13

Description

TLDR:

eksctl provides its own kubelet and docker daemon config (or, in the case of GPU AMIs, edits the docker settings elsewhere, since those AMIs did not ship the docker config file). These two configs need to line up: both must explicitly set the same cgroup driver, otherwise the kubelet will not start. The new GPU AMIs now ship and use the docker daemon config which was previously removed, so the old configuration option we edit no longer has any effect. The kubelet and docker daemon therefore come up with mismatched cgroup drivers, and the kubelet fails.

There are two issues here:

  1. Getting around the current problem
  2. Avoiding this in the future
    i. The GPU AMI builder is opaque; how can we get visibility into upstream changes?
    ii. Is there a "safer"/better way that eksctl can alter configuration? Should we be re-writing the configs? (See the sketch after this list.)
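
On point 2.ii, one safer approach would be to stop editing whichever file or flag a given AMI happens to use, and instead merge the keys eksctl needs into /etc/docker/daemon.json itself, creating the file if it is absent. A minimal shell sketch of that idea, assuming jq is available on the AMI (the EKS bootstrap script already depends on it):

# Ensure the file exists, then merge in the cgroup driver setting,
# rather than editing an option whose location may change upstream.
test -f /etc/docker/daemon.json || echo '{}' > /etc/docker/daemon.json
jq '. + {"exec-opts": ["native.cgroupdriver=systemd"]}' /etc/docker/daemon.json > /tmp/daemon.json \
  && mv /tmp/daemon.json /etc/docker/daemon.json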

Bug description

What were you trying to accomplish?

Attempting to create nodegroups with the new GPU AMI (amazon-eks-gpu-node-1.xx-v20210302).

What happened?

The deployment timed out as the nodes failed to join the cluster.

How to reproduce it?

---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: test-cluster-name
  region: us-west-2

nodeGroups:
  - name: ng-1
    instanceType: p2.xlarge
    ami: ami-07cb90e1bdc02f118   # <- id of amazon-eks-gpu-node-1.18-v20210302
    desiredCapacity: 1
    ssh:
      enableSsm: true            # <- for debugging on instance

Then create the cluster or nodegroup from this config:

eksctl create (cluster|nodegroup) -f config.yaml
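
With ssh.enableSsm set, you can get a shell on the stuck node via AWS Systems Manager to debug (this assumes the AWS CLI and the Session Manager plugin are installed locally; the instance ID below is a placeholder):

aws ssm start-session --target i-0123456789abcdef0   # <- replace with the node's instance ID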

Logs

Error: timed out (after 25m0s) waiting for at least 1 nodes to join the cluster and become ready in "ng-3"

On the node we see the following...

The docker service is fine:

$ systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─nvidia-docker-dropin.conf
   Active: active (running) since Thu 2021-03-18 12:30:45 UTC; 6min ago
     Docs: https://docs.docker.com
 Main PID: 5736 (dockerd)
    Tasks: 13
   Memory: 132.2M
   CGroup: /system.slice/docker.service
           └─5736 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --add-runtime neuron=/etc/docker-runtimes.d/neuron --add-runtime nvidia=/etc/docker-runtimes.d/nvidia --default-runtime=nvidia

The kubelet service is failing:

$ systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-eksclt.al2.conf
   Active: activating (auto-restart) (Result: exit-code) since Thu 2021-03-18 12:36:45 UTC; 3s ago
     Docs: https://github.com/kubernetes/kubernetes
  Process: 19355 ExecStart=/usr/bin/kubelet --node-ip=${NODE_IP} --node-labels=${NODE_LABELS},alpha.eksctl.io/instance-id=${INSTANCE_ID} --max-pods=${MAX_PODS} --register-node=true --register-with-taints=${NODE_TAINTS} --cloud-provider=aws --container-runtime=docker --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --pod-infra-container-image=${AWS_EKS_ECR_ACCOUNT}.dkr.ecr.${AWS_DEFAULT_REGION}.${AWS_SERVICES_DOMAIN}/eks/pause:3.3-eksbuild.1 --kubeconfig=/etc/eksctl/kubeconfig.yaml --config=/etc/eksctl/kubelet.yaml (code=exited, status=255)
  Process: 19343 ExecStartPre=/sbin/iptables -P FORWARD ACCEPT -w 5 (code=exited, status=0/SUCCESS)
 Main PID: 19355 (code=exited, status=255)

$ sudo journalctl -u kubelet.service
...
Mar 18 12:31:24 ip-192-168-92-30.us-west-2.compute.internal kubelet[6485]: F0318 12:31:24.739361    6485 server.go:274] failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
Mar 18 12:31:24 ip-192-168-92-30.us-west-2.compute.internal systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Mar 18 12:31:24 ip-192-168-92-30.us-west-2.compute.internal systemd[1]: Unit kubelet.service entered failed state.

Docker is using the daemon config at the default location:

$ cat /etc/docker/daemon.json
{
  "bridge": "none",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "10"
  },
  "live-restore": true,
  "max-concurrent-downloads": 10,
  "default-ulimits": {
    "memlock": {
      "Hard": -1,
      "Name": "memlock",
      "Soft": -1
    }
  }
}
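
docker info confirms which driver docker actually settled on; its CgroupDriver field matches the error in the kubelet journal above:

$ docker info --format '{{.CgroupDriver}}'
cgroupfs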

The kubelet is using the config provided by eksctl:

$ cat /etc/eksctl/kubelet.yaml
address: 0.0.0.0
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/eksctl/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
cgroupDriver: systemd
clusterDNS:
- 10.100.0.10
clusterDomain: cluster.local
featureGates:
  RotateKubeletServerCertificate: true
kind: KubeletConfiguration
kubeReserved:
  cpu: 80m
  ephemeral-storage: 1Gi
  memory: 893Mi
serverTLSBootstrap: true

We can see that the kubelet explicitly sets the systemd cgroup driver, whereas docker sets none. Docker therefore defaults to the cgroupfs driver, causing a mismatch with the kubelet, which then refuses to start.
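
For reference, pinning docker to the same driver only takes one extra key in daemon.json (a standard dockerd option, shown here to illustrate the mismatch rather than as the fix eksctl will ship):

{
  "exec-opts": ["native.cgroupdriver=systemd"]
}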

Anything else we need to know?

eksctl provides its own kubelet.yaml and daemon.json config files. It also lets users change anything they want in those configs, or anything else on the node, via preBootstrapCommands and bootstrapCommands.

The change to use the systemd cgroup driver came in #2962.

Further work was done in #3007 to do the same on GPU nodes. Because the GPU AMIs at the time removed the /etc/docker/daemon.json file altogether, that change edited a different docker config file than the one used on non-GPU nodes.

The issue here is that the new AMIs (amazon-eks-gpu-node-1.xx-v20210302) now do create and use that daemon.json file, so our edit elsewhere no longer applies, and the kubelet and docker end up trying to use different cgroup drivers.
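
Until a fixed release lands, one possible stopgap is to patch the AMI's daemon.json before bootstrap via preBootstrapCommands, along the lines of the merge sketched near the top. This is an untested sketch, not an official workaround; it assumes jq is on the AMI and that nothing later overwrites the file:

nodeGroups:
  - name: ng-1
    instanceType: p2.xlarge
    ami: ami-07cb90e1bdc02f118
    desiredCapacity: 1
    preBootstrapCommands:
      # Align docker's cgroup driver with the systemd driver set in
      # the eksctl-provided /etc/eksctl/kubelet.yaml.
      - "jq '. + {\"exec-opts\": [\"native.cgroupdriver=systemd\"]}' /etc/docker/daemon.json > /tmp/daemon.json && mv /tmp/daemon.json /etc/docker/daemon.json"
      - "systemctl restart docker"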

Versions

eksctl version 0.35.0 and above.
