kubelet.yaml sets cgroupDriver: systemd instead of cgroupDriver: cgroupfs on AL2 GPU instances #3005

**What happened?**
I launched an unmanaged node group with p3.2xlarge GPU instances (ami-0f23f1b20f58cc97f); however, kubelet failed to start:
```
systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-eksclt.al2.conf
   Active: activating (auto-restart) (Result: exit-code) since Wed 2020-12-30 14:16:36 UTC; 4s ago
     Docs: https://github.com/kubernetes/kubernetes
  Process: 22376 ExecStart=/usr/bin/kubelet --node-ip=${NODE_IP} --node-labels=${NODE_LABELS},alpha.eksctl.io/instance-id=${INSTANCE_ID} --max-pods=${MAX_PODS} --register-node=true --register-with-taints=${NODE_TAINTS} --cloud-provider=aws --container-runtime=docker --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --pod-infra-container-image=${AWS_EKS_ECR_ACCOUNT}.dkr.ecr.${AWS_DEFAULT_REGION}.${AWS_SERVICES_DOMAIN}/eks/pause:3.3-eksbuild.1 --kubeconfig=/etc/eksctl/kubeconfig.yaml --config=/etc/eksctl/kubelet.yaml (code=exited, status=255)
  Process: 22365 ExecStartPre=/sbin/iptables -P FORWARD ACCEPT -w 5 (code=exited, status=0/SUCCESS)
 Main PID: 22376 (code=exited, status=255)
```
The error message:
`failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"`

`cat /etc/eksctl/kubelet.yaml` shows `cgroupDriver: systemd`; I suspect it should be `cgroupDriver: cgroupfs`.

The Docker cgroup driver on the Amazon Linux 2 GPU AMI is set to `cgroupfs` (vs. `systemd` on the non-GPU AMI).
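To confirm the mismatch on a node, the two values can be compared side by side. This is a minimal sketch, assuming a bash shell on the node, Docker as the container runtime (as the `--container-runtime=docker` flag above indicates), and the `/etc/eksctl/kubelet.yaml` path from this issue. The function names are hypothetical helpers, not part of eksctl.

```shell
#!/usr/bin/env bash
# Sketch: detect the kubelet/Docker cgroup driver mismatch described in this issue.
# Assumes the eksctl-generated config path below; override KUBELET_CONFIG if needed.
# Function names are hypothetical helpers, not eksctl commands.

KUBELET_CONFIG="${KUBELET_CONFIG:-/etc/eksctl/kubelet.yaml}"

# Print the top-level cgroupDriver value from a KubeletConfiguration YAML file.
# Takes an optional path argument; defaults to $KUBELET_CONFIG.
kubelet_cgroup_driver() {
  awk -F': *' '$1 == "cgroupDriver" { print $2; exit }' "${1:-$KUBELET_CONFIG}"
}

# Print the cgroup driver reported by the Docker daemon (requires a running dockerd).
docker_cgroup_driver() {
  docker info --format '{{.CgroupDriver}}'
}

# Compare two driver names: return 0 if they match, 1 (with a message on stderr) if not.
drivers_match() {
  local kubelet="$1" docker="$2"
  if [ "$kubelet" = "$docker" ]; then
    echo "OK: kubelet and docker both use $kubelet"
    return 0
  fi
  echo "MISMATCH: kubelet=$kubelet docker=$docker" >&2
  return 1
}
```

On an affected GPU node, `drivers_match "$(kubelet_cgroup_driver)" "$(docker_cgroup_driver)"` should, given the error above, report `kubelet=systemd docker=cgroupfs`.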



**How to reproduce it?**
Launch a GPU node group with eksctl v0.35.0.

**Versions**
```
$ eksctl version
0.35.0

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

```

**Additional info**
I also tried an older GPU AMI (`ami-0969f51a73874a795`), and leaving the AMI unset, with the same result.
When I manually edited `/etc/systemd/system/kubelet.service.d/10-eksclt.al2.conf` to add `--cgroup-driver=cgroupfs` and restarted the service, the node registered successfully with my cluster.
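An alternative to adding the flag in the drop-in is to change the value in the generated kubelet config itself, since `cgroupDriver` is the `KubeletConfiguration` field behind `--cgroup-driver`. This is a sketch of that variant, not the workaround I tested and not an official fix; the helper name is hypothetical, and it assumes GNU `sed` (as on Amazon Linux 2) and a top-level `cgroupDriver:` key in `/etc/eksctl/kubelet.yaml`.

```shell
#!/usr/bin/env bash
# Workaround sketch (not an official fix): rewrite cgroupDriver in the
# eksctl-generated kubelet config so it matches Docker's driver on the GPU AMI.
# The helper name is hypothetical; the default path is the one from this issue.

set_kubelet_cgroup_driver() {
  local driver="$1"
  local config="${2:-/etc/eksctl/kubelet.yaml}"

  # kubelet accepts only these two values for cgroupDriver.
  case "$driver" in
    systemd|cgroupfs) ;;
    *) echo "unsupported cgroup driver: $driver" >&2; return 2 ;;
  esac

  # Fail loudly instead of silently doing nothing if the key is absent.
  if ! grep -q '^cgroupDriver:' "$config"; then
    echo "no top-level cgroupDriver key in $config" >&2
    return 1
  fi

  # Keep a backup next to the original, then replace the value in place (GNU sed -i).
  cp "$config" "$config.bak"
  sed -i "s/^cgroupDriver:.*/cgroupDriver: ${driver}/" "$config"
}
```

Run it as root (e.g. `set_kubelet_cgroup_driver cgroupfs`), then `systemctl restart kubelet`. If you edit the systemd drop-in instead, as described above, run `systemctl daemon-reload` before restarting. Either way this only patches the running node; nodes launched afterwards will presumably get the same generated config until eksctl itself is fixed.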