containerd should be using systemd cgroup hierarchy #471

Closed
randomvariable opened this issue Jan 13, 2021 · 35 comments · Fixed by #540
Comments

@randomvariable
Member

What steps did you take and what happened:
[A clear and concise description on how to REPRODUCE the bug.]

As per https://kubernetes.io/docs/setup/production-environment/container-runtimes/, there should only be a single cgroup hierarchy running on the OS. We do not configure containerd to use the systemd hierarchy, so Kubernetes can end up with an incorrect view of resource utilisation.

What did you expect to happen:

containerd should be configured to use the systemd cgroup hierarchy.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

containerd's config.toml should have something along the lines of:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true

Environment:

Project (Image Builder for Cluster API, kube-deploy/imagebuilder, konfigadm):

Additional info for Image Builder for Cluster API related issues:

  • OS (e.g. from /etc/os-release, or cmd /c ver):
  • Packer Version:
  • Packer Provider:
  • Ansible Version:
  • Cluster-api version (if using):
  • Kubernetes version: (use kubectl version):

/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]

@k8s-ci-robot added the kind/bug label Jan 13, 2021
@neolit123
Member

neolit123 commented Jan 13, 2021

just appending:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

at the end of the config should work too.
(but this assumes that there is no [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] table in the config already)
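
A minimal sketch of that approach (assuming the default /etc/containerd/config.toml path, a systemd-managed containerd service, and that the runc.options table is not already defined in the file):

cat <<EOF >>/etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
EOF
# restart so containerd re-reads its configuration
systemctl restart containerd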

@neolit123
Member

neolit123 commented Jan 13, 2021

the tricky part is that once this is done the following KubeletConfiguration must be passed to kubeadm:

...
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: "systemd"

@fabriziopandini @vincepri @detiber for comments on CAPI.
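
For reference, a rough sketch of how that document can be passed on init (the ClusterConfiguration fields here are placeholders, not the config CAPI actually generates):

cat <<EOF >kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.21.0
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF
kubeadm init --config kubeadm-config.yaml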

@timothysc
Member

@codenrhoden - We need to fix this.

@codenrhoden
Contributor

the tricky part is that once this is done the following KubeletConfiguration must be passed to kubeadm:

So this sounds like we have to coordinate the fix with kubeadm at the same time? Or if we have the Cgroup setup through systemd for future images, and kubeadm does not pass the needed KubeletConfiguration, will everything still work as before?

@neolit123
Member

neolit123 commented Jan 13, 2021

So this sounds like we have to coordinate the fix with kubeadm at the same time? Or if we have the Cgroup setup through systemd for future images, and kubeadm does not pass the needed KubeletConfiguration, will everything still work as before?

if we add the systemd driver on the image side and the kubelet is not configured with systemd, it will break. the default value in the kubelet is cgroupfs, which is sane, as the kubelet may not be run with systemd (as a service manager).

there are a couple of options:

  1. default the value to systemd in the KubeletConfiguration that kubeadm creates, which is a breaking change for the wider public.
  2. apply the KubeletConfiguration with systemd on a higher layer - e.g. CAPI.

after some thought, i'm leaning towards 2 but would appreciate more opinions here.

@randomvariable
Member Author

randomvariable commented Jan 13, 2021

If we configure it with Cluster API:
More likely we need to co-ordinate with Cluster API provider template authors in order to be able to pass down the cgroupDriver setting. And we don't want to default it in the Cluster API Bootstrap Provider Kubeadm unless it's OS-aware, because of Windows?

Is there a last option where we set the deprecated cgroup driver flag in /etc/sysconfig/kubelet so that CAPI doesn't need to do it?

@neolit123
Member

More likely we need to co-ordinate with Cluster API provider template authors in order to be able to passdown the cgroupDriver setting.

it's quite unfortunate, but all users will have to apply this setting unless kubeadm does it by default (which would make sense, given it runs the kubelet with systemd) - and defaulting it creates a period of breaking users that were using cgroupfs on the CR side.

And we don't want to default it in Cluster API Bootstrap Provider Kubeadm unless it's OS aware because of Windows?

cgroup drivers should be completely ignored on Windows as there is no such thing there, but we found a bug where the kubelet errors on Windows if the driver reported by docker info differs:
kubernetes/kubernetes#97764

Is there a last option in that we set the deprecated cgroup driver flag in /etc/sysconfig/kubelet such that CAPI doesn't need to do it ?

the kubelet's --cgroup-driver flag is being removed soon, so this is not going to work well long term. the config file doesn't support similar overrides.

@neolit123
Member

opening the wider discussion:
kubernetes/kubeadm#2376

@oldthreefeng

oldthreefeng commented Jan 15, 2021

@neolit123 for containerd

$ containerd --version
containerd github.com/containerd/containerd v1.3.9 ea765aba0d05254012b0b9e595e995c09186427f

I found that the default systemd_cgroup setting is not under [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]:

$ containerd config  default  
...
  [plugins."io.containerd.grpc.v1.cri"]
    systemd_cgroup = false
    disable_cgroup = false
...
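
Note: that top-level systemd_cgroup key only applies to the legacy io.containerd.runtime.v1.linux runtime (containerd warns "`systemd_cgroup` only works for runtime io.containerd.runtime.v1.linux" otherwise, as seen later in this thread). For the runc v2 runtime the per-runtime option is the one quoted above, roughly:

# applies only to the legacy v1 linux runtime:
[plugins."io.containerd.grpc.v1.cri"]
  systemd_cgroup = true

# applies to the runc v2 runtime:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true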

@randomvariable
Member Author

the docs may be out of date.

In any case, after some discussion, we've settled on the following:

  • Let's make the containerd change
  • Set the cgroup driver flag for the kubelet in /etc/sysconfig/kubelet. Put a TODO that we need to stop setting that flag in whatever version of Kubernetes removes it (see the sketch after this list).
  • It is up to CAPI to sort out setting the kubelet component config for future Kubernetes versions.
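
A sketch of that drop-in (assuming the standard kubeadm kubelet unit, which sources KUBELET_EXTRA_ARGS from /etc/sysconfig/kubelet on rpm-based images and /etc/default/kubelet on deb-based ones):

# /etc/sysconfig/kubelet
# TODO: stop setting this once the kubelet removes the --cgroup-driver flag
KUBELET_EXTRA_ARGS=--cgroup-driver=systemd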

@neolit123
Member

neolit123 commented Jan 15, 2021 via email

@oldthreefeng

@neolit123 I'm sorry to have confused you.
You are definitely right. I was confused by this section:

  [plugins."io.containerd.grpc.v1.cri"]
    systemd_cgroup = false

forget it

@fabriziopandini
Member

I'm trying to better understand what the expected UX in CAPI will be after this change lands in the image builder.

AFAIK currently it is not possible to change the KubeletConfiguration for the users (see kubernetes-sigs/cluster-api#1584), so I assume we should still rely on ExtraArgs, which is not ideal.

A few options:

  • The user should set the --cgroup-driver extra arg before adding a new machine or before triggering rollouts
  • CAPBK should set the --cgroup-driver extra arg if missing

Also, after the change is implemented in the image builder is there any expectation on how the change would impact the images published by the providers e.g.

  • It will impact only new images (how do we define new images?)
  • All the existing images will be regenerated
  • ???

@neolit123
Member

neolit123 commented Jan 19, 2021 via email

@randomvariable
Member Author

randomvariable commented Jan 19, 2021

@fabriziopandini , for now, we should use /etc/sysconfig given that's functionally equivalent to extraArgs, then pivot to CAPI configuring it in v1alpha4 when we support Kubelet component config via whatever mechanism we have. Will be tracking in the node agent bits.

@fabriziopandini
Member

This issue was discussed during CAPI office hours on the 20th of January, and the outcomes of the discussion are captured in kubernetes/kubeadm#2376.

TL;DR: in order to coordinate the change among all the involved parties and to provide a clean upgrade path for users, we should ensure that the image builder configures containerd to use the systemd cgroup driver by default for images with Kubernetes version >= v1.21.

Please let me know if there are problems with the above requirements; I will be happy to help, but I have very little knowledge of the image builder...

@codenrhoden
Contributor

Thanks for the follow-up @fabriziopandini, and stating clearly what we need to do.

There are no problems with the requirement, it's definitely something we can do.
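
One way the version gate could look on the Ansible side (a sketch only; the task, template path, and the kubernetes_semver variable are illustrative, not the actual image-builder code):

- name: Use the systemd cgroup driver in containerd for Kubernetes >= v1.21
  template:
    src: etc/containerd/config.toml
    dest: /etc/containerd/config.toml
    mode: "0644"
  when: kubernetes_semver | regex_replace('^v', '') is version('1.21.0', '>=')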

@codenrhoden
Contributor

@fabriziopandini @randomvariable I could use a little help in understanding why my TOML changes don't appear to work...

I have the ansible worked out to only apply the changes for K8s >= v1.21.0. My resulting /etc/containerd/config.toml file looks like this:

version = 2

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "k8s.gcr.io/pause:3.2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
      SystemdCgroup = true

But when I check to see if containerd has picked up the change, I don't see anything added to the "runc.options" section...

# containerd config dump | grep -A 3 -B 5 runc\.options
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = ""
          runtime_engine = ""
          runtime_root = ""
          privileged_without_host_devices = false
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"

I know it's reading config.toml because I see the pause image change get picked up. But the new stuff I'm adding here doesn't show up. Any ideas? I am using a slightly older version of containerd, v1.3.4, but I assume it would read in any valid config option.

@codenrhoden self-assigned this Feb 9, 2021
@fabriziopandini
Member

@neolit123 ^^

@neolit123
Member

neolit123 commented Feb 10, 2021

@codenrhoden

as commented above, placing it at the bottom works for me:
#471 (comment)

but i got it to work by inlining [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] before too.
could be a TOML parsing issue.

EDIT: TOML is ok, but i would have preferred it if they supported more formats like YAML...

@codenrhoden
Contributor

Thanks @neolit123

I don't know why, but I'm still struggling with this. Can I confirm that I should see the SystemdCgroup entry listed when I do a containerd config dump? I switched to using containerd v1.4.3, and I see the runc.options in the default config dump. But no matter what I put in my TOML file, I don't see anything listed under that section. I'm going off the assumption that I should see something, but I only ever see:

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"

@neolit123
Member

unless i'm mistaken, i think containerd config dump just dumps the default config.

if you have applied the change and restarted containerd, then something like crictl info on the socket should give:

...
"systemdCgroup": true,
...

@codenrhoden
Contributor

Thanks, I'll check that out.

My understanding (and observation) is that containerd config default shows the default config, and containerd config dump shows the working config after loading all files.
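
In other words (assuming the default config location), roughly:

# compiled-in defaults, ignoring /etc/containerd/config.toml
containerd config default
# effective configuration after loading the config file(s)
containerd config dump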

@neolit123
Member

My understanding (and observation) is that containerd config default shows the default config, and containerd config dump shows the working config after loading all files.

could be, i haven't tried it.

@voor
Member

voor commented Feb 19, 2021

Feb 19 22:46:41 ip-10-0-1-237 containerd[22994]: time="2021-02-19T22:46:41.012169979Z" level=warning msg="failed to load plugin io.containerd.grpc.v1.cri" error="invalid plugin config: `systemd_cgroup` only works for runtime io.containerd.runtime.v1.linux"

@neolit123
Member

i've sent the PR for moving to systemd in kubeadm 1.21:
kubernetes/kubernetes#99471

@neolit123
Member

@voor
is that a persisting error, or perhaps something is flaking?
using the "systemd" driver with containerd should be fine and we haven't seen user reports about it.

@voor
Member

voor commented Feb 25, 2021

None of the code samples provided in this thread have actually resulted in containerd using systemd cgroups, but the kubelet change definitely works, resulting in kubeadm not working because the two are not aligned.

@neolit123
Member

yeah, the CR driver must == the kubelet driver.

this + this works for me.

@fabriziopandini
Member

@codenrhoden is this effort still on track for v1.21?

@codenrhoden
Contributor

I tried this again, using containerd 1.4.4, and I still get the same behavior. No matter what I put in the config file, it doesn't register. I think I'd like to pair up with @neolit123 when he has time, to show what I'm doing and figure it out. I know it's parsing the config file and applying other changes from it. It's very strange.

@neolit123
Member

@codenrhoden did you try crictl info?

@codenrhoden
Contributor

@neolit123 I have, yes. no difference there.

@neolit123
Member

containerd/containerd#4900 (comment)

looks like neither config dump nor crictl info works for verifying this.
it can be verified with systemd-cgls once there are some containers running.

i'm not that familiar with containerd myself, but this looks like something that they must fix.

all we care about is adding:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  systemdCgroup = true

at the end of the config.
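
For verification, a sketch of what to look for once the node has some containers running (slice and scope names will vary):

# with the systemd driver, container cgroups show up as systemd slices/scopes
systemd-cgls --no-pager | grep -i kubepods
# expect entries under kubepods.slice (e.g. kubepods-besteffort-pod<uid>.slice)
# rather than a plain cgroupfs-style /kubepods/... path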

@codenrhoden
Contributor

Wow... Okay, that explains a lot. I'm just running around in circles. :) I can verify this with the methods in that comment. Thanks!
