libctr/cgroups: don't take init's cgroup into account #3784

haircommander · 2023-03-24T19:14:26Z

Sometimes, the init process is not in the root cgroup. This can be noted by GetInitPath, which already scrubs the path of init.scope.

This was encountered when trying to patch the Kubelet to handle systemd being in a separate cpuset from root (to allow load balance disabling for containers). At present, there's no way to have libcontainer or runc manage cgroups in a hierarchy outside of the one init is in (unless the path contains init.scope, which is limiting)

Sometimes, the init process is not in the root cgroup. This can be noted by GetInitPath, which already scrubs the path of `init.scope`. This was encountered when trying to patch the Kubelet to handle systemd being in a separate cpuset from root (to allow load balance disabling for containers). At present, there's no way to have libcontainer or runc manage cgroups in a hierarchy outside of the one init is in (unless the path contains `init.scope`, which is limiting) Signed-off-by: Peter Hunt <pehunt@redhat.com>

kolyshkin · 2023-03-27T18:59:17Z

I traced this feature back to commit 664fc54e65ebc14ca (see moby/moby@664fc54). @mrunalp says it was added for some cases like docker-in-docker.

The biggest issue though is I can't envision what exactly will be broken (if anything) if this code gets removed. Perhaps nothing will. For one thing, I can't find anything like that in crun.

kolyshkin · 2023-03-31T03:28:32Z

@opencontainers/runc-maintainers what do you think about this?

In short, this allows runc to be used in scenarios where init (systemd, pid 1) process is not in the top-level cgroup. The requirement for this comes from the CPU load balancing API change in recent kernels, which requires, in some scenarios, for systemd to be moved out from the top-level cgroup. From my perspective, there's absolutely no way to work around this.

Now, this change will not break any common setups. It might break some old docker in docker setups on some very old kernels. The problem which the code being removed is trying to solve is now solved by having cgroup namespace in the kernel. Cgroup namespace is available since kernel 4.6, released in May 2016.

@thaJeztah is there a way to test runc binary from this PR (available from Checks -> validate -> Artefacts (scroll to the very bottom of the page, i.e. https://github.com/opencontainers/runc/actions/runs/4534733741) in some kind of DinD scenario?

On a related note, I remember fixing an issue with KinD (kubernetes-sigs/kind#2709) which would not exist if we'd always enable cgroupns. So maybe, together with this, we should be bold and brave and go ahead to enable cgroupns by default in cgroup v1 case, if available (currently we only do this with cgroup v2).

mrunalp · 2023-03-31T03:53:37Z

I traced this feature back to commit 664fc54e65ebc14ca (see moby/moby@664fc54). @mrunalp says it was added for some cases like docker-in-docker.

That looks older than when I thought it was first added. So, I am not sure that it was related to docker in docker but perhaps non-standard v1 cgroups layouts back then across distros which shouldn't be a case now.

I am 👍 to merging this.

haircommander · 2023-03-31T14:16:49Z

another question, in addition to maintainer's comfort with it, is is this PR worthy of backporting to release-1.1?

mrunalp

LGTM

haircommander · 2023-03-31T19:46:51Z

@AkihiroSuda @cyphar @thaJeztah PTAL

kolyshkin

LGTM

haircommander · 2023-04-03T15:27:40Z

@AkihiroSuda @cyphar @thaJeztah PTAL

thaJeztah · 2023-04-03T16:21:15Z

@thaJeztah is there a way to test runc binary from this PR (available from Checks -> validate -> Artefacts (scroll to the very bottom of the page, i.e. https://github.com/opencontainers/runc/actions/runs/4534733741) in some kind of DinD scenario?

Wah, sorry for the delay. Been kinda busy.

I guess one way would be to pull a dind image from https://hub.docker.com/_/docker, and either copy the runc binary into the image, or to bind-mount it to swap the one in the image. 🤔

kolyshkin · 2023-04-03T19:59:52Z

I guess one way would be to pull a dind image from https://hub.docker.com/_/docker, and either copy the runc binary into the image, or to bind-mount it to swap the one in the image.

I've tested this on CentOS 7 (kernel 3.10, no cgroupns) using docker 1.31.1 (RH fork) and the runc binary from this PR inside the docker:dind image. Works fine, the containers created inside dind are added to the proper cgroups, the limits are set (tested with docker run --memory). So, it works inside dind on an older without cgroupns.

I haven't tried the new runc on the host because docker 1.13.1 uses ancient runc with --console option removed by #1018. So I have installed one from docker.com, and tried to use it with runc from this PR on the host AND inside the docker:dind container. Works fine.

Changed the docker cgroup driver to systemd, works fine.

To sum it up -- I was not able to find a scenario in which the kludge which is removed by this PR is needed.

So, unless someone from @opencontainers/runc-maintainers says "no", let's merge this tomorrow.

kolyshkin · 2023-04-04T16:50:26Z

One more post-merge consideration.

This is a part of systemd v1 driver, and so to test it out we would want to use dind with systemd. Which is not possible, because the official docker:dind image is running inside a bare-bone alpine linux container, which does not have systemd running. The error message is:

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: systemd not running on this host, cannot use systemd cgroups manager: unknown.

I am not sure how to test dind with systemd.

kolyshkin · 2023-04-04T18:14:33Z

Some more archaeology and post-merge reflection...

Commit 2b28b3c (PR #194) removed getting init cgroup because in case of host pidns this points to a different hierarchy. This change was done for the fs driver only, not for the systemd driver (which already existed at that time).

Taking into account that

DIND was/is never used with systemd driver (I guess)
the code removed by this PR initially came as a copy-paste from the fs driver
the copy-paste happened before the above mentioned commit 2b28b3c

Thus, the code is not needed.

The other question is, do we need to use something like GetOwnCgroupPath for dind+systemd case? Maybe, but the thing is, there's no such thing as dind+systemd.

Fingers crossed that cgroup v1 will be obsoleted soon.

haircommander force-pushed the root-cgroup-no-init branch 2 times, most recently from bfbc355 to c7d399a Compare March 27, 2023 17:15

haircommander force-pushed the root-cgroup-no-init branch from c7d399a to 54e2021 Compare March 27, 2023 17:16

mrunalp approved these changes Mar 31, 2023

View reviewed changes

haircommander mentioned this pull request Mar 31, 2023

[release-4.13] OCPNODE-1584: libctr/cgroups: don't take init's cgroup into account openshift/opencontainers-runc#11

Merged

kolyshkin approved these changes Mar 31, 2023

View reviewed changes

kolyshkin closed this Mar 31, 2023

kolyshkin reopened this Mar 31, 2023

kolyshkin merged commit cc60a39 into opencontainers:main Apr 4, 2023

This was referenced Apr 4, 2023

libct/cg: rm GetInitCgroup[Path] #3810

Merged

[1.1] libctr/cgroups: don't take init's cgroup into account #3811

Merged

kolyshkin added the backport/1.1-done A PR in main branch which has been backported to release-1.1 label Apr 6, 2023

BenTheElder mentioned this pull request Apr 19, 2023

WSL2 ERROR: kind create cluster fails on Starting control-plane kubernetes-sigs/kind#3174

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

libctr/cgroups: don't take init's cgroup into account #3784

libctr/cgroups: don't take init's cgroup into account #3784

haircommander commented Mar 24, 2023

kolyshkin commented Mar 27, 2023

kolyshkin commented Mar 31, 2023

mrunalp commented Mar 31, 2023

haircommander commented Mar 31, 2023

mrunalp left a comment

haircommander commented Mar 31, 2023

kolyshkin left a comment

haircommander commented Apr 3, 2023

thaJeztah commented Apr 3, 2023

kolyshkin commented Apr 3, 2023

kolyshkin commented Apr 4, 2023

kolyshkin commented Apr 4, 2023

libctr/cgroups: don't take init's cgroup into account #3784

libctr/cgroups: don't take init's cgroup into account #3784

Conversation

haircommander commented Mar 24, 2023

kolyshkin commented Mar 27, 2023

kolyshkin commented Mar 31, 2023

mrunalp commented Mar 31, 2023

haircommander commented Mar 31, 2023

mrunalp left a comment

Choose a reason for hiding this comment

haircommander commented Mar 31, 2023

kolyshkin left a comment

Choose a reason for hiding this comment

haircommander commented Apr 3, 2023

thaJeztah commented Apr 3, 2023

kolyshkin commented Apr 3, 2023

kolyshkin commented Apr 4, 2023

kolyshkin commented Apr 4, 2023