-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libctr/cgroups: don't take init's cgroup into account #3784
libctr/cgroups: don't take init's cgroup into account #3784
Conversation
bfbc355
to
c7d399a
Compare
Sometimes, the init process is not in the root cgroup. This can be noted by GetInitPath, which already scrubs the path of `init.scope`. This was encountered when trying to patch the Kubelet to handle systemd being in a separate cpuset from root (to allow load balance disabling for containers). At present, there's no way to have libcontainer or runc manage cgroups in a hierarchy outside of the one init is in (unless the path contains `init.scope`, which is limiting) Signed-off-by: Peter Hunt <pehunt@redhat.com>
c7d399a
to
54e2021
Compare
I traced this feature back to commit 664fc54e65ebc14ca (see moby/moby@664fc54). @mrunalp says it was added for some cases like docker-in-docker. The biggest issue though is I can't envision what exactly will be broken (if anything) if this code gets removed. Perhaps nothing will. For one thing, I can't find anything like that in crun. |
@opencontainers/runc-maintainers what do you think about this? In short, this allows runc to be used in scenarios where init (systemd, pid 1) process is not in the top-level cgroup. The requirement for this comes from the CPU load balancing API change in recent kernels, which requires, in some scenarios, for systemd to be moved out from the top-level cgroup. From my perspective, there's absolutely no way to work around this. Now, this change will not break any common setups. It might break some old docker in docker setups on some very old kernels. The problem which the code being removed is trying to solve is now solved by having cgroup namespace in the kernel. Cgroup namespace is available since kernel 4.6, released in May 2016. @thaJeztah is there a way to test runc binary from this PR (available from Checks -> validate -> Artefacts (scroll to the very bottom of the page, i.e. https://github.com/opencontainers/runc/actions/runs/4534733741) in some kind of DinD scenario? On a related note, I remember fixing an issue with KinD (kubernetes-sigs/kind#2709) which would not exist if we'd always enable cgroupns. So maybe, together with this, we should be bold and brave and go ahead to enable cgroupns by default in cgroup v1 case, if available (currently we only do this with cgroup v2). |
That looks older than when I thought it was first added. So, I am not sure that it was related to docker in docker but perhaps non-standard v1 cgroups layouts back then across distros which shouldn't be a case now. I am 👍 to merging this. |
another question, in addition to maintainer's comfort with it, is is this PR worthy of backporting to release-1.1? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Wah, sorry for the delay. Been kinda busy. I guess one way would be to pull a |
I've tested this on CentOS 7 (kernel 3.10, no cgroupns) using docker 1.31.1 (RH fork) and the runc binary from this PR inside the docker:dind image. Works fine, the containers created inside dind are added to the proper cgroups, the limits are set (tested with I haven't tried the new runc on the host because docker 1.13.1 uses ancient runc with Changed the docker cgroup driver to systemd, works fine. To sum it up -- I was not able to find a scenario in which the kludge which is removed by this PR is needed. So, unless someone from @opencontainers/runc-maintainers says "no", let's merge this tomorrow. |
One more post-merge consideration. This is a part of systemd v1 driver, and so to test it out we would want to use dind with systemd. Which is not possible, because the official docker:dind image is running inside a bare-bone alpine linux container, which does not have systemd running. The error message is:
I am not sure how to test dind with systemd. |
Some more archaeology and post-merge reflection... Commit 2b28b3c (PR #194) removed getting init cgroup because in case of host pidns this points to a different hierarchy. This change was done for the fs driver only, not for the systemd driver (which already existed at that time). Taking into account that
Thus, the code is not needed. The other question is, do we need to use something like Fingers crossed that cgroup v1 will be obsoleted soon. |
Sometimes, the init process is not in the root cgroup. This can be noted by GetInitPath, which already scrubs the path of
init.scope
.This was encountered when trying to patch the Kubelet to handle systemd being in a separate cpuset from root (to allow load balance disabling for containers). At present, there's no way to have libcontainer or runc manage cgroups in a hierarchy outside of the one init is in (unless the path contains
init.scope
, which is limiting)