cgroups: Set: fix freeze, avoid unnecessary freeze from systemd v1 #3082
As suggested in #3065 (comment), this can be further improved to avoid the freeze entirely if we're sure systemd won't set the deny-all device rule. Looking at Problem is, because of
Together with #3081 (review), I think this PR is pretty neat and good.
If we want to do that for containers as well, yeah. Although I don't know how often containers are allowed access to all devices? For the k8s use case of managing a control group, a simple flag is OK, but that is up to you and the other runc maintainers.
Yup, it becomes quite a mess... 🙃 The best fix is probably to just start using cgroup v2 instead.
Force-pushed: 0eae769 to 5f5d5c4
Container run via
Problem is, from the libcontainer/cgroup we do not know if it's a container or a pod cgroup. Even if it's a pod, it can still have some device access rules (kubernetes doesn't currently set any, but this may change), so having something like The only thing why I don't like the current solution (as implemented by
Force-pushed: fdc81c3 to b1c14c7
In that case the device rules would be allow-all, wouldn't they?

    if (c->device_allow || policy != CGROUP_DEVICE_POLICY_AUTO)
            r = cg_set_attribute("devices", path, "devices.deny", "a");
    else
            r = cg_set_attribute("devices", path, "devices.allow", "a");

If you have no device_allow list, and the device policy is auto then
EDIT: Ah, your patch only does the skip if we have
@cyphar Yes, at least for kubernetes, which uses libcontainer/cgroup to configure pod cgroups. As a pod cgroup is the parent of a few container cgroups, we unnecessarily freeze all those containers on update, which is definitely not the way to go. The pod tests that we've been adding recently try to emulate those kubernetes usage scenarios.
m.Freeze method changes m.cgroups.Resources.Freezer field, which should not be done while we're temporarily freezing the cgroup in Set. If this field is changed, and r == m.cgroups.Resources (as it often happens), this results in inability to freeze the container using Set(). To fix, add and use a method which does not change r.Freezer field. A test case for the bug will be added separately. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
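The aliasing problem described in this commit message can be modelled in a few lines. This is only a sketch under assumptions: `Resources`, `manager`, `doFreeze`, and `setBuggy` are simplified, hypothetical stand-ins, not the actual libcontainer types or method names.

```go
package main

// FreezerState, Resources and manager are simplified, hypothetical
// stand-ins for the types discussed above; they are not the real API.
type FreezerState string

const (
	Undefined FreezerState = ""
	Frozen    FreezerState = "FROZEN"
	Thawed    FreezerState = "THAWED"
)

type Resources struct {
	Freezer FreezerState // the state requested by the caller
}

type manager struct {
	resources *Resources
	kernel    FreezerState // stands in for the real cgroup freezer state
}

// Freeze is the public method: it records the requested state and applies it.
func (m *manager) Freeze(s FreezerState) {
	m.resources.Freezer = s
	m.doFreeze(s)
}

// doFreeze applies a state without touching m.resources.Freezer.
func (m *manager) doFreeze(s FreezerState) {
	m.kernel = s
}

// setBuggy freezes temporarily via the public Freeze. When r aliases
// m.resources, the temporary thaw overwrites the caller's request.
func (m *manager) setBuggy(r *Resources) {
	m.Freeze(Frozen)
	// ... unit properties would be applied here ...
	m.Freeze(Thawed)
	if r.Freezer != Undefined {
		m.doFreeze(r.Freezer) // r.Freezer is Thawed by now if r aliases m.resources
	}
}

// Set uses doFreeze for the temporary freeze, so r.Freezer stays intact.
func (m *manager) Set(r *Resources) {
	m.doFreeze(Frozen)
	// ... unit properties would be applied here ...
	m.doFreeze(Thawed)
	if r.Freezer != Undefined {
		m.doFreeze(r.Freezer)
	}
}
```

With `r := &Resources{Freezer: Frozen}` and `m := &manager{resources: r}`, calling `m.setBuggy(r)` leaves the model thawed, while `m.Set(r)` leaves it frozen as requested.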
The t.Name() usage in libcontainer/integration prevented subtests from being used, since in that case it returns a string containing "/", and thus it can't be used to name a container. Fix this by replacing slashes with underscores where appropriate. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
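A minimal sketch of the slash replacement, assuming a helper named `containerName` (a hypothetical name; the commit may apply the replacement inline instead):

```go
package main

import "strings"

// containerName is a hypothetical helper that turns a test name such as
// the one returned by t.Name() into a string usable as a container name.
// Subtest names contain "/", so each slash is replaced with "_".
func containerName(testName string) string {
	return strings.ReplaceAll(testName, "/", "_")
}
```

For a subtest whose name is `TestFreeze/viaSet`, `containerName` yields `TestFreeze_viaSet`.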
In addition to freezing and thawing a container via Pause/Resume, there is a way to also do so via Set. This way was broken though and is being fixed by a few preceding commits. The test is added to make sure this is fixed and won't regress. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Introduce freezeBeforeSet, which contains the logic of figuring out whether we need to freeze/thaw around setting systemd unit properties. In particular, if SkipDevices is set, and the current unit properties allow all devices, there is no need to freeze and thaw, as systemd won't write any device rules in this case. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
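The device-related part of this decision can be sketched as a pure function. Everything here is an assumption for illustration: `unitProps`, `propGetter`, `errUnitLookup`, and `needsFreeze` are hypothetical names, the D-Bus lookup is replaced by a plain callback, and the check of the current freezer state is left out.

```go
package main

import "errors"

// unitProps is a hypothetical stand-in for the two systemd unit
// properties that matter here.
type unitProps struct {
	DevicePolicy string
	DeviceAllow  []string
}

// propGetter is a hypothetical lookup that replaces the D-Bus query.
type propGetter func(unit string) (unitProps, error)

// errUnitLookup is a sample error a propGetter may return.
var errUnitLookup = errors.New("unit property lookup failed")

// needsFreeze reports whether the cgroup should be frozen around a
// property update. The freeze is skipped only if SkipDevices is set and
// the unit currently allows all devices (policy "auto", empty allow list).
func needsFreeze(skipDevices bool, unit string, get propGetter) bool {
	if skipDevices {
		// A lookup error is not reported to the caller; it simply
		// means we fall back to the safe choice and freeze.
		p, e := get(unit)
		if e == nil && p.DevicePolicy == "auto" && len(p.DeviceAllow) == 0 {
			return false
		}
	}
	return true
}
```

The conservative default is to freeze: any doubt about the unit's current device settings, including a failed lookup, keeps the freeze in place.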
This was initially added by commit 3e5c199 because Set (with r.Freezer = Frozen) was not able to freeze a container. Now (see a few previous commits) Set can do the freeze, so the explicit Freeze is no longer needed. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
TestPodSkipDevicesUpdate checks that updating a pod having SkipDevices: true does not result in spurious "permission denied" errors in a container running under the pod. The test is somewhat similar in nature to the @test "update devices [minimal transition rules]" in tests/integration, but uses a pod. This tests the validity of freezeBeforeSet in v1. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Addressed @cyphar's review comments; here are the changes I've made:
diff --git a/libcontainer/cgroups/systemd/v1.go b/libcontainer/cgroups/systemd/v1.go
index 2a567bb8..1a8e1e3c 100644
--- a/libcontainer/cgroups/systemd/v1.go
+++ b/libcontainer/cgroups/systemd/v1.go
@@ -341,7 +341,7 @@ func (m *legacyManager) GetStats() (*cgroups.Stats, error) {
// (unlike our fs driver, they will happily write deny-all rules to running
// containers). So we have to freeze the container to avoid the container get
// an occasional "permission denied" error.
-func (m *legacyManager) freezeBeforeSet(unitName string, r *configs.Resources) (needsFreeze, needsThaw bool, Err error) {
+func (m *legacyManager) freezeBeforeSet(unitName string, r *configs.Resources) (needsFreeze, needsThaw bool, err error) {
// Special case for SkipDevices, as used by Kubernetes to create pod
// cgroups with allow-all device policy).
if r.SkipDevices {
@@ -352,10 +352,13 @@ func (m *legacyManager) freezeBeforeSet(unitName string, r *configs.Resources) (
// Interestingly, (1) and (2) are the same here because
// a non-existent unit returns default properties,
// and settings in (2) are the defaults.
- devPolicy, err := getUnitProperty(m.dbus, unitName, "DevicePolicy")
- if err == nil && devPolicy.Value == dbus.MakeVariant("auto") {
- devAllow, err := getUnitProperty(m.dbus, unitName, "DeviceAllow")
- if err == nil && devAllow.Value == dbus.MakeVariant([]deviceAllowEntry{}) {
+ //
+ // Do not return errors from getUnitProperty, as they alone
+ // should not prevent Set from working.
+ devPolicy, e := getUnitProperty(m.dbus, unitName, "DevicePolicy")
+ if e == nil && devPolicy.Value == dbus.MakeVariant("auto") {
+ devAllow, e := getUnitProperty(m.dbus, unitName, "DeviceAllow")
+ if e == nil && devAllow.Value == dbus.MakeVariant([]deviceAllowEntry{}) {
needsFreeze = false
needsThaw = false
return
@@ -367,8 +370,8 @@ func (m *legacyManager) freezeBeforeSet(unitName string, r *configs.Resources) (
needsThaw = true
// Check the current freezer state.
- freezerState, Err := m.GetFreezerState()
- if Err != nil {
+ freezerState, err := m.GetFreezerState()
+ if err != nil {
return
}
if freezerState == configs.Frozen { |
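The variable naming in this diff follows from how Go treats named results. A minimal sketch, assuming hypothetical helpers `probe` and `readState` in place of the real property and freezer-state lookups:

```go
package main

import "errors"

var (
	errProbe = errors.New("cannot read unit property")
	errState = errors.New("cannot read freezer state")
)

// probe and readState are hypothetical stand-ins for the real lookups.
func probe(fail bool) (string, error) {
	if fail {
		return "", errProbe
	}
	return "auto", nil
}

func readState(fail bool) (string, error) {
	if fail {
		return "", errState
	}
	return "THAWED", nil
}

func decide(skip, failProbe, failState bool) (needsFreeze bool, err error) {
	needsFreeze = true
	if skip {
		// Writing `v, err := probe(...)` here would declare a new err in
		// this inner scope, and the compiler rejects a bare return while
		// a result parameter is shadowed. Using e avoids that and also
		// keeps the probe error out of the returned err.
		v, e := probe(failProbe)
		if e == nil && v == "auto" {
			needsFreeze = false
			return
		}
	}
	// In the function's top-level scope, := only declares state and
	// assigns to the existing named result err.
	state, err := readState(failState)
	if err != nil {
		return
	}
	_ = state
	return
}
```

A failed `probe` leaves `err` nil and falls through to the freeze path, whereas a failed `readState` is returned to the caller through the named result.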
CI went south :(
cyphar
left a comment
LGTM.
Based on (and currently includes) #3081 and #3067. Keeping this a draft before those two are merged.
Set() method (with r.Freezer set to Frozen). Add a test.
Review commit-by-commit. Please see individual commits for details.
1.0 backport: #3093