Skip to content

Conversation

@kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Jun 3, 2021

Since runc update CLI is not able to modify devices (and it is not clear
how to implement that properly), let's set SkipDevices=true,
so that a cgroup controller won't try to update devices cgroup.

This helps use cases when some other device management (NVIDIA GPUs)
applies its configuration on top of what runc does.

Make sure we do not save SkipDevices into state.json.

This is an alternative to #2988.

Proposed changelog entry

  • runc update no longer updates cgroup devices configuration.

The runc update CLI is not able to modify devices, so let's set SkipDevices
(so that a cgroup controller won't try to update devices cgroup).

This helps use cases when some other device management (NVIDIA GPUs)
applies its configuration on top of what runc does.

Make sure we do not save SkipDevices into state.json.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think dropping SkipDevices from the state is a good idea regardless of the original issue (and I prefer this to the other PR).

@cyphar
Copy link
Member

cyphar commented Jun 4, 2021

Though it should be noted this pretty much invalidates the runc update test I added in #2951 -- now runc update no longer triggers the cgroupv2 devices re-apply. So maybe it'd be better to remove the test since it's a complete no-op now (we can add a Go-based integration test later).

@kolyshkin
Copy link
Contributor Author

Yes, I am going to remove the test and maybe try to add a unit test.

I'd rather do it separately from this PR though. Let's discuss it in there: #3000

@kolyshkin kolyshkin merged commit 2f8e8e9 into opencontainers:master Jun 4, 2021
vasiliy-ul referenced this pull request in kubevirt/kubevirt Jan 3, 2024
CPU manager periodically updates cgroup settings via the container
runtime interface. Runc prior to version v1.0.0 drops all 'custom'
cgroup device rules on 'update' and that causes a race with live
migration when block volumes are hotplugged. Try to setup the test in a
way so that the VMI is migrated to a node without CPU manager.

Signed-off-by: Vasiliy Ulyanov <vulyanov@suse.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants