
Improve libcontainer's systemd cgroup driver #2007

@filbranden

I'm opening this issue to track improvements to the systemd cgroup driver that I and a few others are working on.

The main motivations to improve the systemd cgroup driver are:

  • It offers a path towards cgroupv2 support, since the resource-control settings exported by the systemd API match the cgroupv2 feature set. systemd also supports three hierarchy modes ("legacy", "hybrid" and "unified"), which lets Linux distributions and administrators migrate to v2 gradually.

  • To support the "rootless" effort of running a container manager without root privileges. Projects such as usernetes are making strides in that direction. They rely on systemd to manage cgroups, since writing to the cgroup tree directly would require root privileges.

  • To minimize future breakage from changes to the kernel implementation and to systemd itself. The systemd API that encodes cgroupv2 attributes is expected to remain stable and maintained even as future changes are introduced. New restrictions may also start being imposed to ensure the hierarchy works as expected (non-authoritative agents shouldn't be able to "hijack" resources from the machine); doing delegation correctly is a key step in preventing DoS attacks through the cgroup tree.
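The three hierarchy modes differ in what is mounted at `/sys/fs/cgroup`, which is what makes them easy to tell apart at runtime. The sketch below is illustrative only, not libcontainer's code: it assumes the caller has already obtained the `statfs(2)` `f_type` of `/sys/fs/cgroup` and of `/sys/fs/cgroup/unified` (passing 0 if the latter is absent), and the names `detectHierarchy` and `Hierarchy` are hypothetical. The magic numbers are the `TMPFS_MAGIC` and `CGROUP2_SUPER_MAGIC` values from `linux/magic.h`.

```go
package main

import "fmt"

// Filesystem magic numbers, as defined in linux/magic.h.
const (
	tmpfsMagic   int64 = 0x01021994 // TMPFS_MAGIC
	cgroup2Magic int64 = 0x63677270 // CGROUP2_SUPER_MAGIC
)

// Hierarchy names the three cgroup layouts systemd can run with
// (hypothetical type, for illustration only).
type Hierarchy string

const (
	Legacy  Hierarchy = "legacy"
	Hybrid  Hierarchy = "hybrid"
	Unified Hierarchy = "unified"
)

// detectHierarchy classifies the host layout from the statfs f_type of
// /sys/fs/cgroup (rootType) and of /sys/fs/cgroup/unified (unifiedType,
// 0 if that directory does not exist).
func detectHierarchy(rootType, unifiedType int64) (Hierarchy, error) {
	switch {
	case rootType == cgroup2Magic:
		// cgroup2 is mounted directly at the root: pure v2.
		return Unified, nil
	case rootType == tmpfsMagic && unifiedType == cgroup2Magic:
		// v1 controllers under a tmpfs, plus a v2 tree used by systemd.
		return Hybrid, nil
	case rootType == tmpfsMagic:
		// Only v1 controller mounts under the tmpfs.
		return Legacy, nil
	default:
		return "", fmt.Errorf("unexpected filesystem type %#x at cgroup root", rootType)
	}
}

func main() {
	cases := []struct{ root, unified int64 }{
		{cgroup2Magic, 0},
		{tmpfsMagic, cgroup2Magic},
		{tmpfsMagic, 0},
	}
	for _, c := range cases {
		h, err := detectHierarchy(c.root, c.unified)
		fmt.Println(h, err)
	}
}
```

Keeping the classification a pure function of the two filesystem types makes it trivial to unit-test without a real cgroup mount; the actual `statfs` call can live in a thin platform-specific wrapper.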

The steps I have in mind for this series of refactorings are:

  1. Set systemd unit properties rather than writing to the cgroup tree. PR #1991, "[RFC] Implement systemd-specific per-cgroup support (+ proof-of-concept "devices" and "memory")", is my work in progress towards that goal. The idea is to extend the API so that it can encode systemd property settings that reflect the configuration, and then migrate all controllers to set systemd properties rather than write directly to the cgroup subtree.

  2. Support working under a unified hierarchy. I have a branch that does most of this (it assumes step 1 has been addressed, so I'm taking some shortcuts there). Setting properties goes through systemd, so paths don't matter there. Reading stats goes through the tree, but it's easy to detect which hierarchy is in use and adjust the paths accordingly.

  3. Create sub-cgroups of a delegated systemd scope and pass them to containers. This follows systemd's recommendations, which are also based on the cgroupv2 implementation (particularly the workings of cgroup.subtree_control), so having libcontainer support this delegation protocol would definitely be advantageous.
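Steps 1 and 3 can be sketched roughly as follows. This is a hypothetical illustration, not the code in PR #1991: `Resources`, `Property`, `toProperties` and `subtreeControl` are made-up names, and the D-Bus call itself (systemd's `StartTransientUnit` or `SetUnitProperties`, which take a list of name/value pairs) is left out. The property names used (`Delegate`, `MemoryMax`, `MemoryLimit`, `CPUQuotaPerSecUSec`, `TasksMax`) are real systemd unit properties; choosing `MemoryLimit` on v1 is an assumption made here for compatibility with older systemd versions.

```go
package main

import (
	"fmt"
	"strings"
)

// Resources is a hypothetical, trimmed-down subset of the OCI
// linux.resources settings; zero means "not set".
type Resources struct {
	MemoryLimit int64  // bytes
	CPUQuota    int64  // microseconds of CPU time per period
	CPUPeriod   uint64 // period length in microseconds
	PidsLimit   int64
}

// Property mirrors one (name, value) pair as sent to systemd over D-Bus.
type Property struct {
	Name  string
	Value interface{}
}

// toProperties (hypothetical) translates resource settings into systemd
// unit properties instead of writing cgroup files directly (step 1).
func toProperties(r Resources, unified bool) []Property {
	// Delegate=yes hands the unit's cgroup subtree to the container
	// manager, which may then create its own sub-cgroups (step 3).
	props := []Property{{"Delegate", true}}
	if r.MemoryLimit > 0 {
		name := "MemoryLimit" // legacy (cgroup v1) setting
		if unified {
			name = "MemoryMax" // cgroup v2 setting
		}
		props = append(props, Property{name, uint64(r.MemoryLimit)})
	}
	if r.CPUQuota > 0 && r.CPUPeriod > 0 {
		// systemd expresses the quota as CPU time allowed per second.
		perSec := uint64(r.CPUQuota) * 1000000 / r.CPUPeriod
		props = append(props, Property{"CPUQuotaPerSecUSec", perSec})
	}
	if r.PidsLimit > 0 {
		props = append(props, Property{"TasksMax", uint64(r.PidsLimit)})
	}
	return props
}

// subtreeControl (hypothetical) builds the line written to a delegated
// v2 cgroup's cgroup.subtree_control to enable controllers for children.
func subtreeControl(controllers []string) string {
	parts := make([]string, len(controllers))
	for i, c := range controllers {
		parts[i] = "+" + c
	}
	return strings.Join(parts, " ")
}

func main() {
	r := Resources{MemoryLimit: 512 << 20, CPUQuota: 50000, CPUPeriod: 100000, PidsLimit: 100}
	for _, p := range toProperties(r, true) {
		fmt.Printf("%s=%v\n", p.Name, p.Value)
	}
	fmt.Println(subtreeControl([]string{"cpu", "memory", "pids"}))
}
```

For the example settings this prints `Delegate=true`, `MemoryMax=536870912`, `CPUQuotaPerSecUSec=500000` (a 50% CPU cap), `TasksMax=100` and `+cpu +memory +pids`. Because only property names depend on the hierarchy, the same translation layer can serve legacy, hybrid and unified hosts.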

One large concern here is testing. I was talking to @mrunalp about this and he offered to help. Since running this setup in a container can be challenging, he suggested using an external test runner, possibly one that monitors GitHub PRs and reports results back to GitHub. We'll be working on this testing and will report back here when we have more details. (We might open separate issues to track testing on systemd as well.)

I'm planning to join the OCI call on Wednesday, March 13th (also to discuss the related opencontainers/runtime-spec#1005 and opencontainers/runtime-spec#1002), and I'm happy to talk more there about this libcontainer systemd cgroup driver improvement proposal.
