-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cgroup v2: support rootless systemd #2281
Conversation
@giuseppe @filbranden @mrunalp PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make detectuid sligthly more forgiving to the value of XDG_RUNTIME_DIR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @AkihiroSuda !
I have some comments, I hope you find them useful. Overall, looks really good!
Cheers,
Filipe
libcontainer/cgroups/systemd/user.go
Outdated
|
||
// detectUID detects UID from $XDG_RUNTIME_DIR if running in userNS. | ||
// Otherwise returns os.Getuid() . | ||
func detectUID() (int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hoping you wouldn't need this... But looks like you still do for the external auth of the user D-Bus connection... Let me think about this one see if there's a better solution than this.
Having said that, see my comment below on requiring that the environment variables are properly set. So you could say that this is a hard requirement here. Looks like your code will fail if it doesn't match /run/user/%d
which I think is a good thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libsystemd (which is used by crun) seems extracting UID from the "0::/user.slice/user-%d.slice/..."
string in /proc/self/cgroup
:
https://github.com/systemd/systemd/blob/6829d8ce6927c151e4391c6a5e69b76906fbd2be/src/libsystemd/sd-bus/sd-bus.c#L104
https://github.com/systemd/systemd/blob/ce4c73eb5d3c9a23628c450db0e25a529620adbe/src/basic/cgroup-util.c#L1361-L1395
I feel this seems less robust than $XDG_RUNTIME_DIR
, but maybe we should do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm 👍 for path.Base(xdr)
proposed by @kolyshkin #2281 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet another way is to exec busctl --user status
and grep ^OwnerUID=
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
busctl --user status
ends up resolving it from getsockopt(fd, SOL_SOCKET, SO_PEERCRED, ...)
, see here.
I believe you can do the same in Golang with unix.GetsockoptUcred(int(fd), unix.SOL_SOCKET, unix.SO_PEERCRED)
from the "golang.org/x/sys/unix"
package. Does that work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libsystemd (which is used by crun) seems extracting UID from the
"0::/user.slice/user-%d.slice/..."
string in/proc/self/cgroup
:
Yes, I think that's possibly reasonable. It's still reimplementing C code in Golang but I think that's preferrable than some of the alternatives.
But see my message above about getting the real UID from a SO_PEERCRED
call...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
busctl --user status ends up resolving it from getsockopt(fd, SOL_SOCKET, SO_PEERCRED, ...), see here.
Seems not true for OwnerUID
. OwnerUID
seems extracted from cgroup.
https://github.com/systemd/systemd/blob/v245/src/libsystemd/sd-bus/bus-dump.c#L375-L377
https://github.com/systemd/systemd/blob/v245/src/libsystemd/sd-bus/bus-creds.c#L509-L526
Anyway, SO_PEERCRED
isn't what we want here, because the kernel automatically translates the UID in the ucred struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated PR to use busctl --user status
. I think this is the most robust way.
I guess eventually systemd may switch away from parsing cgroup and adopt more robust logic , but we won't be affected as long as we use busctl
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes good point sorry my search was too shallow for it...
I think I just worry a bit about busctl
not being available, it was first introduced as a debugging tool so I can see how distributions might not want to ship it or not ship it by default and it might go missing in the future...
Anyways, it's a fallback and we can iterate on it. Looks good.
d5e4e38
to
159eecf
Compare
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@AkihiroSuda I have a WIP progress patchset to remove Paths map from the systemd UnifiedManager, and simplify its initialization (meaning, for example, that you won't have to add Again, it's not ready yet but you can get the idea of what I'm trying to do: |
@kolyshkin SGTM, do you want your commits to be merged before this PR or after? |
@AkihiroSuda I think yours will be easier to port on top of mine than vise versa |
If #2299 takes longer, can we merge this PR first? |
Sorry for the delay, it is ready now, and it will definitely be easier to work with the updated/refactored codebase. |
Need a rebase |
Needs #2340 |
159eecf
to
cfbcc3c
Compare
CI green. Ready to review/merge. @opencontainers/runc-maintainers |
41faeb4
to
7e37c19
Compare
fbc0cc7
to
a251537
Compare
@kolyshkin @mrunalp PTAL |
rebased |
a251537
to
07bdb3e
Compare
@kolyshkin LGTY? |
5552602
to
e8389d3
Compare
addressed comments |
still LGTM |
Tested with both Podman (master) and Moby (master), on Ubuntu 19.10 . $ podman --cgroup-manager=systemd run -it --rm --runtime=runc \ --cgroupns=host --memory 42m --cpus 0.42 --pids-limit 42 alpine / # cat /proc/self/cgroup 0::/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope / # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/memory.max 44040192 / # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/cpu.max 42000 100000 / # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/pids.max 42 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
e8389d3
to
bf15cc9
Compare
rebased |
Tested with both Podman (master) and Moby (master), on Ubuntu 19.10 .
The behavior corresponds to crun v0.13.
To test with Moby, launch
dockerd-rootless.sh
with--exec-opt native.cgroupdriver=systemd
.Fix #2163