Finding the minimal set of privileges for a docker container to spawn rootless containers #1456

Open
@ggoodman

I've been trying to run a pool of rootless containers as children of a Docker container. The Docker container would run a web server that spins up a pool of rootless child containers and proxies requests to them. The children should be isolated from each other, and the host system should be protected from the side effects of the untrusted code they run.

I need to pass additional file descriptors to these children, which precludes running them as siblings via the host Docker daemon. So here I am; I hope I'm not overstepping my bounds by asking for guidance in an issue.
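As a rough illustration of the descriptor inheritance described above, here is a minimal sketch using a plain child process rather than runc. The helper name `run_child_with_fd` and the inline child script are hypothetical; a real setup would hand the descriptor to the container runtime instead.

```python
import os
import subprocess
import sys


def run_child_with_fd(payload: str) -> bytes:
    """Spawn a child that inherits one extra descriptor and writes payload to it."""
    read_fd, write_fd = os.pipe()
    # Hypothetical child: receives the descriptor number and payload on argv.
    child_code = (
        "import os, sys\n"
        "fd = int(sys.argv[1])\n"
        "os.write(fd, sys.argv[2].encode())\n"
        "os.close(fd)\n"
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code, str(write_fd), payload],
        pass_fds=(write_fd,),  # keep this descriptor open in the child
    )
    os.close(write_fd)  # the parent keeps only the read end
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # EOF once the child has closed its copy
            break
        chunks.append(chunk)
    os.close(read_fd)
    proc.wait()
    return b"".join(chunks)
```

Descriptors listed in `pass_fds` keep the same number in the child, which is why the number can be passed on the command line. This inheritance only works for processes spawned directly by the parent, which is the constraint that rules out sibling containers started by the host daemon.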

Set up

Create a root filesystem tarball (docker export writes an uncompressed tar, so the .tgz name is only a label):

$ docker export $(docker create alpine) > rootfs.tgz

Dockerfile with runc, libseccomp2 and the rootfs:

FROM buildpack-deps

RUN apt-get update && apt-get install -y --no-install-recommends \
		libseccomp2 \
	&& rm -rf /var/lib/apt/lists/*

ADD rootfs.tgz /child/rootfs
ADD runc /usr/local/sbin/runc

WORKDIR /child/rootfs

RUN runc spec --rootless

CMD ["runc", "run", "child"]

False starts:

Build and run the container, adding CAP_SYS_ADMIN:

$ docker run --rm -it --cap-add SYS_ADMIN $(docker build -q .)
container_linux.go:265: starting container process caused "process_linux.go:261: applying cgroup configuration for process caused \"mkdir /sys/fs/cgroup/cpuset/child: read-only file system\""

Same, but mount /sys/fs/cgroup as rw:

$ docker run --rm -it --cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:rw $(docker build -q .)
container_linux.go:265: starting container process caused "process_linux.go:339: container init caused \"could not create session key: operation not permitted\""

Same, but invoke runc with --no-new-keyring:

$ docker run --rm -it --cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:rw $(docker build -q .) runc run --no-new-keyring child
container_linux.go:265: starting container process caused "process_linux.go:339: container init caused \"rootfs_linux.go:104: jailing process inside rootfs caused \\\"pivot_root operation not permitted\\\"\""

Finally 'working':

Same, but also add --no-pivot:

$ docker run --rm -it --cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:rw $(docker build -q .) runc run --no-new-keyring --no-pivot child
/ #
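
Because every run above adds CAP_SYS_ADMIN, it can help to check which capabilities the outer container's processes really hold before trimming anything. The sketch below decodes the CapEff bitmask from /proc/<pid>/status. The function names are made up for illustration, and the bit index 21 for CAP_SYS_ADMIN is taken from linux/capability.h.

```python
CAP_SYS_ADMIN = 21  # bit index of CAP_SYS_ADMIN in linux/capability.h


def parse_cap_eff(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)  # the mask is printed as hex
    raise ValueError("CapEff line not found")


def has_capability(mask: int, bit: int) -> bool:
    """Return True if the capability at the given bit index is set in mask."""
    return bool((mask >> bit) & 1)


def current_has_sys_admin() -> bool:
    """Check the calling process; assumes a Linux /proc filesystem."""
    with open("/proc/self/status") as fh:
        return has_capability(parse_cap_eff(fh.read()), CAP_SYS_ADMIN)
```

For example, `has_capability(parse_cap_eff("CapEff:\t0000000000200000"), CAP_SYS_ADMIN)` is true because only bit 21 is set in that illustrative mask. Running `current_has_sys_admin()` inside the outer container, with and without `--cap-add SYS_ADMIN`, is one way to confirm the flag had the intended effect.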

Disclaimer: I'm still wrapping my head around the complexity and nuances of all the technologies we call 'containers', so please correct me if I'm wrong.

Skipping pivot_root seems like a bad idea given my objectives, so I copied the default seccomp profile and added the pivot_root syscall to the big list of SCMP_ACT_ALLOW calls. This let me drop --no-pivot.
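
The profile change described above could be scripted roughly as follows. This is a sketch under assumptions: it expects the profile JSON to carry a top-level `syscalls` array whose entries list syscalls under `names` (older profiles used a single `name` key), and the file names `default.json` and `profile.json` in the usage note are placeholders.

```python
import copy
import json


def allow_syscall(profile: dict, syscall: str) -> dict:
    """Return a copy of profile with an unconditional SCMP_ACT_ALLOW rule for syscall."""
    updated = copy.deepcopy(profile)
    rules = updated.setdefault("syscalls", [])
    for rule in rules:
        unconditional = not (
            rule.get("args") or rule.get("includes") or rule.get("excludes")
        )
        if (
            rule.get("action") == "SCMP_ACT_ALLOW"
            and syscall in rule.get("names", [])
            and unconditional
        ):
            return updated  # already allowed, nothing to add
    rules.append({"names": [syscall], "action": "SCMP_ACT_ALLOW", "args": []})
    return updated


def write_profile(src_path: str, dst_path: str, syscall: str) -> None:
    """Read a profile from src_path and write the widened copy to dst_path."""
    with open(src_path) as src:
        profile = json.load(src)
    with open(dst_path, "w") as dst:
        json.dump(allow_syscall(profile, syscall), dst, indent=2)
```

After something like `write_profile("default.json", "profile.json", "pivot_root")`, the result can be passed to the outer container with `docker run --security-opt seccomp=profile.json ...`. Keeping the change to a single appended rule makes it easy to diff the custom profile against the default one.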

What kind of exposure am I creating by whitelisting the pivot_root syscall?

Also, I'm out of my depth trying to figure out how I might avoid --no-new-keyring.

What kind of exposure am I creating by using the --no-new-keyring flag?
