Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to disable nested user namespaces by setting limit to 1 #488

Merged
merged 2 commits into from
Jan 3, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
Add an option to disable nested user namespaces by setting limit to 1
Some use-cases of bubblewrap want to ensure that the subprocess can't
further re-arrange the filesystem namespace, or do other more complex
namespace modification. For example, Flatpak wants to prevent sandboxed
processes from altering their /proc/$pid/root/.flatpak-info, so that
/.flatpak-info can safely be used as an indicator that a process is part
of a Flatpak app.

This approach was suggested by lukts30 on #452.
The sysctl-controlled maximum numbers of namespaces are themselves
namespaced, so we can disable nested user namespaces by setting the
limit to 1 and then entering a new, nested user namespace. The resulting
process loses its privileges in the namespace where the limit was set
to 1, so it is unable to move the limit back up.

Co-authored-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Simon McVittie <smcv@collabora.com>
  • Loading branch information
smcv and alexlarsson committed Dec 16, 2022
commit c305629326463a0b258f6bc398827ca7127e0cbe
54 changes: 48 additions & 6 deletions bubblewrap.c
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,7 @@ static const char *opt_file_label = NULL;
static bool opt_as_pid_1;

const char *opt_chdir_path = NULL;
bool opt_disable_userns = FALSE;
bool opt_unshare_user = FALSE;
bool opt_unshare_user_try = FALSE;
bool opt_unshare_pid = FALSE;
Expand Down Expand Up @@ -311,6 +312,7 @@ usage (int ecode, FILE *out)
" --unshare-cgroup-try Create new cgroup namespace if possible else continue by skipping it\n"
" --userns FD Use this user namespace (cannot combine with --unshare-user)\n"
" --userns2 FD After setup switch to this user namespace, only useful with --userns\n"
" --disable-userns Disable further use of user namespaces inside sandbox\n"
" --pidns FD Use this pid namespace (as parent namespace if using --unshare-pid)\n"
" --uid UID Custom uid in the sandbox (requires --unshare-user or --userns)\n"
" --gid GID Custom gid in the sandbox (requires --unshare-user or --userns)\n"
Expand Down Expand Up @@ -1777,6 +1779,10 @@ parse_args_recurse (int *argcp,
argv++;
argc--;
}
else if (strcmp (arg, "--disable-userns") == 0)
{
opt_disable_userns = TRUE;
}
else if (strcmp (arg, "--remount-ro") == 0)
{
if (argc < 2)
Expand Down Expand Up @@ -2677,6 +2683,12 @@ main (int argc,
if (opt_userns_fd != -1 && opt_unshare_user_try)
die ("--userns not compatible --unshare-user-try");

if (opt_disable_userns && !opt_unshare_user)
die ("--disable-userns requires --unshare-user");

if (opt_disable_userns && opt_userns_block_fd != -1)
die ("--disable-userns is not compatible with --userns-block-fd");

/* Technically using setns() is probably safe even in the privileged
* case, because we got passed in a file descriptor to the
* namespace, and that can only be gotten if you have ptrace
Expand Down Expand Up @@ -3155,20 +3167,50 @@ main (int argc,
if (opt_userns2_fd > 0 && setns (opt_userns2_fd, CLONE_NEWUSER) != 0)
die_with_error ("Setting userns2 failed");

if (opt_unshare_user &&
(ns_uid != opt_sandbox_uid || ns_gid != opt_sandbox_gid) &&
opt_userns_block_fd == -1)
if (opt_unshare_user && opt_userns_block_fd == -1 &&
(ns_uid != opt_sandbox_uid || ns_gid != opt_sandbox_gid ||
opt_disable_userns))
Comment on lines +3176 to +3178
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that as written here, we can get into this block even if we're setuid root. Please check that this isn't a security vulnerability?

I think it's OK, because we could already have entered this block anyway (for --unshare-user --sandbox-uid=42) even with a setuid bwrap, and the only thing we're doing here is reducing the max_user_namespaces in a new namespace, never increasing it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this is ever a problem. We only ever hit this if --unshare-user is set, which means at this point we did CLONE_USERNS. So, even if we had privileged capabilities on the host we now don't have them in the user namespace:

On the other hand, that process has no capabilities in the parent (in the case of clone(2)) or previous (in the case of [unshare(2)] (https://man7.org/linux/man-pages/man2/unshare.2.html) and setns(2)) user namespace, even if the new namespace is created or joined by the root user (i.e., a process with user ID 0 in the root namespace).

I.e. even in the setuid case we have at this point the same privileges as the non-setuid case.

{
/* Now that devpts is mounted and we've no need for mount
permissions we can create a new userspace and map our uid
1:1 */
/* Here we create a second level userns inside the first one. This is
used for one or more of these reasons:

* The 1st level namespace has a different uid/gid than the
requested due to requirements of beeing root in the first
level due for mounting devpts (opt_needs_devpts).

* To disable user namespaces we set max_user_namespaces and then
create the second namespace so that the sandbox cannot undo this
change.
*/

if (opt_disable_userns)
{
cleanup_fd int sysctl_fd = -1;

sysctl_fd = openat (proc_fd, "sys/user/max_user_namespaces", O_WRONLY);

if (sysctl_fd < 0)
die_with_error ("cannot open /proc/sys/user/max_user_namespaces");

if (write_to_fd (sysctl_fd, "1", 1) < 0)
die_with_error ("sysctl user.max_user_namespaces = 1");
Comment on lines +3201 to +3202
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this has to be exactly 1. If it was 0, we wouldn't be able to unshare (CLONE_NEWUSER) immediately below, and if it was 2, then this protection would be ineffective.

}

if (unshare (CLONE_NEWUSER))
die_with_error ("unshare user ns");

/* We're in a new user namespace, we got back the bounding set, clear it again */
drop_cap_bounding_set (FALSE);

if (opt_disable_userns)
{
/* Verify that we can't make a new userns again */
res = unshare (CLONE_NEWUSER);

if (res == 0)
die ("unable to disable creation of new user namespaces");
}

write_uid_gid_map (opt_sandbox_uid, ns_uid,
opt_sandbox_gid, ns_gid,
-1, FALSE, FALSE);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alex's version had this write_uid_gid_map() call conditional on ns_uid != opt_sandbox_uid || ns_gid != opt_sandbox_gid, but I'm not 100% sure why, and I found that keeping the conditional broke my sandboxed process's ability to connect to D-Bus - I think because my uid wasn't mapped into the container correctly. It's less diff like this, anyway.

Copy link
Collaborator

@alexlarsson alexlarsson Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I did that because previously in that case we would not hit this codepath so I would avoid doing what we didn't do before. But we just set up a new user namespace, so we have to do this here. So, your change is correct.

Expand Down
14 changes: 14 additions & 0 deletions bwrap.xml
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,20 @@
<listitem><para>After setting up the new namespace, switch into the specified namespace. For this to work the specified namespace must be a descendant of the user namespace used for the setup, so this is only useful in combination with --userns.</para>
<para>This is useful because sometimes bubblewrap itself creates nested user namespaces (to work around some kernel issues) and --userns2 can be used to enter these.</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--disable-userns</option></term>
<listitem><para>
Prevent the process in the sandbox from creating further user namespaces,
so that it cannot rearrange the filesystem namespace or do other more
complex namespace modification.
This is currently implemented by setting the
<literal>user.max_user_namespaces</literal> sysctl to 1, and then
entering a nested user namespace which is unable to raise that limit
in the outer namespace.
This option requires <option>--unshare-user</option>, and doesn't work
in the setuid version of bubblewrap.
</para></listitem>
</varlistentry>
<varlistentry>
<term><option>--pidns <arg choice="plain">FD</arg></option></term>
<listitem><para>Use an existing pid namespace instead of creating one. This is often used with --userns, because the pid namespace must be owned by the same user namespace that bwrap uses. </para>
Expand Down
1 change: 1 addition & 0 deletions completions/bash/bwrap
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ _bwrap() {
local boolean_options="
--as-pid-1
--clearenv
--disable-userns
--help
--new-session
--unshare-all
Expand Down
1 change: 1 addition & 0 deletions completions/zsh/_bwrap
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ _bwrap_args=(
'--dev-bind[Bind mount the host path SRC on DEST, allowing device access]:source:_files:destination:_files'
'--dev[Mount new dev on DEST]:mount point for /dev:_files -/'
"--die-with-parent[Kills with SIGKILL child process (COMMAND) when bwrap or bwrap's parent dies.]"
'--disable-userns[Disable further use of user namespaces inside sandbox]'
'--exec-label[Exec label for the sandbox]:SELinux label:_selinux_contexts'
'--file-label[File label for temporary sandbox content]:SELinux label:_selinux_contexts'
'--gid[Custom gid in the sandbox (requires --unshare-user or --userns)]: :_guard "[0-9]#" "numeric group ID"'
Expand Down
10 changes: 9 additions & 1 deletion tests/test-run.sh
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ srcd=$(cd $(dirname "$0") && pwd)

bn=$(basename "$0")

echo "1..57"
echo "1..58"

# Test help
${BWRAP} --help > help.txt
Expand Down Expand Up @@ -112,6 +112,7 @@ echo "ok exec failure doesn't include exit-code in json-status"
if test -n "${bwrap_is_suid:-}"; then
echo "ok - # SKIP no --cap-add support"
echo "ok - # SKIP no --cap-add support"
echo "ok - # SKIP no --disable-userns"
else
BWRAP_RECURSE="$BWRAP --unshare-user --uid 0 --gid 0 --cap-add ALL --bind / / --bind /proc /proc"

Expand All @@ -123,6 +124,13 @@ else
$BWRAP_RECURSE -- /proc/self/exe --unshare-all ${BWRAP_RO_HOST_ARGS} findmnt > recursive-newroot.txt
assert_file_has_content recursive-newroot.txt "/usr"
echo "ok - can pivot to new rootfs recursively"

$BWRAP --dev-bind / / -- true
$BWRAP --unshare-user --disable-userns --dev-bind / / -- true
! $BWRAP --unshare-user --disable-userns --dev-bind / / -- $BWRAP --dev-bind / / -- true
$BWRAP --unshare-user --disable-userns --dev-bind / / -- sh -c "echo 2 > /proc/sys/user/max_user_namespaces || true; ! $BWRAP --dev-bind / / -- true"
$BWRAP --unshare-user --disable-userns --dev-bind / / -- sh -c "echo 100 > /proc/sys/user/max_user_namespaces || true; ! $BWRAP --dev-bind / / -- true"
echo "ok - can disable nested userns"
fi

# Test error prefixing
Expand Down