Add an option to disable nested user namespaces by setting limit to 1 #488

smcv · 2022-03-22T17:50:01Z

Add an option to disable nested user namespaces by setting limit to 1

Some use-cases of bubblewrap want to ensure that the subprocess can't
further re-arrange the filesystem namespace, or do other more complex
namespace modification. For example, Flatpak wants to prevent sandboxed
processes from altering their /proc/$pid/root/.flatpak-info, so that
/.flatpak-info can safely be used as an indicator that a process is part
of a Flatpak app.

This approach was suggested by lukts30 on Add --disable-userns switch #452.
The sysctl-controlled maximum numbers of namespaces are themselves
namespaced, so we can disable nested user namespaces by setting the
limit to 1 and then entering a new, nested user namespace. The resulting
process loses its privileges in the namespace where the limit was set
to 1, so it is unable to move the limit back up.

Co-authored-by: Alexander Larsson
Add --assert-userns-disabled option

We can't combine --disable-userns with entering an existing user
namespace via --userns if the existing user namespace was created with
--disable-userns, because its ability to create nested user namespaces
has already been disabled. However, the next best thing is to verify
that we are already in the desired state.

smcv · 2022-03-22T17:50:52Z

This still needs an automated test, and needs testing on a system where setuid bwrap is required (e.g. Debian 10).

lukts30 · 2022-07-21T14:25:17Z

Note that this would also block exploiting vulnerabilities like CVE-2022-34918 without relying on some form of syscall filtering.
This prevents processes inside the "sandbox" of utilizing exploitable kernel code that is only guarded by ns_capable checks without disabling unprivileged namespace globally.

rusty-snake · 2022-07-21T15:45:59Z

@lukts30 there is at least one such vulnerability almost every month, I have an nonextensive list at netblue30/firejail#4939 (comment).

smcv · 2022-07-21T19:55:50Z

Note that this would also block exploiting vulnerabilities like GHSA-9v26-h3ph-p8v7 without relying on some form of syscall filtering

That's almost exactly my motivation for this: at the moment, Flatpak has seccomp-based syscall filtering as an essential part of its security model (in order to stop container payloads from entering new user namespaces, which it has to prevent because that would subvert its idea of identity), and I want to be able to stop doing that, at least for some apps.

I primarily want this because we can get stronger guarantees in user-space if entering a new userns is prevented, and you primarily want this in order to reduce kernel attack surface, but the same mechanism helps both.

smcv · 2022-08-09T10:51:21Z

This still needs an automated test

I added one.

needs testing on a system where setuid bwrap is required (e.g. Debian 10)

On such systems, this feature doesn't work, because the unprivileged user doesn't have permission to set the maximum number of namespaces. If Flatpak (or another bubblewrap user) uses this feature, it will have to do it conditionally, similar to how it already handles various other things that don't work while setuid. This class of systems is increasingly obsolete, so I think this is reasonable.

However, on such systems, the unprivileged user can't create new user namespaces anyway - they can't do it themselves, because the kernel configuration won't allow it, and they also can't do it via the setuid bwrap, because PR_SET_NO_NEW_PRIVS won't allow the setuid bit to take effect.

swick · 2022-08-09T20:48:27Z

bubblewrap.c

+      if (unshare (CLONE_NEWUSER))
+        die_with_error ("unshare user ns");
+
+      /* Check that we can't make a new userns again */


might as well sysctl user.max_user_namespaces = 2 and then try to create the user ns

might as well sysctl user.max_user_namespaces = 2 and then try to create the user ns

This seems like it would be misleading, because setting the sysctl to a higher value succeeds and the unshare still fails, so I think that would give a reader the wrong idea about why this works. To verify this, I patched bubblewrap.c like so:

diff --git a/bubblewrap.c b/bubblewrap.c index cd57e86..8e21fc2 100644 --- a/bubblewrap.c +++ b/bubblewrap.c @@ -3200,6 +3200,8 @@ main (int argc, if (write_to_fd (sysctl_fd, "1", 1) < 0) die_with_error ("sysctl user.max_user_namespaces = 1"); + + fprintf (stderr, "max userns set to 1\n"); } if (unshare (CLONE_NEWUSER)) @@ -3216,10 +3218,31 @@ main (int argc, if (opt_disable_userns || opt_assert_userns_disabled) { + cleanup_fd int sysctl_fd = -1; + + sysctl_fd = openat (proc_fd, "sys/user/max_user_namespaces", O_WRONLY); + + if (sysctl_fd < 0) + die_with_error ("cannot open /proc/sys/user/max_user_namespaces"); + + if (write_to_fd (sysctl_fd, "42", 2) < 0) + die_with_error ("sysctl user.max_user_namespaces = 42"); + + fprintf (stderr, "max userns set to 42\n"); + /* Verify that we can't make a new userns again */ res = unshare (CLONE_NEWUSER); if (res == 0) die ("unable to disable creation of new user namespaces"); + fprintf (stderr, "unshare(CLONE_NEWUSER) failed as expected, with: %s\n", strerror (errno)); + +#if 0 + (void) lseek (sysctl_fd, 0, SEEK_SET); + if (write_to_fd (sysctl_fd, "1", 1) < 0) + die_with_error ("sysctl user.max_user_namespaces = 1"); + + fprintf (stderr, "max userns back to 1\n"); +#endif } /* All privileged ops are done now, so drop caps we don't need */

With or without the block that's in #if 0 here, everything here succeeds, except for the unshare() which fails with ENOSPC as we would hope. What is happening here (as I understand it) is that the bubblewrap process has capabilities in the userns that it is creating or joining (because we haven't called drop_privs() yet), like so:

initns, owned by uid 0, limit set to 10000 or something

intermediate ns, owned by us, limit set to 1

final ns, owned by us, limit set to 42

and so we are allowed to set the limit to anything we want to, even if it's greater than the parent namespace's limit. However (as documented in namespaces(7)), when we try to create a new userns, the resulting number of user namespaces is checked against "the corresponding namespace limit for the creator UID in the ancestor namespace", and if that limit would be exceeded then creation fails.

So the user.max_user_namespaces in the final namespace is actually largely irrelevant, and it's the user.max_user_namespaces in the intermediate namespace (in which members of the final namespace are incapable) which matters.

I don't really want to set the limit to a higher value for this check and then leave it there, because it seems unexpected that a process could read out user.max_user_namespaces, think "oh that's fine, I'm probably allowed to create up to 41 more namespaces", and then find that it gets ENOSPC.

I also don't really want to set the limit higher and then reset it back down to 1 (as would happen if the #if 0 block is uncommented), because if I do that, it looks like a security-significant action even though actually it's not. Also, it'll look misleading in strace or similar.

And, I think it's possible to enter the opt_assert_userns_disabled block even if we're setuid root, which means we need to be careful not to do anything that could be a net user privilege increase (like increasing the user.max_user_namespaces of the init namespace).

alexlarsson · 2022-08-16T09:25:40Z

I really like this approach, although I feel that there are possible issues with the way flatpak relies on the 2 nested user namespaces, with the --userns2 argument in the parent_expose_pids || parent_share_pids support. In this case we would have now three user namespaces, and we'd enter the 3rd one. Would we need a --userns3?

I'm not sure if that is required, but also I don't really see a reason for having 3 user namespaces in this case. Can we not reuse the already existing unshare (CLONE_NEWUSER), possibly entering that codepath when we did not before by adding a || opt_disable_userns to the if?

smcv · 2022-08-16T11:25:34Z

there are possible issues with the way flatpak relies on the 2 nested user namespaces, with the --userns2 argument in the parent_expose_pids || parent_share_pids support. In this case we would have now three user namespaces, and we'd enter the 3rd one

That's a good point, I'm not sure how this change would interact with that.

alexlarsson · 2022-09-06T07:35:16Z

I updated this to do the limits inside the code that already created the second userns. I believe that will be enough to avoid the issues with flatpak i mentioned. I'll do some testing with that.

I also added an "incompat error" for disable-userns and --userns-block-fd, because that doesn't seem to work together.

alexlarsson · 2022-09-06T09:01:54Z

So, playing with this in flatpak, and there is a general issue with the "expose-pid" and "share-pid" sandbox options in that they pass --userns {parent-pid} (as well as ---userns2) which is incompatible with this form of --disable-userns. However, I guess that isn't really a problem as we will be putting the sandboxed thing in a userns that has the right limits anyway. I'll verify this works by temporarily disabling the seccomp-based namespace limits.

alexlarsson · 2022-09-06T09:39:31Z

So, testing this, and it seems to work. However, when testing writes to /proc/sys/user/max_user_namespaces I always get permission errors unless I also add --cap-add all. So I guess in practice, in many cases we don't need the recursive namespaces. It doesn't hurt to keep it though.

This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. This doesn't work with a setuid bubblewrap, so if you're using that you now have to build flatpak with --with-priv-mode=setuid, even when using the system bubblewrap.

alexlarsson · 2022-09-06T09:46:23Z

I made a WIP flatpak PR to use this.

@smcv Can you review this to make sure it looks ok to you. Maybe test with e.g. the steam sandbox stuff.

This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <smcv@collabora.com> Signed-off-by: Simon McVittie <smcv@collabora.com>

Some use-cases of bubblewrap want to ensure that the subprocess can't further re-arrange the filesystem namespace, or do other more complex namespace modification. For example, Flatpak wants to prevent sandboxed processes from altering their /proc/$pid/root/.flatpak-info, so that /.flatpak-info can safely be used as an indicator that a process is part of a Flatpak app. This approach was suggested by lukts30 on containers#452. The sysctl-controlled maximum numbers of namespaces are themselves namespaced, so we can disable nested user namespaces by setting the limit to 1 and then entering a new, nested user namespace. The resulting process loses its privileges in the namespace where the limit was set to 1, so it is unable to move the limit back up. Co-authored-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Simon McVittie <smcv@collabora.com>

We can't combine --disable-userns with entering an existing user namespace via --userns if the existing user namespace was created with --disable-userns, because its ability to create nested user namespaces has already been disabled. However, the next best thing is to verify that we are already in the desired state. Signed-off-by: Simon McVittie <smcv@collabora.com>

This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <smcv@collabora.com> Signed-off-by: Simon McVittie <smcv@collabora.com>

smcv · 2022-12-16T18:22:46Z

bubblewrap.c

+  if (opt_unshare_user && opt_userns_block_fd == -1 &&
+      (ns_uid != opt_sandbox_uid || ns_gid != opt_sandbox_gid ||
+       opt_disable_userns))


Note that as written here, we can get into this block even if we're setuid root. Please check that this isn't a security vulnerability?

I think it's OK, because we could already have entered this block anyway (for --unshare-user --sandbox-uid=42) even with a setuid bwrap, and the only thing we're doing here is reducing the max_user_namespaces in a new namespace, never increasing it.

I don't see how this is ever a problem. We only ever hit this if --unshare-user is set, which means at this point we did CLONE_USERNS. So, even if we had privileged capabilities on the host we now don't have them in the user namespace:

On the other hand, that process has no capabilities in the parent (in the case of clone(2)) or previous (in the case of [unshare(2)] (https://man7.org/linux/man-pages/man2/unshare.2.html) and setns(2)) user namespace, even if the new namespace is created or joined by the root user (i.e., a process with user ID 0 in the root namespace).

I.e. even in the setuid case we have at this point the same privileges as the non-setuid case.

smcv · 2022-12-16T18:25:03Z

bubblewrap.c

+          if (write_to_fd (sysctl_fd, "1", 1) < 0)
+            die_with_error ("sysctl user.max_user_namespaces = 1");


Note that this has to be exactly 1. If it was 0, we wouldn't be able to unshare (CLONE_NEWUSER) immediately below, and if it was 2, then this protection would be ineffective.

smcv · 2022-12-16T18:30:28Z

bubblewrap.c

@@ -3174,6 +3213,14 @@ main (int    argc,
                         -1, FALSE, FALSE);


Alex's version had this write_uid_gid_map() call conditional on ns_uid != opt_sandbox_uid || ns_gid != opt_sandbox_gid, but I'm not 100% sure why, and I found that keeping the conditional broke my sandboxed process's ability to connect to D-Bus - I think because my uid wasn't mapped into the container correctly. It's less diff like this, anyway.

I think I did that because previously in that case we would not hit this codepath so I would avoid doing what we didn't do before. But we just set up a new user namespace, so we have to do this here. So, your change is correct.

smcv · 2022-12-16T18:31:50Z

bubblewrap.c

+  if (opt_disable_userns || opt_assert_userns_disabled)
+    {
+      /* Verify that we can't make a new userns again */
+      res = unshare (CLONE_NEWUSER);
+      if (res == 0)
+        die ("unable to disable creation of new user namespaces");
+    }


As with the other edits, I think we can get here even in a setuid-root bwrap, so we need to be sure that this is not a security vulnerability. But I don't see how it could be: we're trying to create a new userns containing only this process, and we die if that operation succeeded.

This case is slightly different, because in the setuid case we can hit it without having unshared users, so we still have permissions in the host user namespace. However, I agree that this code should be safe to run.

smcv · 2022-12-16T18:54:46Z

Maybe test with e.g. the steam sandbox stuff.

Yes, it works. With flatpak/flatpak#5084 at commit e5c7e25, which includes this branch at 11d5339, I can run Steam Flatpak, and also run a game in the sub-sandbox.

alexlarsson · 2023-01-03T10:04:03Z

This lgtm.

This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <smcv@collabora.com> Signed-off-by: Simon McVittie <smcv@collabora.com>

Fix test failure since #488 when running as uid 0

smcv marked this pull request as draft March 22, 2022 17:50

smcv mentioned this pull request Mar 22, 2022

Add --disable-userns switch #452

Closed

smcv force-pushed the disable-userns-via-limit branch from 492f636 to a26815e Compare March 22, 2022 18:23

smcv mentioned this pull request Mar 22, 2022

Performance impact of seccomp filter in games flatpak/flatpak#4187

Open

smcv mentioned this pull request Aug 9, 2022

Handle syscalls via allowlist instead of denylist flatpak/flatpak#4462

Draft

smcv requested a review from alexlarsson August 9, 2022 10:22

smcv marked this pull request as ready for review August 9, 2022 10:51

smcv force-pushed the disable-userns-via-limit branch from a26815e to 92ebb9a Compare August 9, 2022 10:52

swick reviewed Aug 9, 2022

View reviewed changes

smcv marked this pull request as draft August 16, 2022 11:25

alexlarsson force-pushed the disable-userns-via-limit branch from 92ebb9a to 1205054 Compare September 6, 2022 07:33

alexlarsson mentioned this pull request Sep 6, 2022

Use new --disable-userns bubblewrap feature when possible flatpak/flatpak#5084

Merged

smcv force-pushed the disable-userns-via-limit branch 2 times, most recently from 12576c2 to f25cb4b Compare December 16, 2022 18:27

smcv force-pushed the disable-userns-via-limit branch from f25cb4b to 962d245 Compare December 16, 2022 18:33

smcv and others added 2 commits December 16, 2022 18:45

smcv force-pushed the disable-userns-via-limit branch from 962d245 to 11d5339 Compare December 16, 2022 18:46

smcv commented Dec 16, 2022

View reviewed changes

smcv marked this pull request as ready for review December 16, 2022 18:56

smcv added the enhancement label Dec 16, 2022

alexlarsson merged commit b5f6723 into containers:main Jan 3, 2023

smcv mentioned this pull request May 4, 2023

Block access to user namespaces #368

Closed

smcv added a commit that referenced this pull request May 4, 2023

Merge pull request #554 from smcv/disable-userns-tests-root

ad76c2d

Fix test failure since #488 when running as uid 0

smcv mentioned this pull request Feb 17, 2024

Add CLI option to pass common signals to sandboxed process #586

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add an option to disable nested user namespaces by setting limit to 1 #488

Add an option to disable nested user namespaces by setting limit to 1 #488

smcv commented Mar 22, 2022 •

edited

Loading

smcv commented Mar 22, 2022

lukts30 commented Jul 21, 2022

rusty-snake commented Jul 21, 2022

smcv commented Jul 21, 2022

smcv commented Aug 9, 2022

swick Aug 9, 2022

smcv Dec 16, 2022

alexlarsson commented Aug 16, 2022

smcv commented Aug 16, 2022

alexlarsson commented Sep 6, 2022

alexlarsson commented Sep 6, 2022

alexlarsson commented Sep 6, 2022

alexlarsson commented Sep 6, 2022

smcv Dec 16, 2022

alexlarsson Jan 3, 2023

smcv Dec 16, 2022

smcv Dec 16, 2022

alexlarsson Jan 3, 2023 •

edited

Loading

smcv Dec 16, 2022

alexlarsson Jan 3, 2023

smcv commented Dec 16, 2022

alexlarsson commented Jan 3, 2023

		if (write_to_fd (sysctl_fd, "1", 1) < 0)
		die_with_error ("sysctl user.max_user_namespaces = 1");

Add an option to disable nested user namespaces by setting limit to 1 #488

Add an option to disable nested user namespaces by setting limit to 1 #488

Conversation

smcv commented Mar 22, 2022 • edited Loading

smcv commented Mar 22, 2022

lukts30 commented Jul 21, 2022

rusty-snake commented Jul 21, 2022

smcv commented Jul 21, 2022

smcv commented Aug 9, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexlarsson commented Aug 16, 2022

smcv commented Aug 16, 2022

alexlarsson commented Sep 6, 2022

alexlarsson commented Sep 6, 2022

alexlarsson commented Sep 6, 2022

alexlarsson commented Sep 6, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexlarsson Jan 3, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

smcv commented Dec 16, 2022

alexlarsson commented Jan 3, 2023

smcv commented Mar 22, 2022 •

edited

Loading

alexlarsson Jan 3, 2023 •

edited

Loading