Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to disable nested user namespaces by setting limit to 1 #488

Merged
merged 2 commits into from
Jan 3, 2023

Conversation

smcv
Copy link
Collaborator

@smcv smcv commented Mar 22, 2022

  • Add an option to disable nested user namespaces by setting limit to 1

    Some use-cases of bubblewrap want to ensure that the subprocess can't
    further re-arrange the filesystem namespace, or do other more complex
    namespace modification. For example, Flatpak wants to prevent sandboxed
    processes from altering their /proc/$pid/root/.flatpak-info, so that
    /.flatpak-info can safely be used as an indicator that a process is part
    of a Flatpak app.

    This approach was suggested by lukts30 on Add --disable-userns switch #452.
    The sysctl-controlled maximum numbers of namespaces are themselves
    namespaced, so we can disable nested user namespaces by setting the
    limit to 1 and then entering a new, nested user namespace. The resulting
    process loses its privileges in the namespace where the limit was set
    to 1, so it is unable to move the limit back up.

    Co-authored-by: Alexander Larsson

  • Add --assert-userns-disabled option

    We can't combine --disable-userns with entering an existing user
    namespace via --userns if the existing user namespace was created with
    --disable-userns, because its ability to create nested user namespaces
    has already been disabled. However, the next best thing is to verify
    that we are already in the desired state.

@smcv smcv marked this pull request as draft March 22, 2022 17:50
@smcv
Copy link
Collaborator Author

smcv commented Mar 22, 2022

This still needs an automated test, and needs testing on a system where setuid bwrap is required (e.g. Debian 10).

@lukts30
Copy link

lukts30 commented Jul 21, 2022

Note that this would also block exploiting vulnerabilities like CVE-2022-34918 without relying on some form of syscall filtering.
This prevents processes inside the "sandbox" of utilizing exploitable kernel code that is only guarded by ns_capable checks without disabling unprivileged namespace globally.

@rusty-snake
Copy link
Contributor

@lukts30 there is at least one such vulnerability almost every month, I have an nonextensive list at netblue30/firejail#4939 (comment).

@smcv
Copy link
Collaborator Author

smcv commented Jul 21, 2022

Note that this would also block exploiting vulnerabilities like GHSA-9v26-h3ph-p8v7 without relying on some form of syscall filtering

That's almost exactly my motivation for this: at the moment, Flatpak has seccomp-based syscall filtering as an essential part of its security model (in order to stop container payloads from entering new user namespaces, which it has to prevent because that would subvert its idea of identity), and I want to be able to stop doing that, at least for some apps.

I primarily want this because we can get stronger guarantees in user-space if entering a new userns is prevented, and you primarily want this in order to reduce kernel attack surface, but the same mechanism helps both.

@smcv
Copy link
Collaborator Author

smcv commented Aug 9, 2022

This still needs an automated test

I added one.

needs testing on a system where setuid bwrap is required (e.g. Debian 10)

On such systems, this feature doesn't work, because the unprivileged user doesn't have permission to set the maximum number of namespaces. If Flatpak (or another bubblewrap user) uses this feature, it will have to do it conditionally, similar to how it already handles various other things that don't work while setuid. This class of systems is increasingly obsolete, so I think this is reasonable.

However, on such systems, the unprivileged user can't create new user namespaces anyway - they can't do it themselves, because the kernel configuration won't allow it, and they also can't do it via the setuid bwrap, because PR_SET_NO_NEW_PRIVS won't allow the setuid bit to take effect.

@smcv smcv marked this pull request as ready for review August 9, 2022 10:51
@smcv smcv force-pushed the disable-userns-via-limit branch from a26815e to 92ebb9a Compare August 9, 2022 10:52
bubblewrap.c Outdated
if (unshare (CLONE_NEWUSER))
die_with_error ("unshare user ns");

/* Check that we can't make a new userns again */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well sysctl user.max_user_namespaces = 2 and then try to create the user ns

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well sysctl user.max_user_namespaces = 2 and then try to create the user ns

This seems like it would be misleading, because setting the sysctl to a higher value succeeds and the unshare still fails, so I think that would give a reader the wrong idea about why this works. To verify this, I patched bubblewrap.c like so:

diff --git a/bubblewrap.c b/bubblewrap.c
index cd57e86..8e21fc2 100644
--- a/bubblewrap.c
+++ b/bubblewrap.c
@@ -3200,6 +3200,8 @@ main (int    argc,
 
           if (write_to_fd (sysctl_fd, "1", 1) < 0)
             die_with_error ("sysctl user.max_user_namespaces = 1");
+
+          fprintf (stderr, "max userns set to 1\n");
         }
 
       if (unshare (CLONE_NEWUSER))
@@ -3216,10 +3218,31 @@ main (int    argc,
 
   if (opt_disable_userns || opt_assert_userns_disabled)
     {
+      cleanup_fd int sysctl_fd = -1;
+
+      sysctl_fd = openat (proc_fd, "sys/user/max_user_namespaces", O_WRONLY);
+
+      if (sysctl_fd < 0)
+        die_with_error ("cannot open /proc/sys/user/max_user_namespaces");
+
+      if (write_to_fd (sysctl_fd, "42", 2) < 0)
+        die_with_error ("sysctl user.max_user_namespaces = 42");
+
+      fprintf (stderr, "max userns set to 42\n");
+
       /* Verify that we can't make a new userns again */
       res = unshare (CLONE_NEWUSER);
       if (res == 0)
         die ("unable to disable creation of new user namespaces");
+      fprintf (stderr, "unshare(CLONE_NEWUSER) failed as expected, with: %s\n", strerror (errno));
+
+#if 0
+      (void) lseek (sysctl_fd, 0, SEEK_SET);
+      if (write_to_fd (sysctl_fd, "1", 1) < 0)
+        die_with_error ("sysctl user.max_user_namespaces = 1");
+
+      fprintf (stderr, "max userns back to 1\n");
+#endif
     }
 
   /* All privileged ops are done now, so drop caps we don't need */

With or without the block that's in #if 0 here, everything here succeeds, except for the unshare() which fails with ENOSPC as we would hope. What is happening here (as I understand it) is that the bubblewrap process has capabilities in the userns that it is creating or joining (because we haven't called drop_privs() yet), like so:

  • initns, owned by uid 0, limit set to 10000 or something
    • intermediate ns, owned by us, limit set to 1
      • final ns, owned by us, limit set to 42

and so we are allowed to set the limit to anything we want to, even if it's greater than the parent namespace's limit. However (as documented in namespaces(7)), when we try to create a new userns, the resulting number of user namespaces is checked against "the corresponding namespace limit for the creator UID in the ancestor namespace", and if that limit would be exceeded then creation fails.

So the user.max_user_namespaces in the final namespace is actually largely irrelevant, and it's the user.max_user_namespaces in the intermediate namespace (in which members of the final namespace are incapable) which matters.

I don't really want to set the limit to a higher value for this check and then leave it there, because it seems unexpected that a process could read out user.max_user_namespaces, think "oh that's fine, I'm probably allowed to create up to 41 more namespaces", and then find that it gets ENOSPC.

I also don't really want to set the limit higher and then reset it back down to 1 (as would happen if the #if 0 block is uncommented), because if I do that, it looks like a security-significant action even though actually it's not. Also, it'll look misleading in strace or similar.

And, I think it's possible to enter the opt_assert_userns_disabled block even if we're setuid root, which means we need to be careful not to do anything that could be a net user privilege increase (like increasing the user.max_user_namespaces of the init namespace).

@alexlarsson
Copy link
Collaborator

I really like this approach, although I feel that there are possible issues with the way flatpak relies on the 2 nested user namespaces, with the --userns2 argument in the parent_expose_pids || parent_share_pids support. In this case we would have now three user namespaces, and we'd enter the 3rd one. Would we need a --userns3?

I'm not sure if that is required, but also I don't really see a reason for having 3 user namespaces in this case. Can we not reuse the already existing unshare (CLONE_NEWUSER), possibly entering that codepath when we did not before by adding a || opt_disable_userns to the if?

@smcv
Copy link
Collaborator Author

smcv commented Aug 16, 2022

there are possible issues with the way flatpak relies on the 2 nested user namespaces, with the --userns2 argument in the parent_expose_pids || parent_share_pids support. In this case we would have now three user namespaces, and we'd enter the 3rd one

That's a good point, I'm not sure how this change would interact with that.

@smcv smcv marked this pull request as draft August 16, 2022 11:25
@alexlarsson alexlarsson force-pushed the disable-userns-via-limit branch from 92ebb9a to 1205054 Compare September 6, 2022 07:33
@alexlarsson
Copy link
Collaborator

I updated this to do the limits inside the code that already created the second userns. I believe that will be enough to avoid the issues with flatpak i mentioned. I'll do some testing with that.

I also added an "incompat error" for disable-userns and --userns-block-fd, because that doesn't seem to work together.

@alexlarsson
Copy link
Collaborator

So, playing with this in flatpak, and there is a general issue with the "expose-pid" and "share-pid" sandbox options in that they pass --userns {parent-pid} (as well as ---userns2) which is incompatible with this form of --disable-userns. However, I guess that isn't really a problem as we will be putting the sandboxed thing in a userns that has the right limits anyway. I'll verify this works by temporarily disabling the seccomp-based namespace limits.

@alexlarsson
Copy link
Collaborator

So, testing this, and it seems to work. However, when testing writes to /proc/sys/user/max_user_namespaces I always get permission errors unless I also add --cap-add all. So I guess in practice, in many cases we don't need the recursive namespaces. It doesn't hurt to keep it though.

alexlarsson added a commit to alexlarsson/flatpak that referenced this pull request Sep 6, 2022
This feature (added in containers/bubblewrap#488)
allows us to improve the guarantees of disallowing the sandbox to use
recursive user namespaces (which is a security risk) compared to the
existing limits that use seccomp.

This doesn't work with a setuid bubblewrap, so if you're using that
you now have to build flatpak with --with-priv-mode=setuid, even when
using the system bubblewrap.
@alexlarsson
Copy link
Collaborator

I made a WIP flatpak PR to use this.

@smcv Can you review this to make sure it looks ok to you. Maybe test with e.g. the steam sandbox stuff.

smcv added a commit to alexlarsson/flatpak that referenced this pull request Dec 16, 2022
This feature (added in containers/bubblewrap#488)
allows us to improve the guarantees of disallowing the sandbox to use
recursive user namespaces (which is a security risk) compared to the
existing limits that use seccomp.

[smcv: Move this to flatpak_run_setup_base_argv() so it will apply
equally in apply_extra_data() and `flatpak build`; make the compile-time
check for a setuid bwrap into a runtime check]

Co-authored-by: Simon McVittie <smcv@collabora.com>
Signed-off-by: Simon McVittie <smcv@collabora.com>
@smcv smcv force-pushed the disable-userns-via-limit branch 2 times, most recently from 12576c2 to f25cb4b Compare December 16, 2022 18:27
smcv added a commit to alexlarsson/flatpak that referenced this pull request Dec 16, 2022
This feature (added in containers/bubblewrap#488)
allows us to improve the guarantees of disallowing the sandbox to use
recursive user namespaces (which is a security risk) compared to the
existing limits that use seccomp.

[smcv: Move this to flatpak_run_setup_base_argv() so it will apply
equally in apply_extra_data() and `flatpak build`; make the compile-time
check for a setuid bwrap into a runtime check]

Co-authored-by: Simon McVittie <smcv@collabora.com>
Signed-off-by: Simon McVittie <smcv@collabora.com>
@smcv smcv force-pushed the disable-userns-via-limit branch from f25cb4b to 962d245 Compare December 16, 2022 18:33
smcv added a commit to alexlarsson/flatpak that referenced this pull request Dec 16, 2022
This feature (added in containers/bubblewrap#488)
allows us to improve the guarantees of disallowing the sandbox to use
recursive user namespaces (which is a security risk) compared to the
existing limits that use seccomp.

[smcv: Move this to flatpak_run_setup_base_argv() so it will apply
equally in apply_extra_data() and `flatpak build`; make the compile-time
check for a setuid bwrap into a runtime check]

Co-authored-by: Simon McVittie <smcv@collabora.com>
Signed-off-by: Simon McVittie <smcv@collabora.com>
smcv and others added 2 commits December 16, 2022 18:45
Some use-cases of bubblewrap want to ensure that the subprocess can't
further re-arrange the filesystem namespace, or do other more complex
namespace modification. For example, Flatpak wants to prevent sandboxed
processes from altering their /proc/$pid/root/.flatpak-info, so that
/.flatpak-info can safely be used as an indicator that a process is part
of a Flatpak app.

This approach was suggested by lukts30 on containers#452.
The sysctl-controlled maximum numbers of namespaces are themselves
namespaced, so we can disable nested user namespaces by setting the
limit to 1 and then entering a new, nested user namespace. The resulting
process loses its privileges in the namespace where the limit was set
to 1, so it is unable to move the limit back up.

Co-authored-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Simon McVittie <smcv@collabora.com>
We can't combine --disable-userns with entering an existing user
namespace via --userns if the existing user namespace was created with
--disable-userns, because its ability to create nested user namespaces
has already been disabled. However, the next best thing is to verify
that we are already in the desired state.

Signed-off-by: Simon McVittie <smcv@collabora.com>
@smcv smcv force-pushed the disable-userns-via-limit branch from 962d245 to 11d5339 Compare December 16, 2022 18:46
smcv added a commit to alexlarsson/flatpak that referenced this pull request Dec 16, 2022
This feature (added in containers/bubblewrap#488)
allows us to improve the guarantees of disallowing the sandbox to use
recursive user namespaces (which is a security risk) compared to the
existing limits that use seccomp.

[smcv: Move this to flatpak_run_setup_base_argv() so it will apply
equally in apply_extra_data() and `flatpak build`; make the compile-time
check for a setuid bwrap into a runtime check]

Co-authored-by: Simon McVittie <smcv@collabora.com>
Signed-off-by: Simon McVittie <smcv@collabora.com>
Comment on lines +3176 to +3178
if (opt_unshare_user && opt_userns_block_fd == -1 &&
(ns_uid != opt_sandbox_uid || ns_gid != opt_sandbox_gid ||
opt_disable_userns))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that as written here, we can get into this block even if we're setuid root. Please check that this isn't a security vulnerability?

I think it's OK, because we could already have entered this block anyway (for --unshare-user --sandbox-uid=42) even with a setuid bwrap, and the only thing we're doing here is reducing the max_user_namespaces in a new namespace, never increasing it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this is ever a problem. We only ever hit this if --unshare-user is set, which means at this point we did CLONE_USERNS. So, even if we had privileged capabilities on the host we now don't have them in the user namespace:

On the other hand, that process has no capabilities in the parent (in the case of clone(2)) or previous (in the case of [unshare(2)] (https://man7.org/linux/man-pages/man2/unshare.2.html) and setns(2)) user namespace, even if the new namespace is created or joined by the root user (i.e., a process with user ID 0 in the root namespace).

I.e. even in the setuid case we have at this point the same privileges as the non-setuid case.

Comment on lines +3201 to +3202
if (write_to_fd (sysctl_fd, "1", 1) < 0)
die_with_error ("sysctl user.max_user_namespaces = 1");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this has to be exactly 1. If it was 0, we wouldn't be able to unshare (CLONE_NEWUSER) immediately below, and if it was 2, then this protection would be ineffective.

@@ -3174,6 +3213,14 @@ main (int argc,
-1, FALSE, FALSE);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alex's version had this write_uid_gid_map() call conditional on ns_uid != opt_sandbox_uid || ns_gid != opt_sandbox_gid, but I'm not 100% sure why, and I found that keeping the conditional broke my sandboxed process's ability to connect to D-Bus - I think because my uid wasn't mapped into the container correctly. It's less diff like this, anyway.

Copy link
Collaborator

@alexlarsson alexlarsson Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I did that because previously in that case we would not hit this codepath so I would avoid doing what we didn't do before. But we just set up a new user namespace, so we have to do this here. So, your change is correct.

bubblewrap.c Outdated
Comment on lines 3216 to 3223
if (opt_disable_userns || opt_assert_userns_disabled)
{
/* Verify that we can't make a new userns again */
res = unshare (CLONE_NEWUSER);
if (res == 0)
die ("unable to disable creation of new user namespaces");
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with the other edits, I think we can get here even in a setuid-root bwrap, so we need to be sure that this is not a security vulnerability. But I don't see how it could be: we're trying to create a new userns containing only this process, and we die if that operation succeeded.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is slightly different, because in the setuid case we can hit it without having unshared users, so we still have permissions in the host user namespace. However, I agree that this code should be safe to run.

@smcv
Copy link
Collaborator Author

smcv commented Dec 16, 2022

Maybe test with e.g. the steam sandbox stuff.

Yes, it works. With flatpak/flatpak#5084 at commit e5c7e25, which includes this branch at 11d5339, I can run Steam Flatpak, and also run a game in the sub-sandbox.

@smcv smcv marked this pull request as ready for review December 16, 2022 18:56
@alexlarsson
Copy link
Collaborator

This lgtm.

@alexlarsson alexlarsson merged commit b5f6723 into containers:main Jan 3, 2023
smcv added a commit to alexlarsson/flatpak that referenced this pull request Feb 27, 2023
This feature (added in containers/bubblewrap#488)
allows us to improve the guarantees of disallowing the sandbox to use
recursive user namespaces (which is a security risk) compared to the
existing limits that use seccomp.

[smcv: Move this to flatpak_run_setup_base_argv() so it will apply
equally in apply_extra_data() and `flatpak build`; make the compile-time
check for a setuid bwrap into a runtime check]

Co-authored-by: Simon McVittie <smcv@collabora.com>
Signed-off-by: Simon McVittie <smcv@collabora.com>
smcv added a commit to alexlarsson/flatpak that referenced this pull request Feb 27, 2023
This feature (added in containers/bubblewrap#488)
allows us to improve the guarantees of disallowing the sandbox to use
recursive user namespaces (which is a security risk) compared to the
existing limits that use seccomp.

[smcv: Move this to flatpak_run_setup_base_argv() so it will apply
equally in apply_extra_data() and `flatpak build`; make the compile-time
check for a setuid bwrap into a runtime check]

Co-authored-by: Simon McVittie <smcv@collabora.com>
Signed-off-by: Simon McVittie <smcv@collabora.com>
smcv added a commit to alexlarsson/flatpak that referenced this pull request Feb 27, 2023
This feature (added in containers/bubblewrap#488)
allows us to improve the guarantees of disallowing the sandbox to use
recursive user namespaces (which is a security risk) compared to the
existing limits that use seccomp.

[smcv: Move this to flatpak_run_setup_base_argv() so it will apply
equally in apply_extra_data() and `flatpak build`; make the compile-time
check for a setuid bwrap into a runtime check]

Co-authored-by: Simon McVittie <smcv@collabora.com>
Signed-off-by: Simon McVittie <smcv@collabora.com>
smcv added a commit to alexlarsson/flatpak that referenced this pull request Mar 20, 2023
This feature (added in containers/bubblewrap#488)
allows us to improve the guarantees of disallowing the sandbox to use
recursive user namespaces (which is a security risk) compared to the
existing limits that use seccomp.

[smcv: Move this to flatpak_run_setup_base_argv() so it will apply
equally in apply_extra_data() and `flatpak build`; make the compile-time
check for a setuid bwrap into a runtime check]

Co-authored-by: Simon McVittie <smcv@collabora.com>
Signed-off-by: Simon McVittie <smcv@collabora.com>
smcv added a commit to alexlarsson/flatpak that referenced this pull request Mar 23, 2023
This feature (added in containers/bubblewrap#488)
allows us to improve the guarantees of disallowing the sandbox to use
recursive user namespaces (which is a security risk) compared to the
existing limits that use seccomp.

[smcv: Move this to flatpak_run_setup_base_argv() so it will apply
equally in apply_extra_data() and `flatpak build`; make the compile-time
check for a setuid bwrap into a runtime check]

Co-authored-by: Simon McVittie <smcv@collabora.com>
Signed-off-by: Simon McVittie <smcv@collabora.com>
smcv added a commit to flatpak/flatpak that referenced this pull request Mar 24, 2023
This feature (added in containers/bubblewrap#488)
allows us to improve the guarantees of disallowing the sandbox to use
recursive user namespaces (which is a security risk) compared to the
existing limits that use seccomp.

[smcv: Move this to flatpak_run_setup_base_argv() so it will apply
equally in apply_extra_data() and `flatpak build`; make the compile-time
check for a setuid bwrap into a runtime check]

Co-authored-by: Simon McVittie <smcv@collabora.com>
Signed-off-by: Simon McVittie <smcv@collabora.com>
smcv added a commit that referenced this pull request May 4, 2023
Fix test failure since #488 when running as uid 0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants