-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an option to disable nested user namespaces by setting limit to 1 #488
Conversation
This still needs an automated test, and needs testing on a system where setuid bwrap is required (e.g. Debian 10). |
492f636
to
a26815e
Compare
Note that this would also block exploiting vulnerabilities like CVE-2022-34918 without relying on some form of syscall filtering. |
@lukts30 there is at least one such vulnerability almost every month, I have an nonextensive list at netblue30/firejail#4939 (comment). |
That's almost exactly my motivation for this: at the moment, Flatpak has seccomp-based syscall filtering as an essential part of its security model (in order to stop container payloads from entering new user namespaces, which it has to prevent because that would subvert its idea of identity), and I want to be able to stop doing that, at least for some apps. I primarily want this because we can get stronger guarantees in user-space if entering a new userns is prevented, and you primarily want this in order to reduce kernel attack surface, but the same mechanism helps both. |
I added one.
On such systems, this feature doesn't work, because the unprivileged user doesn't have permission to set the maximum number of namespaces. If Flatpak (or another bubblewrap user) uses this feature, it will have to do it conditionally, similar to how it already handles various other things that don't work while setuid. This class of systems is increasingly obsolete, so I think this is reasonable. However, on such systems, the unprivileged user can't create new user namespaces anyway - they can't do it themselves, because the kernel configuration won't allow it, and they also can't do it via the setuid bwrap, because |
a26815e
to
92ebb9a
Compare
bubblewrap.c
Outdated
if (unshare (CLONE_NEWUSER)) | ||
die_with_error ("unshare user ns"); | ||
|
||
/* Check that we can't make a new userns again */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might as well sysctl user.max_user_namespaces = 2
and then try to create the user ns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might as well sysctl user.max_user_namespaces = 2 and then try to create the user ns
This seems like it would be misleading, because setting the sysctl to a higher value succeeds and the unshare still fails, so I think that would give a reader the wrong idea about why this works. To verify this, I patched bubblewrap.c
like so:
diff --git a/bubblewrap.c b/bubblewrap.c
index cd57e86..8e21fc2 100644
--- a/bubblewrap.c
+++ b/bubblewrap.c
@@ -3200,6 +3200,8 @@ main (int argc,
if (write_to_fd (sysctl_fd, "1", 1) < 0)
die_with_error ("sysctl user.max_user_namespaces = 1");
+
+ fprintf (stderr, "max userns set to 1\n");
}
if (unshare (CLONE_NEWUSER))
@@ -3216,10 +3218,31 @@ main (int argc,
if (opt_disable_userns || opt_assert_userns_disabled)
{
+ cleanup_fd int sysctl_fd = -1;
+
+ sysctl_fd = openat (proc_fd, "sys/user/max_user_namespaces", O_WRONLY);
+
+ if (sysctl_fd < 0)
+ die_with_error ("cannot open /proc/sys/user/max_user_namespaces");
+
+ if (write_to_fd (sysctl_fd, "42", 2) < 0)
+ die_with_error ("sysctl user.max_user_namespaces = 42");
+
+ fprintf (stderr, "max userns set to 42\n");
+
/* Verify that we can't make a new userns again */
res = unshare (CLONE_NEWUSER);
if (res == 0)
die ("unable to disable creation of new user namespaces");
+ fprintf (stderr, "unshare(CLONE_NEWUSER) failed as expected, with: %s\n", strerror (errno));
+
+#if 0
+ (void) lseek (sysctl_fd, 0, SEEK_SET);
+ if (write_to_fd (sysctl_fd, "1", 1) < 0)
+ die_with_error ("sysctl user.max_user_namespaces = 1");
+
+ fprintf (stderr, "max userns back to 1\n");
+#endif
}
/* All privileged ops are done now, so drop caps we don't need */
With or without the block that's in #if 0
here, everything here succeeds, except for the unshare()
which fails with ENOSPC
as we would hope. What is happening here (as I understand it) is that the bubblewrap process has capabilities in the userns that it is creating or joining (because we haven't called drop_privs()
yet), like so:
- initns, owned by uid 0, limit set to 10000 or something
- intermediate ns, owned by us, limit set to 1
- final ns, owned by us, limit set to 42
- intermediate ns, owned by us, limit set to 1
and so we are allowed to set the limit to anything we want to, even if it's greater than the parent namespace's limit. However (as documented in namespaces(7)
), when we try to create a new userns, the resulting number of user namespaces is checked against "the corresponding namespace limit for the creator UID in the ancestor namespace", and if that limit would be exceeded then creation fails.
So the user.max_user_namespaces
in the final namespace is actually largely irrelevant, and it's the user.max_user_namespaces
in the intermediate namespace (in which members of the final namespace are incapable) which matters.
I don't really want to set the limit to a higher value for this check and then leave it there, because it seems unexpected that a process could read out user.max_user_namespaces
, think "oh that's fine, I'm probably allowed to create up to 41 more namespaces", and then find that it gets ENOSPC
.
I also don't really want to set the limit higher and then reset it back down to 1 (as would happen if the #if 0
block is uncommented), because if I do that, it looks like a security-significant action even though actually it's not. Also, it'll look misleading in strace
or similar.
And, I think it's possible to enter the opt_assert_userns_disabled
block even if we're setuid root, which means we need to be careful not to do anything that could be a net user privilege increase (like increasing the user.max_user_namespaces
of the init namespace).
I really like this approach, although I feel that there are possible issues with the way flatpak relies on the 2 nested user namespaces, with the I'm not sure if that is required, but also I don't really see a reason for having 3 user namespaces in this case. Can we not reuse the already existing |
That's a good point, I'm not sure how this change would interact with that. |
92ebb9a
to
1205054
Compare
I updated this to do the limits inside the code that already created the second userns. I believe that will be enough to avoid the issues with flatpak i mentioned. I'll do some testing with that. I also added an "incompat error" for disable-userns and --userns-block-fd, because that doesn't seem to work together. |
So, playing with this in flatpak, and there is a general issue with the "expose-pid" and "share-pid" sandbox options in that they pass |
So, testing this, and it seems to work. However, when testing writes to /proc/sys/user/max_user_namespaces I always get permission errors unless I also add |
This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. This doesn't work with a setuid bubblewrap, so if you're using that you now have to build flatpak with --with-priv-mode=setuid, even when using the system bubblewrap.
I made a WIP flatpak PR to use this. @smcv Can you review this to make sure it looks ok to you. Maybe test with e.g. the steam sandbox stuff. |
This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <smcv@collabora.com> Signed-off-by: Simon McVittie <smcv@collabora.com>
12576c2
to
f25cb4b
Compare
This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <smcv@collabora.com> Signed-off-by: Simon McVittie <smcv@collabora.com>
f25cb4b
to
962d245
Compare
This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <smcv@collabora.com> Signed-off-by: Simon McVittie <smcv@collabora.com>
Some use-cases of bubblewrap want to ensure that the subprocess can't further re-arrange the filesystem namespace, or do other more complex namespace modification. For example, Flatpak wants to prevent sandboxed processes from altering their /proc/$pid/root/.flatpak-info, so that /.flatpak-info can safely be used as an indicator that a process is part of a Flatpak app. This approach was suggested by lukts30 on containers#452. The sysctl-controlled maximum numbers of namespaces are themselves namespaced, so we can disable nested user namespaces by setting the limit to 1 and then entering a new, nested user namespace. The resulting process loses its privileges in the namespace where the limit was set to 1, so it is unable to move the limit back up. Co-authored-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Simon McVittie <smcv@collabora.com>
We can't combine --disable-userns with entering an existing user namespace via --userns if the existing user namespace was created with --disable-userns, because its ability to create nested user namespaces has already been disabled. However, the next best thing is to verify that we are already in the desired state. Signed-off-by: Simon McVittie <smcv@collabora.com>
962d245
to
11d5339
Compare
This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <smcv@collabora.com> Signed-off-by: Simon McVittie <smcv@collabora.com>
if (opt_unshare_user && opt_userns_block_fd == -1 && | ||
(ns_uid != opt_sandbox_uid || ns_gid != opt_sandbox_gid || | ||
opt_disable_userns)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that as written here, we can get into this block even if we're setuid root. Please check that this isn't a security vulnerability?
I think it's OK, because we could already have entered this block anyway (for --unshare-user --sandbox-uid=42
) even with a setuid bwrap, and the only thing we're doing here is reducing the max_user_namespaces
in a new namespace, never increasing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how this is ever a problem. We only ever hit this if --unshare-user is set, which means at this point we did CLONE_USERNS. So, even if we had privileged capabilities on the host we now don't have them in the user namespace:
On the other hand, that process has no capabilities in the parent (in the case of clone(2)) or previous (in the case of [unshare(2)] (https://man7.org/linux/man-pages/man2/unshare.2.html) and setns(2)) user namespace, even if the new namespace is created or joined by the root user (i.e., a process with user ID 0 in the root namespace).
I.e. even in the setuid case we have at this point the same privileges as the non-setuid case.
if (write_to_fd (sysctl_fd, "1", 1) < 0) | ||
die_with_error ("sysctl user.max_user_namespaces = 1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this has to be exactly 1. If it was 0, we wouldn't be able to unshare (CLONE_NEWUSER)
immediately below, and if it was 2, then this protection would be ineffective.
@@ -3174,6 +3213,14 @@ main (int argc, | |||
-1, FALSE, FALSE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alex's version had this write_uid_gid_map()
call conditional on ns_uid != opt_sandbox_uid || ns_gid != opt_sandbox_gid
, but I'm not 100% sure why, and I found that keeping the conditional broke my sandboxed process's ability to connect to D-Bus - I think because my uid wasn't mapped into the container correctly. It's less diff like this, anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I did that because previously in that case we would not hit this codepath so I would avoid doing what we didn't do before. But we just set up a new user namespace, so we have to do this here. So, your change is correct.
bubblewrap.c
Outdated
if (opt_disable_userns || opt_assert_userns_disabled) | ||
{ | ||
/* Verify that we can't make a new userns again */ | ||
res = unshare (CLONE_NEWUSER); | ||
if (res == 0) | ||
die ("unable to disable creation of new user namespaces"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with the other edits, I think we can get here even in a setuid-root bwrap, so we need to be sure that this is not a security vulnerability. But I don't see how it could be: we're trying to create a new userns containing only this process, and we die if that operation succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is slightly different, because in the setuid case we can hit it without having unshared users, so we still have permissions in the host user namespace. However, I agree that this code should be safe to run.
Yes, it works. With flatpak/flatpak#5084 at commit e5c7e25, which includes this branch at 11d5339, I can run Steam Flatpak, and also run a game in the sub-sandbox. |
This lgtm. |
This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <smcv@collabora.com> Signed-off-by: Simon McVittie <smcv@collabora.com>
This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <smcv@collabora.com> Signed-off-by: Simon McVittie <smcv@collabora.com>
This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <smcv@collabora.com> Signed-off-by: Simon McVittie <smcv@collabora.com>
This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <smcv@collabora.com> Signed-off-by: Simon McVittie <smcv@collabora.com>
This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <smcv@collabora.com> Signed-off-by: Simon McVittie <smcv@collabora.com>
This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <smcv@collabora.com> Signed-off-by: Simon McVittie <smcv@collabora.com>
Fix test failure since #488 when running as uid 0
Add an option to disable nested user namespaces by setting limit to 1
Some use-cases of bubblewrap want to ensure that the subprocess can't
further re-arrange the filesystem namespace, or do other more complex
namespace modification. For example, Flatpak wants to prevent sandboxed
processes from altering their /proc/$pid/root/.flatpak-info, so that
/.flatpak-info can safely be used as an indicator that a process is part
of a Flatpak app.
This approach was suggested by lukts30 on Add --disable-userns switch #452.
The sysctl-controlled maximum numbers of namespaces are themselves
namespaced, so we can disable nested user namespaces by setting the
limit to 1 and then entering a new, nested user namespace. The resulting
process loses its privileges in the namespace where the limit was set
to 1, so it is unable to move the limit back up.
Co-authored-by: Alexander Larsson
Add --assert-userns-disabled option
We can't combine --disable-userns with entering an existing user
namespace via --userns if the existing user namespace was created with
--disable-userns, because its ability to create nested user namespaces
has already been disabled. However, the next best thing is to verify
that we are already in the desired state.