[Proposal] Use runc-dmz to defeat CVE-2019-5736 #3983
Makefile
Outdated
runc: runc-dmz runc-bin

runc-dmz:
	gcc -o runc-dmz -static contrib/cmd/runc-dmz/runc-dmz.c
nit: $(CC) instead of directly calling gcc
Good point; in particular, it will make cross-compilation easier.
Signed-off-by: lifubang <lifubang@acmcoder.com>
CI failing
runc: runc-dmz runc-bin

runc-dmz:
	$(CC) -o runc-dmz -static contrib/cmd/runc-dmz/runc-dmz.c
And CFLAGS
runc: runc-dmz runc-bin

runc-dmz:
	$(CC) -o runc-dmz -static contrib/cmd/runc-dmz/runc-dmz.c
If we are going to make this a hard dependency, it should be placed out of “contrib”
		return execve(args[0], args, environ);
	}
	return 0;
}
I guess we can optimize the binary footprint by eliminating glibc and executing “syscall” instruction directly, but that may happen in follow-up PRs
If we build this binary with musl, you get a 13KB binary. With glibc it's 1.1MB. We will need to adjust our build system for this -- but there is a separate issue that distributions might find building a single binary with musl quite difficult. Speaking from the SUSE side, I don't think we even package musl for SLES so we would need to build with glibc (meaning the binary would be 1.1MB in size -- almost 100 times larger).
As you say, this can be done in a future PR.
Rewriting this in asm will be much easier than complicating libc dependency
I am not sure if there is a silver bullet here:
- with `glibc`, you get a big binary.
- with `musl`, you get a small binary, but building against `musl` can be painful on some distributions.
- with pure assembly, you need one file per supported architecture, and the build has to pick the right file for the target architecture.
Maybe using `syscall` directly can be a middle ground, but you will need to define the `execve` syscall number depending on the target architecture.
`syscall` doesn't reduce the binary size unfortunately (also, libcs provide the syscall numbers for all syscalls in `<sys/syscall.h>`).
Sad :(.
Let me know regarding the assembly files, I can give a hand for `amd64` and `arm64`!
https://gist.github.com/Zheaoli/f37bc1fb04917fdfac36d644ee69f7e9
I made a demo for amd64. The binary file size is 4.9k.
Here is my contribution for `arm64`:
From 95defc03b7ec526c9c966735ebeb5bac5a9a34a0 Mon Sep 17 00:00:00 2001
From: Francis Laniel <flaniel@linux.microsoft.com>
Date: Wed, 30 Aug 2023 13:06:27 +0200
Subject: [PATCH] runc-dmz: Add arm64 assembly code.
Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
---
contrib/cmd/runc-dmz/runc-dmz-arm64.S | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
create mode 100644 contrib/cmd/runc-dmz/runc-dmz-arm64.S
diff --git a/contrib/cmd/runc-dmz/runc-dmz-arm64.S b/contrib/cmd/runc-dmz/runc-dmz-arm64.S
new file mode 100644
index 00000000..25387e61
--- /dev/null
+++ b/contrib/cmd/runc-dmz/runc-dmz-arm64.S
@@ -0,0 +1,25 @@
+.equ EXECVE_SYSCALL_NUMBER, 221
+.equ EXIT_SYSCALL_NUMBER, 93
+
+.text
+.global _start
+_start:
+ ldr x3, [sp] // *sp contains argc, so x3 = argc
+ cmp x3, #0
+ bgt execve
+ mov x0, #0
+
+exit:
+ mov x8, EXIT_SYSCALL_NUMBER
+ svc #0
+
+execve:
+ add x1, sp, #16 // x1 = &argv[1]
+ ldr x0, [x1] // x0 = argv[1]
+ # argv[argc] is NULL and environment starts at argv[argc + 1].
+ mov x4, #8 // x4 = 8
+ mul x3, x3, x4 // x3 = argc * 8
+ add x2, x1, x3 // x2 = &argv[1] + argc * 8 = &argv[argc + 1], i.e. environment.
+ mov x8, EXECVE_SYSCALL_NUMBER
+ svc #0
+ b exit
--
2.34.1
I tested it both through emulation and virtualization:
francis@pwmachine:~/Codes/kinvolk/runc/contrib/cmd/runc-dmz$ aarch64-linux-gnu-as -g runc-dmz-arm64.S -o runc-dmz-arm64.o (95defc03...) %
francis@pwmachine:~/Codes/kinvolk/runc/contrib/cmd/runc-dmz$ aarch64-linux-gnu-ld -static runc-dmz-arm64.o (95defc03...) %
francis@pwmachine:~/Codes/kinvolk/runc/contrib/cmd/runc-dmz$ du -sh a.out (95defc03...) %
4,0K a.out
francis@pwmachine:~/Codes/kinvolk/runc/contrib/cmd/runc-dmz$ file a.out (95defc03...) %
a.out: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, with debug_info, not stripped
francis@pwmachine:~/Codes/kinvolk/runc/contrib/cmd/runc-dmz$ qemu-aarch64-static ./a.out /bin/ls -al (95defc03...) %
total 32
drwxrwxr-x 2 francis francis 4096 août 30 13:09 .
drwxrwxr-x 7 francis francis 4096 août 29 17:23 ..
-rw-rw-r-- 1 francis francis 1213 août 30 13:06 0001-runc-dmz-Add-arm64-assembly-code.patch
-rwxrwxr-x 1 francis francis 1832 août 30 13:09 a.out
-rw-rw-r-- 1 francis francis 2168 août 30 13:09 runc-dmz-arm64.o
-rw-rw-r-- 1 francis francis 524 août 30 13:06 runc-dmz-arm64.S
-rw-rw-r-- 1 francis francis 168 août 29 17:23 runc-dmz.c
-rw-rw-r-- 1 francis francis 700 août 29 18:59 runc-dmz.s
# Inside arm64 VM:
root@vm-arm64:~/share/kinvolk/runc/contrib/cmd/runc-dmz# lscpu | head -1
Architecture: aarch64
root@vm-arm64:~/share/kinvolk/runc/contrib/cmd/runc-dmz# ./a.out /bin/ls -al
total 32
drwxrwxr-x 2 1000 1000 4096 Aug 30 11:09 .
drwxrwxr-x 7 1000 1000 4096 Aug 29 15:23 ..
-rw-rw-r-- 1 1000 1000 1213 Aug 30 11:06 0001-runc-dmz-Add-arm64-assembly-code.patch
-rwxrwxr-x 1 1000 1000 1832 Aug 30 11:09 a.out
-rw-rw-r-- 1 1000 1000 524 Aug 30 11:06 runc-dmz-arm64.S
-rw-rw-r-- 1 1000 1000 2168 Aug 30 11:09 runc-dmz-arm64.o
-rw-rw-r-- 1 1000 1000 168 Aug 29 15:23 runc-dmz.c
-rw-rw-r-- 1 1000 1000 700 Aug 29 16:59 runc-dmz.s
Feel free to apply the patch! In any case, it was fun to write arm64 assembly from scratch, as it has been a long time since I last did that.
If you see any problem with the code, share your feedback and I will address your comments!
I think there may still be some design problems with the `runc-dmz` proposal.
Please see: #3987 (comment)
I can't confirm this myself, so if you can check that there is no problem with the `runc-dmz` proposal, it would be appreciated. I don't want to waste contributors' time on a flawed proposal.
		panic(err)
	}
	nowPath := getExeDir()
	dmzMake, err := exec.Command("gcc", "-o", filepath.Join(nowPath, "integration.test-dmz"), "-static", filepath.Join(rootDir, "contrib/cmd/runc-dmz/runc-dmz.c")).CombinedOutput()
This shouldn’t be here, and the dmz binary should be just in the PATH?
Maybe we should just embed the dmz binary into the main binary
	nowPath := getExeDir()
	dmzMake, err := exec.Command("gcc", "-o", filepath.Join(nowPath, "nsenter.test-dmz"), "-static", filepath.Join(rootDir, "contrib/cmd/runc-dmz/runc-dmz.c")).CombinedOutput()
	if err != nil {
		panic(fmt.Errorf("make runc-dmz error %w (output: %s)", err, dmzMake))
The dmz binary should be just preinstalled in the PATH
Maybe we should just embed the dmz binary into the main binary
extern char **environ;

int main(int argv, char **args)
-int main(int argv, char **args)
+int main(int argc, char **argv)
	if (argv > 0) {
		return execve(args[0], args, environ);
	}
	return 0;
Probably this should be non-zero
I like the idea, though the binary should be embedded in the `runc` binary using `//go:embed` and (possibly in a future PR) all of this code should be moved to Go. I suspect embedding the binary text would make it far easier to do all of this in Go, so maybe the patch in #3953 doing that should be merged into this PR.
However, since the size of `runc-dmz` under glibc is still 1.1MB, I need to ask whether this actually fixes the original issue -- this would increase the limit of waiting `runc create`s to 10x the original (~3k). A musl-built binary is much smaller (13KB) and so you can run 1000x more `runc create`s (but distributions might struggle to build musl binaries this way).
	if err != nil {
		return err
	}
	dmzArgs := []string{entryPoint}
nit: this variable isn't necessary, you can just do `append([]string{...}, ...)`.
	if err != nil {
		return err
	}
	dmzArgs := []string{entryPoint}
ditto
		return cloned;

	if (fetchve(&argv) < 0)
		return -EINVAL;

	execfd = clone_binary();
The binary text should be embedded into the `runc` binary (using `//go:embed`) and all of this code should be moved to the Go code (like I did in #3953). But if you like, I can do that in a follow-up PR (or I can take this code and the memfd code from #3953 and make a single PR to clean this all up in one go).
> or I can take this code and the memfd code from #3953 and make a single PR to clean this all up in one go

Yes, please take this over if you think the proposal is worth trying.
Okay, I'll open a separate PR with this code and combine it with the cloned-binary patch. Thanks for working on this, I like the idea a lot!
int main(int argv, char **args)
{
	if (argv > 0) {
		return execve(args[0], args, environ);
I am wondering about ensuring that the first argument, i.e. the executed binary, is `runc`.
What do you think?
The first argument is not `runc` -- it's the container pid1 program.
Oh OK! Thank you for the clarification! I still need to polish my `runc` knowledge.
> The first argument is not `runc` -- it's the container pid1 program.

If you try to run the C binary directly, I think the first argument of args would be `runc-dmz`?
Like the idea! I tried compiling the C program with dietlibc and ended up with a 9.5K static stripped binary. I'm not sure how distro vendors (who don't have musl or dietlibc) can package this effectively, but we need to find a way to help them all.
I agree with @AkihiroSuda that asm versions for the common architectures are probably the simplest option -- but we can do that in a separate PR to the initial implementation. Most distributions won't be able to package pre-built binaries, so if building with a custom libc is difficult in their build system, they won't be able to do much to reduce the size of the C program.
Maybe this is the big problem of this proposal. The capabilities of process inherit for Please see https://man7.org/linux/man-pages/man7/capabilities.7.html
Yeah, that's unfortunate. But we can't do the capability configuration in Instead, I think the test should have included In fact, the runtime-spec having the
EDIT 2: My first analysis is correct -- Docker has broken handling of capabilities for non-root users. Root has special handling in the kernel in
This behaviour is identical with One alternative solution is to set EDIT 3: Or, we could remove the ambient capabilities in runc-dmz. However this will increase the size of runc-dmz (we shouldn't call |
#3987 is a port of this to Go, along with quite a few other things improving the cloning logic.
For security reasons, this proposal is not worth implementing. Thanks for everybody's efforts.
To anyone reading this: the proposal was eventually adopted under certain conditions; please see #3987 (comment). Thanks everyone!
Close: #3973
For CVE-2019-5736, we have to make sure that the original runc binary cannot be written to from a container.
So we use `memfd_create` to load `runc` into memory and seal it. But the `runc` binary is about 13M, so if we create lots of containers concurrently in a short period, it will eat the host's memory.
We want to reduce the size of the `runc` binary, but that is very difficult.
There is another way: if `runc` is no longer visible before the container's entrypoint starts, then there is no need to protect runc. So we need to make runc exit before the container task starts. The only way is to replace it with another process, which we call `runc-dmz`; the name means a temporary zone between `runc` and the `container`.
The runc-dmz is very simple:
We can build it as a static binary; the size is 852kb.
I think 852kb is much smaller than 14mb, so it will save a lot of memory when creating lots of containers concurrently.
But I don't know whether it will cause some other problems.