Commit 6400238

Authored and committed by Christian Brauner
CVE-2019-5736 (runC): rexec callers as memfd
Adam Iwaniuk and Borys Popławski discovered that an attacker can compromise the runC host binary from inside a privileged runC container. This can be exploited to gain root access on the host. runC is used as the default runtime for containers with Docker, containerd, Podman, and CRI-O.

The attack can be made when attaching to a running container or when starting a container running a specially crafted image. For example, when runC attaches to a container, the attacker can trick it into executing itself. This can be done by replacing the target binary inside the container with a custom binary pointing back at the runC binary itself. If the target binary is /bin/bash, it can be replaced with an executable script specifying the interpreter path #!/proc/self/exe (/proc/self/exe is a symbolic link created by the kernel for every process; it points to the binary that was executed for that process). When /bin/bash is then executed inside the container, the target of /proc/self/exe is executed instead, which is the runC binary on the host.

The attacker can then write to the target of /proc/self/exe to try to overwrite the runC binary on the host. In general this will not succeed, because the kernel does not permit a binary to be overwritten while it is executing. To overcome this, the attacker can open a file descriptor to /proc/self/exe using the O_PATH flag, reopen the binary as O_WRONLY through /proc/self/fd/<nr>, and try to write to it in a busy loop from a separate process. The write succeeds once the runC binary exits. After this the runC binary is compromised and can be used to attack other containers or the host itself.

This attack is only possible with privileged containers since it requires root privilege on the host to overwrite the runC binary. Unprivileged containers with a non-identity ID mapping do not have permission to write to the host binary and are therefore unaffected by this attack.

LXC is impacted in a similar manner by this vulnerability. However, as the LXC project considers privileged containers to be unsafe, no CVE has been assigned for this issue for LXC. Quoting from the project's security information page at https://linuxcontainers.org/lxc/security/:

"As privileged containers are considered unsafe, we typically will not consider new container escape exploits to be security issues worthy of a CVE and quick fix. We will however try to mitigate those issues so that accidental damage to the host is prevented."

To prevent this attack, LXC has been patched to create a temporary copy of the calling binary itself when it starts or attaches to containers. To do this, LXC creates an anonymous, in-memory file using the memfd_create() system call and copies itself into that file, which is then sealed to prevent further modifications. LXC then executes this sealed, in-memory file instead of the original on-disk binary. Any compromising write operations from a privileged container to the host LXC binary will then target the temporary in-memory binary and not the host binary on disk, preserving the integrity of the host LXC binary. In addition, because the temporary in-memory LXC binary is sealed, writes to it will also fail.

Note: memfd_create() was added to the Linux kernel in the 3.17 release.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Co-Developed-by: Aleksa Sarai <asarai@suse.de>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
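The mitigation described in the commit message can be sketched in isolation. The following is a minimal illustration and not the code from this commit: the helper name sealed_memfd_copy and the macro REXEC_SEALS are hypothetical, and the sketch assumes Linux 3.17 or later with a glibc that exposes memfd_create() (2.27 or later). It copies a file into an anonymous in-memory file and applies the four seals named in the patch, after which writes through the descriptor are expected to fail with EPERM.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical name; same seal set as LXC_MEMFD_REXEC_SEALS in the patch. */
#define REXEC_SEALS (F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE)

/*
 * Copy the file at `path` into an anonymous in-memory file and seal it.
 * Returns the sealed file descriptor, or -1 on error (errno is set).
 */
static int sealed_memfd_copy(const char *path, const char *name)
{
	struct stat st;
	off_t off = 0;
	int fd, memfd, saved_errno;

	assert(path != NULL && name != NULL);

	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	memfd = memfd_create(name, MFD_ALLOW_SEALING | MFD_CLOEXEC);
	if (memfd < 0 || fstat(fd, &st) < 0)
		goto on_error;

	/* sendfile() may copy less than requested, so loop until done. */
	while (off < st.st_size) {
		ssize_t n = sendfile(memfd, fd, &off, (size_t)(st.st_size - off));

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			goto on_error;
	}

	/* After this call the contents and size can no longer change. */
	if (fcntl(memfd, F_ADD_SEALS, REXEC_SEALS) < 0)
		goto on_error;

	close(fd);
	return memfd;

on_error:
	saved_errno = errno;
	if (memfd >= 0)
		close(memfd);
	close(fd);
	errno = saved_errno;
	return -1;
}
```

A caller would then hand the returned descriptor to fexecve() together with its argv and envp, which is the step the patch performs after sealing.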
1 parent 9913ac1 commit 6400238

6 files changed, +252 -1 lines changed

configure.ac (+12)

@@ -746,6 +746,17 @@ AM_COND_IF([ENABLE_DLOG],
 ])
 ])
 
+AC_ARG_ENABLE([memfd-rexec],
+	[AC_HELP_STRING([--enable-memfd-rexec], [enforce liblxc as a memfd to protect against certain symlink attacks [default=yes]])],
+	[], [enable_memfd_rexec=yes])
+AM_CONDITIONAL([ENFORCE_MEMFD_REXEC], [test "x$enable_memfd_rexec" = "xyes"])
+if test "x$enable_memfd_rexec" = "xyes"; then
+	AC_DEFINE([ENFORCE_MEMFD_REXEC], 1, [Rexec liblxc as memfd])
+	AC_MSG_RESULT([yes])
+else
+	AC_MSG_RESULT([no])
+fi
+
 # Files requiring some variable expansion
 AC_CONFIG_FILES([
 Makefile
@@ -974,6 +985,7 @@ Security features:
 - Linux capabilities: $enable_capabilities
 - seccomp: $enable_seccomp
 - SELinux: $enable_selinux
+- memfd rexec: $enable_memfd_rexec
 
 PAM:
 - PAM module: $enable_pam

src/lxc/Makefile.am (+4)

@@ -177,6 +177,10 @@ if !HAVE_STRLCAT
 liblxc_la_SOURCES += ../include/strlcat.c ../include/strlcat.h
 endif
 
+if ENFORCE_MEMFD_REXEC
+liblxc_la_SOURCES += rexec.c
+endif
+
 AM_CFLAGS = -DLXCROOTFSMOUNT=\"$(LXCROOTFSMOUNT)\" \
 	-DLXCPATH=\"$(LXCPATH)\" \
 	-DLXC_GLOBAL_CONF=\"$(LXC_GLOBAL_CONF)\" \

src/lxc/file_utils.c (+40 -1)

@@ -31,7 +31,7 @@
 #include "config.h"
 #include "file_utils.h"
 #include "macro.h"
-#include "string.h"
+#include "string_utils.h"
 
 int lxc_write_to_file(const char *filename, const void *buf, size_t count,
 		      bool add_newline, mode_t mode)
@@ -327,3 +327,42 @@ ssize_t lxc_sendfile_nointr(int out_fd, int in_fd, off_t *offset, size_t count)
 
 	return ret;
 }
+
+char *file_to_buf(char *path, size_t *length)
+{
+	int fd;
+	char buf[PATH_MAX];
+	char *copy = NULL;
+
+	if (!length)
+		return NULL;
+
+	fd = open(path, O_RDONLY | O_CLOEXEC);
+	if (fd < 0)
+		return NULL;
+
+	*length = 0;
+	for (;;) {
+		int n;
+		char *old = copy;
+
+		n = lxc_read_nointr(fd, buf, sizeof(buf));
+		if (n < 0)
+			goto on_error;
+		if (!n)
+			break;
+
+		copy = must_realloc(old, (*length + n) * sizeof(*old));
+		memcpy(copy + *length, buf, n);
+		*length += n;
+	}
+
+	close(fd);
+	return copy;
+
+on_error:
+	close(fd);
+	free(copy);
+
+	return NULL;
+}

src/lxc/file_utils.h (+1)

@@ -55,5 +55,6 @@ extern bool is_fs_type(const struct statfs *fs, fs_type_magic magic_val);
 extern FILE *fopen_cloexec(const char *path, const char *mode);
 extern ssize_t lxc_sendfile_nointr(int out_fd, int in_fd, off_t *offset,
 				   size_t count);
+extern char *file_to_buf(char *path, size_t *length);
 
 #endif /* __LXC_FILE_UTILS_H */

src/lxc/rexec.c (+181)

@@ -0,0 +1,181 @@
+/* liblxcapi
+ *
+ * Copyright © 2019 Christian Brauner <christian.brauner@ubuntu.com>.
+ * Copyright © 2019 Canonical Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE 1
+#endif
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "config.h"
+#include "file_utils.h"
+#include "raw_syscalls.h"
+#include "string_utils.h"
+#include "syscall_wrappers.h"
+
+#define LXC_MEMFD_REXEC_SEALS \
+	(F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE)
+
+static int push_vargs(char *data, int data_length, char ***output)
+{
+	int num = 0;
+	char *cur = data;
+
+	if (!data || *output)
+		return -1;
+
+	*output = must_realloc(NULL, sizeof(**output));
+
+	while (cur < data + data_length) {
+		num++;
+		*output = must_realloc(*output, (num + 1) * sizeof(**output));
+
+		(*output)[num - 1] = cur;
+		cur += strlen(cur) + 1;
+	}
+	(*output)[num] = NULL;
+	return num;
+}
+
+static int parse_exec_params(char ***argv, char ***envp)
+{
+	int ret;
+	char *cmdline = NULL, *env = NULL;
+	size_t cmdline_size, env_size;
+
+	cmdline = file_to_buf("/proc/self/cmdline", &cmdline_size);
+	if (!cmdline)
+		goto on_error;
+
+	env = file_to_buf("/proc/self/environ", &env_size);
+	if (!env)
+		goto on_error;
+
+	ret = push_vargs(cmdline, cmdline_size, argv);
+	if (ret <= 0)
+		goto on_error;
+
+	ret = push_vargs(env, env_size, envp);
+	if (ret <= 0)
+		goto on_error;
+
+	return 0;
+
+on_error:
+	free(env);
+	free(cmdline);
+
+	return -1;
+}
+
+static int is_memfd(void)
+{
+	int fd, saved_errno, seals;
+
+	fd = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
+	if (fd < 0)
+		return -ENOTRECOVERABLE;
+
+	seals = fcntl(fd, F_GET_SEALS);
+	saved_errno = errno;
+	close(fd);
+	errno = saved_errno;
+	if (seals < 0)
+		return -EINVAL;
+
+	return seals == LXC_MEMFD_REXEC_SEALS;
+}
+
+static void lxc_rexec_as_memfd(char **argv, char **envp, const char *memfd_name)
+{
+	int saved_errno;
+	ssize_t bytes_sent;
+	int fd = -1, memfd = -1;
+
+	memfd = memfd_create(memfd_name, MFD_ALLOW_SEALING | MFD_CLOEXEC);
+	if (memfd < 0)
+		return;
+
+	fd = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
+	if (fd < 0)
+		goto on_error;
+
+	/* sendfile() handles up to 2GB. */
+	bytes_sent = lxc_sendfile_nointr(memfd, fd, NULL, LXC_SENDFILE_MAX);
+	saved_errno = errno;
+	close(fd);
+	errno = saved_errno;
+	if (bytes_sent < 0)
+		goto on_error;
+
+	if (fcntl(memfd, F_ADD_SEALS, LXC_MEMFD_REXEC_SEALS))
+		goto on_error;
+
+	fexecve(memfd, argv, envp);
+
+on_error:
+	saved_errno = errno;
+	close(memfd);
+	errno = saved_errno;
+}
+
+static int lxc_rexec(const char *memfd_name)
+{
+	int ret;
+	char **argv = NULL, **envp = NULL;
+
+	ret = is_memfd();
+	if (ret < 0 && ret == -ENOTRECOVERABLE) {
+		fprintf(stderr,
+			"%s - Failed to determine whether this is a memfd\n",
+			strerror(errno));
+		return -1;
+	} else if (ret > 0) {
+		return 0;
+	}
+
+	ret = parse_exec_params(&argv, &envp);
+	if (ret < 0) {
+		fprintf(stderr,
+			"%s - Failed to parse command line parameters\n",
+			strerror(errno));
+		return -1;
+	}
+
+	lxc_rexec_as_memfd(argv, envp, memfd_name);
+	fprintf(stderr, "%s - Failed to rexec as memfd\n", strerror(errno));
+	return -1;
+}
+
+/**
+ * This function will copy any binary that calls liblxc into a memory file and
+ * will use the memfd to rexecute the binary. This is done to prevent attacks
+ * through the /proc/self/exe symlink to corrupt the host binary when host and
+ * container are in the same user namespace or have set up an identity id
+ * mapping: CVE-2019-5736.
+ */
+__attribute__((constructor)) static void liblxc_rexec(void)
+{
+	if (lxc_rexec("liblxc")) {
+		fprintf(stderr, "Failed to re-execute liblxc via memory file descriptor\n");
+		_exit(EXIT_FAILURE);
+	}
+}
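push_vargs() relies on the layout of /proc/self/cmdline and /proc/self/environ, where each entry is terminated by a NUL byte. The sketch below illustrates that splitting step on its own. It is an assumption-laden variant, not the committed code: the name split_nul_separated is hypothetical, it uses calloc() instead of must_realloc(), and unlike push_vargs() it rejects a buffer that does not end in a NUL byte so that strlen() can never read past the end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a NULL-terminated vector of pointers into `data`, one per
 * NUL-terminated entry. The vector points into `data`, so `data` must
 * outlive it. Returns NULL on invalid input or allocation failure.
 */
static char **split_nul_separated(char *data, size_t data_len, size_t *count)
{
	char **vec;
	char *cur;
	size_t n = 0, i;

	assert(count != NULL);
	*count = 0;

	/* Require a trailing NUL so every entry is a valid C string. */
	if (!data || data_len == 0 || data[data_len - 1] != '\0')
		return NULL;

	/* One entry per NUL byte. */
	for (i = 0; i < data_len; i++)
		if (data[i] == '\0')
			n++;

	/* calloc() zeroes the vector, so vec[n] is already NULL. */
	vec = calloc(n + 1, sizeof(*vec));
	if (!vec)
		return NULL;

	cur = data;
	for (i = 0; i < n; i++) {
		vec[i] = cur;
		cur += strlen(cur) + 1;
	}

	*count = n;
	return vec;
}
```

Because the vector aliases the input buffer, freeing the buffer while the vector is still in use would leave dangling pointers; this is also why parse_exec_params() above frees cmdline and env only on its error path.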

src/lxc/syscall_wrappers.h (+14)

@@ -58,6 +58,20 @@ static inline long __keyctl(int cmd, unsigned long arg2, unsigned long arg3,
 #define keyctl __keyctl
 #endif
 
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+#endif
+#ifndef F_SEAL_SEAL
+#define F_SEAL_SEAL 0x0001
+#define F_SEAL_SHRINK 0x0002
+#define F_SEAL_GROW 0x0004
+#define F_SEAL_WRITE 0x0008
+#endif
+
 #ifndef HAVE_MEMFD_CREATE
 static inline int memfd_create(const char *name, unsigned int flags) {
 	#ifndef __NR_memfd_create
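The hunk above ends inside the existing memfd_create() fallback, which is not shown in full here. As a rough, independent illustration of how such a fallback can work when the C library has no memfd_create() wrapper, the sketch below issues the system call directly through syscall(). The name raw_memfd_create is hypothetical, the flag values are the ones from the kernel's linux/memfd.h, and the code is not a reconstruction of the elided LXC wrapper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1
#endif
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Flag values from <linux/memfd.h>, for libc headers that lack them. */
#ifndef MFD_CLOEXEC
#define MFD_CLOEXEC 0x0001U
#endif
#ifndef MFD_ALLOW_SEALING
#define MFD_ALLOW_SEALING 0x0002U
#endif

/*
 * Create an anonymous in-memory file without relying on a libc wrapper.
 * Returns a file descriptor, or -1 with errno set (ENOSYS if the system
 * call number is unknown at build time).
 */
static int raw_memfd_create(const char *name, unsigned int flags)
{
	assert(name != NULL);

#ifdef SYS_memfd_create
	return (int)syscall(SYS_memfd_create, name, flags);
#else
	errno = ENOSYS;
	return -1;
#endif
}
```

On kernels older than 3.17 the call itself fails with ENOSYS at run time, so a caller still has to handle a negative return value.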
