-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the error of runc doesn't work with go1.22 #4193
Conversation
go 1.22.0 error msg:
|
This can be fixed by adding |
Interestingly, both runc 1.1.12 and runc from git HEAD built with go1.22.0 work fine on my machine (all tests are passing). |
This comment was marked as outdated.
This comment was marked as outdated.
We also need to fix this for Go 1.22
I don't remember why I haven't switched to |
It seems that cgo may be broken with clone(2) in go1.22.0? |
Again, I can't repro locally. [kir@kir-tp1 cgoclone2]$ go version
go version go1.21.6 linux/amd64
[kir@kir-tp1 cgoclone2]$ go run main.go
STAGE_PARENT
STAGE_CHILD
STAGE_INIT
This from nsexec
From main!
[kir@kir-tp1 cgoclone2]$ go1.22.0 version
go version go1.22.0 linux/amd64
[kir@kir-tp1 cgoclone2]$ go1.22.0 run main.go
STAGE_PARENT
STAGE_CHILD
STAGE_INIT
This from nsexec
From main! Maybe it's your kernel version @lifubang? Can you show |
@lifubang also if you can repro that (alas I can not), you can git bisect golang between 1.21.0 and 1.22.0. |
Note in CI it happens with Ubuntu 20.04 but not Ubuntu 22.04. Will try to repro in a VM. |
On Ubuntu 20.04, when running the binary compiled with go 1.22, I am seeing a SIGSEGV:
Can't yet figure out what's going on there; will continue tomorrow. |
@lifubang I did a bisect, here are the results: golang/go#65625 (comment) Will continue tomorrow. |
So, to summarize the investigation done there -- it's a glibc bug, in fact, two bugs:
These two bugs are apparently specific to glibc used by Ubuntu 20.04 (libc6 2.31-0ubuntu9.14) and maybe also Debian 10 (libc6 2.28-10+deb10u2), as I was able to reproduce on both. With Debian 10, it even prints error from free: For some reason I was unable to repro on older Fedora (F32, glibc-2.31-2.fc32, F33, glibc-2.32-10.fc33) and Debian 11 (libc6 2.31-7). The bad news is, every version of glibc has the bug 1 above, and https://go-review.googlesource.com/c/go/+/563379 may make it so go 1.22.x will fail runc init on every version of glibc. Meaning, we need a workaround for that. Perhaps changing runc libct/nsenter logic in some radical way, so that |
Go 1.22 currently causes crashes on older Debian/Ubuntu systems. lxc/incus#497 golang/go#65625 opencontainers/runc#4193 Signed-off-by: Stéphane Graber <stgraber@stgraber.org>
Go 1.22 currently causes crashes on older Debian/Ubuntu systems. lxc/incus#497 golang/go#65625 opencontainers/runc#4193 Signed-off-by: Stéphane Graber <stgraber@stgraber.org>
👍 |
Rebasing this to re-run with Go 1.22.1 |
c54384f
to
5907889
Compare
Sorry @lifubang I've high-jacked your PR, needed to run it with Go 1.22.1 and added missing changes to |
OK, Go 1.22.1 makes no difference. I guess we have to disable Go 1.22 for now. |
3144923
to
aebd5b5
Compare
Do you think moving the c code in |
I think maybe no, because after clone(2), it has already in go routine, it can’t longjmp to C. So, it seems that there is no other way to fix this issue? |
7eac8b8
to
df04ed4
Compare
This patch also works, while still allowing us to use diff --git a/libcontainer/nsenter/nsexec.c b/libcontainer/nsenter/nsexec.c
index c771ac7e1165..319899bd9b71 100644
--- a/libcontainer/nsenter/nsexec.c
+++ b/libcontainer/nsenter/nsexec.c
@@ -15,6 +15,7 @@
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
+#include <pthread.h> /* _only_ used for pthread_self() in debug log */
#include <sys/ioctl.h>
#include <sys/prctl.h>
@@ -319,7 +320,41 @@ static int clone_parent(jmp_buf *env, int jmpval)
.jmpval = jmpval,
};
- return clone(child_func, ca.stack_ptr, CLONE_PARENT | SIGCHLD, &ca);
+ /*
+ * Since glibc 2.25 (see c579f48edba88380635ab98cb612030e3ed8691e),
+ * glibc no longer updates the TLS state containing the current process
+ * tid after clone(2). This results in stale TIDs being used when Go
+ * 1.22 and later call pthread_gettattr_np(pthread_self()), resulting
+ * in crashes on ancient glibcs and errors on newer glibcs.
+ *
+ * Luckily, because the same address is used for CLONE_PARENT_SETTID,
+ * we can poke around in glibc's internal cache by getting the address
+ * using PR_GET_TID_ADDRESS (only available in Linux >= 3.5, with
+ * CONFIG_CHECKPOINT_RESTORE=y) and then overwriting it with
+ * CLONE_CHILD_SETTID. CLONE_CHILD_CLEARTID is set to allow descendant
+ * PR_GET_TID_ADDRESS calls to work, as well as matching what glibc
+ * does in arch_fork().
+ *
+ * Yes, this is pretty horrific, but the core issue here is that we
+ * need to run Go code ("runc init") in the child after fork(), which
+ * is not allowed by glibc (see signal-safety(7)). We cannot exec to
+ * solve the problem because we are in a security critical situation
+ * here, and doing an exec would allow for container escapes (obvious
+ * issues include that the shared libraries loaded from a re-exec would
+ * come from the container, and doing an exec here would clear the bit
+ * that makes non-dumpable flags effective for userns containers with
+ * CAP_SYS_PTRACE).
+ */
+ pid_t *tid_addr = NULL;
+ if (prctl(PR_GET_TID_ADDRESS, &tid_addr) < 0)
+ /* what should we do here... */;
+ write_log(DEBUG, "nsenter clone: get_tid_address gave us %p (pthread_self=%p)", tid_addr, (void *) pthread_self());
+ if (!tid_addr || *tid_addr != gettid())
+ write_log(WARNING, "nsenter clone: glibc private tid address is wrong: *%p %d != gettid() %d", tid_addr, tid_addr ? *tid_addr : -1, gettid());
+
+ return clone(child_func, ca.stack_ptr,
+ CLONE_PARENT | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, &ca,
+ NULL /* parent_tid */ , NULL /* tls */ , tid_addr);
}
/* Returns the clone(2) flag for a namespace, given the name of a namespace. */ @kolyshkin wdyt? |
@lifubang I can also take a look next week at whether we can somehow remove stage-1 so that we don't need a grandchild (which would remove the need for |
This PR needs some refactor work, so convert it to draft state. Line 184 in df04ed4
|
df04ed4
to
41d8548
Compare
Signed-off-by: lifubang <lifubang@acmcoder.com>
1253158
to
bdec4c7
Compare
Welcome more suggestions. |
As the description in opencontainers#4233, there is a bug in glibc, pthread_self() will return wrong info after we do `clone(CLONE_PARENT)` in libct/nsenter, it will cause runc can't work in `go 1.22.*`. So we use fork(2) to replace clone(2) in libct/nsenter, but there is a double-fork in nsenter, so we need to use `PR_SET_CHILD_SUBREAPER` to let runc can reap grand child process in libct/nsenter. Signed-off-by: lifubang <lifubang@acmcoder.com>
This reverts commit ac31da6. Signed-off-by: lifubang <lifubang@acmcoder.com>
This reverts commit e377e16. Signed-off-by: lifubang <lifubang@acmcoder.com>
bdec4c7
to
1e60e59
Compare
logrus.Warn(err) | ||
} | ||
} | ||
func newSignalHandler(notifySocket *notifySocket) *signalHandler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @kolyshkin pointed out in #4278 (comment)
This PR has done such things. But there is still one thing I can't determine whether we should do or not:
If we use PR_SET_CHILD_SUBREAPER
and fork(2)
to replace clone(CLONE_PARENT)
, I think we should move this signal handler to libcontainer
. Or else someone use libcontainer
directly in the code will have to write a signal handler by themselves.
closing in favor of #4292 |
As the description in #4233, there is a bug in glibc, pthread_self()
will return wrong info after we do
clone(CLONE_PARENT)
in libct/nsenter,it will cause runc can't work in
go 1.22.*
. So we use fork(2) to replaceclone(2) in libct/nsenter, but there is a double-fork in nsenter, so we
need to use
PR_SET_CHILD_SUBREAPER
to let runc can reap grand childprocess in libct/nsenter.
Fix #4233