-
Notifications
You must be signed in to change notification settings - Fork 469
Linux: avoid zombie process when dispatch_main is called #99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm not fond of that patch (and Linux still hasn't fixed that bug, really?)
I looked at the glibc implementation and we do have a way out, which is that in dispatch_main() to write code that does this (be prepared to be ultimately grossed out): void
dispatch_main(void)
{
#if HAVE_PTHREAD_MAIN_NP
if (pthread_main_np()) {
#endif
_dispatch_object_debug(&_dispatch_main_q, "%s", __func__);
_dispatch_program_is_probably_callback_driven = true;
_dispatch_ktrace0(ARIADNE_ENTER_DISPATCH_MAIN_CODE);
#ifdef __linux__
pthread_key_t dispatch_main_key;
pthread_key_create(&dispatch_main_key, _dispatch_sig_thread);
pthread_setspecific(dispatch_main_key, &dispatch_main_key); // anything non NULL really
#endif
pthread_exit(NULL);
DISPATCH_INTERNAL_CRASH(errno, "pthread_exit() returned");
#if HAVE_PTHREAD_MAIN_NP
}
DISPATCH_CLIENT_CRASH(0, "dispatch_main() must be called on the main thread");
#endif
} also in _dispatch_queue_cleanup2(), disable this code for linux: #ifndef __linux__
if (_dispatch_program_is_probably_callback_driven) {
_dispatch_barrier_async_detached_f(_dispatch_get_root_queue(
_DISPATCH_QOS_CLASS_DEFAULT, true), NULL, _dispatch_sig_thread);
sleep(1); // workaround 6778970
}
#endif since we now have main() becoming the sigsuspend thread. my code works because glibc calls the TSD destructors in order of creation, so since we create that one last, it will be the last destructor called. It is highly non portable, but we're trying to work around a linux bug to boot. That way all TSD and thread_atexit() (C++ thread_locals) will be called properly. Again it is a HUGE kludge, but I think it's more robust (to any code) than your patch, albeit brittle to any glibc change in how TSD and |
it may also be worth having a unit test on linux that does check that TSD destructors indeed are called in pthread_key_create() call order, because the day that changes, this hack no longer works. But that is the only way I can think of looking at pthread_create |
FWIW a very simple sample program shows that the backtraces are exactly what you'd expect with that trick, it seems to work fine:
|
3208542
to
aa824ab
Compare
I wasn't fond of my patch either; yours is better. Still potentially fragile as you noted, but in a way that is much less likely to break in practice. Thanks! I think at this point Linux considers it to be a feature, not a bug ;) At least, it still happens on my ubuntu 16.04 virtual machine -- any program that calls dispatch_main becomes a zombie until it exits. Pull request updated. |
sorry for the spam; spoke too soon. I forgot the second piece of the patch. Fixing... |
aa824ab
to
febea9e
Compare
#ifdef __linux__ | ||
pthread_key_t dispatch_main_key; | ||
pthread_key_create(&dispatch_main_key, _dispatch_sig_thread); | ||
pthread_setspecific(dispatch_main_key, &dispatch_main_key); // anything non NULL really |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the // anything non null really
is a bad comment.
you should explain above that the point of creating that TSD key before pthread_exit() is because glibc will call the destructor of that TSD key last, which gets the thread parked in sigsuspend hopefully after all other TSD destructors have run (there are still holes as a TSD constructor could re-use a TSD and glibc has to loop over TSD keys to find out about this, and I'm pretty sure my hack prevents this, but that's as good it can be).
I'll let you phrase that comment as a native speaker ;) but given how non trivial and horrid that hack is, it needs documentation. thanks
If the main thread calls pthread_exit on Linux, the process becomes a zombie. Modify dispatch_main to avoid this by adding a pthread_key destructor that calls dispatch_sig_thread (effectively stalling pthread exit until the program really exits). This patch relies on the TSD destructors being called in order of creation, which is currently the case in glibc.
febea9e
to
cf9fe36
Compare
updated with comments. |
excellent 👍 |
Linux: avoid zombie process when dispatch_main is called Signed-off-by: Daniel A. Steffen <dsteffen@apple.com>
If the main thread calls pthread_exit on Linux, the process
becomes a zombie. Modify dispatch_main to avoid this by manually
performing thread cleanup and pausing the thread instead of calling
pthread_exit. This is an updated version of an old patch for this
issue (https://lists.macosforge.org/pipermail/libdispatch-dev/2011-May/000522.html).