Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in exit(0) (LLD) #21

Closed
awson opened this issue Mar 26, 2017 · 49 comments
Closed

Deadlock in exit(0) (LLD) #21

awson opened this issue Mar 26, 2017 · 49 comments

Comments

@awson
Copy link

awson commented Mar 26, 2017

First thanks for your wonderful library.

Here is a bit of prehistory.

Still, while LLD built with your gcc successfully completes all parallel_fors, it deadlocks at exit(0).

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

Thanks for your interest. 😂

Would you please provide some more detailed information (backtrace of each thread for example, which can be obtained using a debugger)?
(I have been keeping an eye on LLVM for a couple of years. Recently they seem to have dropped MinGW support because no documentation about MinGW can be found.)

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

I posted a comment on SF. The first problem didn't seem reproducible.

@awson
Copy link
Author

awson commented Mar 26, 2017

The first problem manifests itself on vanilla MSys2 gcc only.

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

Fair enough. I am trying to build LLD in MSYS now.

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

How did you overcome these errors?

CMake Error: File /e/Desktop/llvm-project/llvm/E:/Desktop/llvm-project/llvm/lib/Target/PowerPC/MCTargetDesc/LLVMBuild.txt does not exist.
CMake Error at /e/Desktop/build/LLVMBuild.cmake:242 (configure_file):
(... hundreds of errors abridged ...)

@awson
Copy link
Author

awson commented Mar 26, 2017

I never built all targets, usually I build X86 and NVPTX only, but to debug LLD on mingw, X86 only would be enough, thus simply add -DLLVM_TARGETS_TO_BUILD=X86 when doing cmake.

Something like this would be OK I believe:
cmake -DLLVM_ENABLE_THREADS=true -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=ON -DLLVM_TARGETS_TO_BUILD=X86.
(I use shared libs build)

@awson
Copy link
Author

awson commented Mar 26, 2017

Also paths like /e/Desktop/llvm-project/llvm/E:/Desktop/llvm-project... look very suspicious.
I never use anything other than plain vanilla cmd.exe shell to build LLVM.

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

I followed the instruction here.

Something like this would be OK I believe:
cmake -DLLVM_ENABLE_THREADS=true -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=ON -DLLVM_TARGETS_TO_BUILD=X86.
(I use shared libs build)

It didn't help. The same errors took place.

@awson
Copy link
Author

awson commented Mar 26, 2017

I suspect the culprit is wrong paths story under MSys shell.
I never had the problems you mentioned when using vanilla command-line.

Usually I build things with simple batch files with (roughly) the following structure:

@echo off
set CXX=g++
set CC=gcc
set PATH=<PATH_TO_GCC_BIN_DIR>;%PATH%
cmake -DLLVM_ENABLE_THREADS=true -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=ON -DLLVM_INCLUDE_TESTS=OFF -DLLVM_TARGETS_TO_BUILD=X86 -G Ninja LLVM_SRC_PATH
ninja lld

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

The path problem still exists. But this time it is the Unix-y path that is causing the problem:
(Note that I was using the generator for Unix makefile while you were using the generator for Ninja)

CreateProcess failed: The system cannot find the file specified.

full-log.txt

@awson
Copy link
Author

awson commented Mar 26, 2017

Did you ever tried to use vanilla command prompt (cmd.exe) instead of unixish one?

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

Yes, neither worked.

Since I am using the MSYS2 cmake, paths are converted to Unix-y ones, so cmake sees the same path in either case:

The C compiler "/mingw32/bin/gcc.exe" is not able to compile a simple test
program.

@awson
Copy link
Author

awson commented Mar 26, 2017

Ah, I use native cmake (mingw or precompiled package from cmake.org), absolutely no problems with it.

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

There is another error now:

E:\Desktop\build>ninja lld
ninja: error: unknown target 'lld', did you mean 'llc'?

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

... You just missed -DLLVM_ENABLE_PROJECTS=lld. I added that option and it builds now.

@awson
Copy link
Author

awson commented Mar 26, 2017

Btw, lld on windows doesn't support windows-gnu target linking, thus to reproduce the problem one should either have xxx-windows-msvc object code (that produced by Visual C/C++ compiler or clang with corresponding target) to link. Do you have Visual C/C++ installed?

Perhaps, it would be possible to test lld against elf code linking, but this way one should have elf code to link. I never tried the latter, though.

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

Yes I have reproduced the problem:

E:\Desktop>cat main.c
#include <stdio.h>

extern void bark();

int main(){
        printf("inside %s\n", __func__);
        bark();
}

E:\Desktop>cat  test.c
#include <stdio.h>

void bark(){
        printf("inside %s\n", __func__);
}

E:\Desktop>cl /nologo /c main.c test.c
main.c
test.c
正在生成代码...

E:\Desktop>build\bin\lld-link.exe main.obj test.obj

Then it hangs.

@awson
Copy link
Author

awson commented Mar 26, 2017

Btw, ExitProcess also deadlocks, only TerminateProcess doesn't.

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

Btw, ExitProcess also deadlocks, only TerminateProcess doesn't.

That is because ExitProcess() sends all DLLs loaded a DLL_PROCESS_DETACH notification, and the thread that called ExitProcess() hangs up on `NtReleaseKeyedEvent()·.

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

I attached LLD with a modified version of mcfgthread that would trigger a breakpoint if NtReleaseKeyedEvent() times out in 5 seconds.

Note that inside a DLL, all destructors of static objects are called by DllMain() with the dwReason parameter set to 0 (DLL_PROCESS_DETACH). Here the destructor ThreadPoolExecutor::~ThreadPoolExecutor() called std::condition_variable::notify_all(), which called __MCFCRT_gthread_cond_broadcast(), which thought there were other threads sleeping on the keyed event (the condition variable itself) and called NtReleaseKeyedEvent() and deadlocked.

There should indeed be threads sleeping on the condition variable before exit(), which I believe is exactly why the LLVM people call std::condition_variable::notify_all() inside the destructor. But remember, inside the DllMain() function, all other threads will have been terminated. This is Windows-specific behavior.

Full backtrace:

Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 13924.0x3308]
0x768f338e in KERNELBASE!DebugBreak () from C:\Windows\syswow64\KernelBase.dll
(gdb) bt
#0  0x768f338e in KERNELBASE!DebugBreak () from C:\Windows\syswow64\KernelBase.dll
#1  0x661023fa in mcfgthread-10!_MCFCRT_Bail () from E:\Desktop\build\bin\mcfgthread-10.dll
#2  0x661071fc in mcfgthread-10!__MCFCRT_OnAssertionFailure () from E:\Desktop\build\bin\mcfgthread-10.dll
#3  0x66103373 in mcfgthread-10!_MCFCRT_BroadcastConditionVariable () from E:\Desktop\build\bin\mcfgthread-10.dll
#4  0x6fee5e2b in  	 (__cond=<optimized out>) at C:/MinGW/MSYS2/mingw32/include/mcfgthread/env/_gthread_inl.h:177
#5  std::condition_variable::notify_all (this=<optimized out>) at ../../../../../gcc/libstdc++-v3/src/c++11/condition_variable.cc:73
#6  0x70e7e793 in lld::internal::ThreadPoolExecutor::~ThreadPoolExecutor (this=0x70ecb640 <lld::internal::getDefaultExecutor()::exec>, __in_chrg=<optimized out>)
    at E:/Desktop/llvm-project/lld/include/lld/Core/Parallel.h:137
#7  0x70e5aed8 in __tcf_0 () at E:/Desktop/llvm-project/lld/include/lld/Core/Parallel.h:173
#8  0x70e411a4 in _CRT_INIT@12 (hDllHandle=hDllHandle@entry=0x70e40000, dwReason=dwReason@entry=0, lpreserved=lpreserved@entry=0x1)
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtdll.c:144
#9  0x70e412b4 in __DllMainCRTStartup (hDllHandle=hDllHandle@entry=0x70e40000, dwReason=0, lpreserved=lpreserved@entry=0x1)
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtdll.c:211
#10 0x70e413e7 in DllMainCRTStartup@12 (hDllHandle=0x70e40000, dwReason=0, lpreserved=0x1) at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtdll.c:171
#11 0x77969364 in ntdll!RtlQueryEnvironmentVariable () from C:\Windows\SysWOW64\ntdll.dll
#12 0x70e40000 in ?? ()
#13 0x77988f68 in ntdll!LdrShutdownProcess () from C:\Windows\SysWOW64\ntdll.dll
#14 0x70e413c4 in __DllMainCRTStartup (hDllHandle=0x148ed18, dwReason=2382904, lpreserved=0x1) at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtdll.c:201
#15 0x00000000 in ?? ()
(gdb)

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

Briefly, the following happened:

  1. A thread slept on a condition variable. The 'threads_trapped' counter of the condition variable was incremented.
  2. Another thread called ExitProcess() [Update: I incorrectly referred ExitThread() here], terminating the previous thread while it was sleeping.
  3. The second thread broadcast the condition variable. It read and cleared the 'threads_trapped' counter. Because the counter was non-zero, it called NtReleaseKeyedEvent() to wake up those threads sleeping. Note that NtReleaseKeyedEvent() blocks the caller until a thread is woken up. Since there were no threads to wake up, it effectively resulted in a deadlock.

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

Possible workaround: If there are no other threads, don't call NtReleaseKeyedEvent() in ReallySignalConditionVariable(), ReallySignalMutex() or RealSetAndSignalOnceFlag().

... But how can we tell there are no other threads?

@awson
Copy link
Author

awson commented Mar 26, 2017

Another thread called ExitThread(), terminating the previous thread while it was sleeping

Don't quite understand this. Why the "previous" thread is terminated? Because it was created by "another" thread or what?

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

See text following 'When handling DLL_PROCESS_DETACH' in https://msdn.microsoft.com/en-us/library/windows/desktop/ms682583(v=vs.85).aspx .

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

Oops it was a typo. It was ExitProcess() that the second thread called. I fixed up the comment.

@lhmouse lhmouse removed their assignment Mar 26, 2017
@awson
Copy link
Author

awson commented Mar 26, 2017

Removing exit doesn't help. LLD deadlocks. Otherwise I wouldn't even bother to open this ticket.

From your explanations I've understood that the deadlock can happen only if sleeping threads are killed by ExitProcess. But all exit handlers are run before ExitProcess call, hence NtReleaseKeyedEvent won't deadlock.

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

But all exit handlers are run before ExitProcess call, hence NtReleaseKeyedEvent won't deadlock.

Please double check this for sure. In the case of DLL such handlers are run from the entry point function (__DllMainCRTStartup() above) which is a result of ExitProcess().

@awson
Copy link
Author

awson commented Mar 26, 2017

I even tried to put my own exit handler into an LLD code, which simply printed something to the console, and it completed successfully before the deadlock. Thus we have no problems with either mingw-w64 or FILO order.

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

Actually GCC uses atexit() to register dtors of static objects. If you register a callback after the pool's ctor returns, your callback is called before its dtor. If you register a callback before its ctor returns (including inside its ctor body), your callback is called after its dtor.

@lhmouse
Copy link
Owner

lhmouse commented Mar 26, 2017

Again, it is not possible to register a callback that can be called before the pool's dtor without modifying LLD' source, as it requires registering it after the pool's ctor (for example, inside the main() function.)

@awson
Copy link
Author

awson commented Mar 26, 2017

I am not ready to investigate this right now, but this looks extremely strange.

How could atexit semantics depends on the place where it is called? Only handlers execution order depends on it but all of them should be guaranteed to be executed before the exit proper.

@lhmouse
Copy link
Owner

lhmouse commented Mar 27, 2017

How could atexit semantics depends on the place where it is called?

The C standard doesn't care about dynamic libraries, but it is true on almost every platform where there are dynamic libraries. The reason is simple, too: If you load a dynamic library which registers some exit callbacks and then unload it, those callbacks must be run and removed before the virtual memory of the dynamic library is unmapped. If those callbacks were deferred to the time when you realy call atexit() (directly or by returning from main()), the dynamic library might have been unmapped and you would likely get a segment fault.

@lhmouse
Copy link
Owner

lhmouse commented Mar 27, 2017

Please test the new branch. If this is OK I will port it to both branches.

@lhmouse
Copy link
Owner

lhmouse commented Mar 27, 2017

Thank goodness I found the function RtlDllShutdownInProgress() and it saved us!
It was just an accident, wasn't it? 😂

@awson
Copy link
Author

awson commented Mar 27, 2017

Looks like it does the trick for me! Thanks!

@lhmouse lhmouse self-assigned this Mar 27, 2017
lhmouse added a commit that referenced this issue Mar 27, 2017
…turns `true` as it causes deadlocks. See issue #21. (#22)

Close #21.
@lhmouse
Copy link
Owner

lhmouse commented Mar 27, 2017

Merged to master and cherry-picked to release-1.x then.

@wwc7654321
Copy link

Thanks for sharing you solution. @lhmouse
Is this page described the same problem?
https://support.microsoft.com/zh-cn/help/2582203/a-process-that-is-being-terminated-stops-responding-in-windows-7-or-in

I think it's fixed as a bug, but I can't expect every PC is patched.

@wwc7654321
Copy link

And is there any page describes "RtlDllShutdownInProgress" ? It's hard to find discussion about this function

@MaskRay
Copy link
Contributor

MaskRay commented Mar 15, 2019

Since I was just reading lld code, if you want to learn more about lld+exit https://reviews.llvm.org/D58246

@lhmouse
Copy link
Owner

lhmouse commented Mar 15, 2019

@wwc7654321 No. If a thread calls exit() or ExitProcess() without waiting for other threads to terminate then it must not call any synchronization function, nor should it perform memory allocation or deallocation, as the process state is not predictable.

@MaskRay std::thread::detach() is nonsense for a hosted program. Don't use it, period.

@nga888
Copy link

nga888 commented Mar 27, 2019

If you want to see yet more related to LLD exit (using MSVC static run-time), check out my experimental patch here: https://reviews.llvm.org/D40366. I agree that detached threads are not good.

@lhmouse
Copy link
Owner

lhmouse commented Mar 27, 2019

Specifically on Windows, if threads are detached then it is not safe to call std::exit() or std::quick_exit(), or to return from main() which has the same effect as calling std::exit(). The program must call std::_Exit().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants