EventStream::~EventStream() causes dead-lock when exit() is called in alpaka kernel #3241

Open
jkelling opened this issue May 12, 2020 · 7 comments
Labels
bug a bug in the project's code

Comments

@jkelling
Member

Some runtimes (Clang's OpenMP target offload) call exit() when an error occurs.

In PIConGPU (at least when using a blocking queue), this causes a deadlock: the pmacc::EventStream destructor calls cuplaStreamSynchronize(), which uses the alpaka queue's wait functionality, which in turn tries to lock the queue's mutex. That mutex is still held by the enqueue function, which never returned in this case.
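
To illustrate the lock chain described above, here is a minimal sketch of a simplified model. All names (`ModelQueue`, `ModelStream`, `destructorWouldBlock`) are hypothetical and are not the alpaka or PMacc API. An atomic flag stands in for the queue mutex so that the example reports the problem instead of actually hanging, and the call to exit() is simulated by destroying the stream from inside the task.

```cpp
#include <atomic>
#include <functional>

// Hypothetical, heavily simplified model of a blocking queue.
// An atomic flag replaces the real mutex so nothing can hang here.
class ModelQueue {
public:
    // Runs the task in the calling thread; the "lock" is held until it returns.
    void enqueue(const std::function<void()>& task) {
        m_locked.store(true);
        task();                 // if the task ends the program, the release below never happens
        m_locked.store(false);
    }

    // A real wait() would block at this point until the lock is released.
    bool waitWouldBlock() const { return m_locked.load(); }

private:
    std::atomic<bool> m_locked{false};
};

// Hypothetical stand-in for pmacc::EventStream: its destructor synchronizes the queue.
struct ModelStream {
    ModelQueue& queue;
    bool* blocked;  // records whether the synchronization would have blocked
    ~ModelStream() { *blocked = queue.waitWouldBlock(); }
};

// exitInsideTask == true simulates exit() being called inside the kernel:
// the stream destructor then runs while enqueue() is still on the stack.
bool destructorWouldBlock(bool exitInsideTask) {
    ModelQueue queue;
    bool blocked = false;
    if (exitInsideTask) {
        queue.enqueue([&] { ModelStream stream{queue, &blocked}; });
    } else {
        queue.enqueue([] {});
        ModelStream stream{queue, &blocked};  // destroyed after enqueue() has returned
    }
    return blocked;
}
```

In this model, `destructorWouldBlock(true)` reports that the wait could never complete, while `destructorWouldBlock(false)` corresponds to a regular shutdown where the lock has already been released.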

I do not know whether this can feasibly be fixed; I just want to bring it up for discussion.

@psychocoderHPC psychocoderHPC added the bug a bug in the project's code label May 13, 2020
@psychocoderHPC
Member

pmacc::EventStream should only be destructed at the end of the simulation. Am I correct that this deadlock shows up at the end of the simulation?

@jkelling
Member Author

That depends on how you define "end of simulation": It is not at the intended end of the simulation.

Clang's OpenMP target offload runtime encounters an error in or around a target region (i.e. inside the alpaka TaskKernel*::operator() function). After printing an error message with zero information, it calls exit(), which ends the program right there.

I think it would be nicer of the runtime to raise an exception or raise SIGABRT (like assert()), but at the moment it does not, and it may not be the last runtime to decide to go this route.

@jkelling jkelling changed the title EventStream::~EventStream() causes dead when exit() is called in alpaka kernel EventStream::~EventStream() causes dead-lock when exit() is called in alpaka kernel May 13, 2020
@sbastrakov
Member

I thought std::exit is a variant of a normal end of the simulation (including returning from main()), with destructors properly called. Then I suppose the mutexes should be also unlocked by their destructors. Perhaps I have a wrong idea about it.

@jkelling
Member Author

No, I think you are right about exit.

The problem seems to be that some destructors are called, but the stack is not unwound properly.

@sbastrakov
Member

Ah, thanks for the clarification. So it seems we can't really force stack unwinding in this case, which feels bad.

@jkelling
Member Author

jkelling commented May 13, 2020

cppreference.com confirms that std::exit does not unwind the stack. It mentions std::atexit for registering handlers, but using this does not seem to work.

Apparently the atexit handler is called after the pmacc::EventStream dtor. (I put that test in the wrong place; it might work, but would require the lock to be registered inside a singleton or otherwise global instance.)
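
The ordering observed above can be reproduced in isolation. The following is a sketch assuming a POSIX system (it uses `fork` and `pipe` so that the exiting process can be observed from outside); `StaticResource`, `StackGuard`, and `exit_order` are hypothetical names for illustration, not PMacc or alpaka code. The child registers an atexit handler first, then constructs a static object, then creates an automatic object and calls `std::exit`. Each destructor and the handler write one character to the pipe.

```cpp
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

namespace {
int g_fd = -1;  // write end of the pipe, set in the child only

void log_event(char c) {
    ssize_t n = write(g_fd, &c, 1);
    (void)n;  // the result is irrelevant for this demonstration
}

// Stand-in for a static or singleton-owned object such as the event stream.
struct StaticResource {
    ~StaticResource() { log_event('S'); }
};

// Stand-in for a lock guard living on the stack of enqueue().
struct StackGuard {
    ~StackGuard() { log_event('G'); }
};

void handler() { log_event('H'); }

void child_body() {
    std::atexit(handler);            // registered before the static object exists
    static StaticResource resource;  // constructed afterwards, so destroyed first
    StackGuard guard;                // automatic object: std::exit does not unwind the stack
    std::exit(1);
}
}  // namespace

// Runs the scenario in a child process and returns the sequence of events it logged.
std::string exit_order() {
    int fds[2];
    if (pipe(fds) != 0) return "pipe failed";
    pid_t pid = fork();
    if (pid < 0) return "fork failed";
    if (pid == 0) {
        close(fds[0]);
        g_fd = fds[1];
        child_body();  // never returns
    }
    close(fds[1]);
    std::string events;
    char c;
    while (read(fds[0], &c, 1) == 1) events += c;
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return events;
}
```

Under these assumptions, `exit_order()` returns `"SH"`: the static object is destroyed before the handler runs because it was constructed after the handler was registered, and the stack guard's destructor never runs at all. A handler meant to release the lock would therefore have to be registered after the object whose destructor waits on it has been constructed.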

@jkelling
Member Author

jkelling commented Jul 1, 2020

I filed an enhancement request with clang for this issue. I think the library should call abort() in case of a fatal error.
https://bugs.llvm.org/show_bug.cgi?id=46515
