Unstable behavior when creating/joining threads concurrently/in a short time span #13986

Closed
@Sigill

Tests done with Chrome 91, emsdk 2.0.17.

I'm experimenting with porting some C++ algorithms to WebAssembly. These algorithms are not all well-behaved from a multithreading point of view: I'm not always sure how many threads they will use.

In order to allow any number of threads to be spawned, without having to fix PTHREAD_POOL_SIZE, I tried launching them on a dedicated thread using std::async (the main thread is therefore available to handle the pthread_create() calls).

I wrapped std::future with Embind, and using setTimeout I can periodically poll the future and ultimately resolve a JavaScript Promise, which I find quite elegant (this might be dumb and/or dangerous, do not hesitate to point out why).
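For reference, the polling idea can be sketched in plain C++ as follows. This is a minimal sketch, not the code from the MWE: `is_ready`, `fake_algo` and `start_async` are hypothetical names, and the Embind binding and the setTimeout loop on the JavaScript side are left out. It only shows how a future can be checked without blocking, using a zero-duration `wait_for`.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical helper: returns true once the future holds a result,
// without blocking the calling thread.
template <typename T>
bool is_ready(const std::future<T>& f) {
    assert(f.valid());  // wait_for() on an invalid future is undefined behavior
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Hypothetical stand-in for the real algorithm.
int fake_algo() {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return 42;
}

// Runs the algorithm on a dedicated thread. std::launch::async is requested
// explicitly so the task is never deferred to the first get()/wait() call.
std::future<int> start_async() {
    return std::async(std::launch::async, fake_algo);
}
```

On the JavaScript side, a timer callback would call something like `is_ready` on each tick and resolve the Promise once it returns true.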

I'm however encountering some issues, and came up with this MWE to illustrate the issue: https://github.com/Sigill/emscripten-std-future-issue (it's not 100% reproducible though, there might be some race condition).

My "algorithm" is basically:

algo():
  3 times:
    spawn 8 inner threads; // each worker sleeps for 50ms
    join them;
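In C++ this corresponds roughly to the sketch below. It is a reconstruction from the pseudocode, not the exact code from the repository: the function signature, the parameter names and the completion counter are assumptions added for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical reconstruction of algo(): `rounds` times, spawn
// `threads_per_round` threads that each sleep for `work`, then join them.
// Returns the total number of inner threads that ran to completion.
int algo(int rounds = 3, int threads_per_round = 8,
         std::chrono::milliseconds work = std::chrono::milliseconds(50)) {
    std::atomic<int> completed{0};
    for (int r = 0; r < rounds; ++r) {
        std::vector<std::thread> workers;
        workers.reserve(threads_per_round);
        for (int i = 0; i < threads_per_round; ++i) {
            workers.emplace_back([&completed, work] {
                std::this_thread::sleep_for(work);
                ++completed;
            });
        }
        for (auto& w : workers) {
            w.join();
        }
        // After join(), every worker of this round must have finished.
        assert(completed.load() == (r + 1) * threads_per_round);
    }
    return completed.load();
}
```

Passing a zero duration for `work` gives the variant where the inner threads finish immediately.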

In theory, this algorithm uses no more than 8 threads (9 when launched through std::async) at any point in time. I've set PTHREAD_POOL_SIZE to 20, which should provide enough threads for 2 concurrent instances.

The synchronous version (without std::async) appears to work fine. It blocks the main thread (which is expected), but the thread pool is big enough to start the 8 inner threads.

The asynchronous version (with std::async) appears to work fine. I can even reduce PTHREAD_POOL_SIZE to 1, and the main thread is still able to start 8 more workers.

I can usually launch 2 instances concurrently.
But sometimes I end up with more than the 20 pre-allocated workers.
And sometimes it crashes, e.g.:

pthread_join attempted on thread 11563536, which does not point to a valid thread, or does not exist anymore!

If I launch the synchronous version, then the asynchronous one, I usually end up in an infinite loop while checking the future:

  • The synchronous call will finish properly.
  • The asynchronous one will start but never finish because:
    • The 8 inner std::threads will be created but will never start (I'll be joining them forever).

If I launch the synchronous version, then 2 instances of the asynchronous one, I either end up in the infinite loop described above, or I encounter a variety of new crashes:

PThread 5298720 is attempting to join to itself!
pthread_join attempted on thread 1600418931, which does not point to a valid thread, or does not exist anymore!
Stack overflow! Stack cookie has been overwritten, expected hex dwords 0x89BACDFE and 0x2135467, but received 0x0 f4b168
onmessage() captured an uncaught exception: RuntimeError: memory access out of bounds
onmessage() captured an uncaught exception: RuntimeError: table index is out of bounds
Attempted to join thread 38838000, which was already detached!

However, if I delay the asynchronous call by some time (e.g. 200ms), it will usually work fine. I might still end up with more than 20 workers though, and it might still crash (rarely).

I am under the impression that when pthread_create(), pthread_join() and the thread exit handlers run concurrently and/or within a short time span(*), some things might not be properly synchronized and something is destroyed in the process.

(*) I discovered that if I reduce the duration of the inner threads to zero, even the "single asynchronous" case starts to crash.
