Description
I've started to experiment with wasm threads, Rust, and wasm-bindgen recently to see how well our story shapes up there. The good news is that it's all working pretty well! On basically every demo I've written so far, though, I've very quickly run up against the wall that atomic.wait instructions are not allowed on the main thread (they throw an error). I'm currently testing in Firefox, but I think this behavior is mirrored in other implementations?
On the surface and in the abstract, the lack of atomic.wait definitely makes sense. Reducing jank is always good! In practice, though, I've found this severely limiting when trying to write applications. The use case I'm exploring currently is to instantiate a WebAssembly.Instance on the main thread, and then postMessage that instance's module and its shared memory to a set of worker threads. That way the main wasm thread (the main application) can enjoy features like DOM access while the worker threads do the bulk of the work. In this model, some gotchas arise pretty quickly.
Most of the gotchas can be categorized as "it's really hard for libraries to avoid blocking synchronization". All code executed on the main thread, and all libraries it links to, can't use any form of blocking synchronization (like mutexes). Some cases where this comes up quickly are:
- Memory allocators - the Rust standard library provides a global memory allocator, for example, which is currently a translation of dlmalloc. To make this safe to use in a multithreaded scenario, access to the global allocator is synchronized with a mutex. (can't really imagine a world where memory allocation is asynchronous...). It's really hard for the main thread to entirely avoid allocating memory, or for sub-workers to all avoid allocating memory.
- Synchronizing messages - one of the first problems I ran into was accidentally attempting to lock memory to read it on the main thread. Without atomic.wait the only way (I think?) for a worker to synchronize with the main thread (aka wake it up to an event) is via postMessage. A worker (in abstract) doesn't even know if that'll wake up the main thread as well! (sub-workers and such).
While it's not the worst thing in the world to provide custom synchronization at the app level, this makes me very wary to use any library that has synchronization at all on the main thread. If any library anywhere uses a mutex, even if just for a short period of time, it's not usable on the main thread as it may occasionally throw an exception.
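To make the "even a short-lived mutex is a problem" point concrete, here is a minimal native sketch of what main-thread-safe code would have to do instead of calling a blocking lock: only ever attempt the lock and bail out if it's contended. It uses the standard library's Mutex::try_lock; the helper name with_lock_nonblocking is hypothetical, and I'm assuming (not claiming) that a non-blocking attempt like this never needs atomic.wait on a given wasm target. It also only helps code written with the restriction in mind, which is exactly the problem with existing libraries.

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical helper: run `f` on the protected value only if the lock is
/// free right now. Returns `None` instead of blocking when the lock is held.
fn with_lock_nonblocking<T, R>(m: &Mutex<T>, f: impl FnOnce(&mut T) -> R) -> Option<R> {
    match m.try_lock() {
        Ok(mut guard) => Some(f(&mut *guard)),
        // The lock is currently held: give up rather than wait.
        Err(TryLockError::WouldBlock) => None,
        // A previous holder panicked; the data is still reachable.
        Err(TryLockError::Poisoned(poisoned)) => {
            let mut guard = poisoned.into_inner();
            Some(f(&mut *guard))
        }
    }
}
```

The caller has to decide what to do with `None` (retry on a later tick, queue the work, and so on), which is the part a general-purpose library can't easily decide for you.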
Put another way, it seems like all existing threading-related libraries almost cannot be used by default. Even libraries that provide the ability to specify a custom method to send notifications are at risk of using a mutex for short periods of time to protect some data.
Putting this all together seems like it basically means that the entire main thread for an application has to be entirely user-written and use very few libraries (only those audited to be used on the main thread or saying they don't have synchronization). Even then, I'm not sure how the memory allocation issue would be solved. Additionally it seems like synchronization primitives will almost always have to be hand-rolled for each application, always using postMessage to communicate from the main thread to workers and back.
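The shape of that hand-rolled, message-based synchronization can be sketched natively. This is only an analogy under stated assumptions: std::sync::mpsc stands in for postMessage, OS threads stand in for web workers, and WorkerMsg, drain_messages, and run_workers are hypothetical names. The point is that the "main" side only ever polls with try_recv and never blocks waiting for a worker.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical message a worker posts back to the main thread.
struct WorkerMsg {
    worker: usize,
    result: u64,
}

/// Drain everything currently queued without blocking. This plays the role
/// of an `onmessage` handler running on the main thread.
fn drain_messages(rx: &Receiver<WorkerMsg>) -> Vec<(usize, u64)> {
    let mut out = Vec::new();
    // `try_recv` returns `Err` as soon as nothing is queued (or all senders
    // are gone), so this loop never waits.
    while let Ok(msg) = rx.try_recv() {
        out.push((msg.worker, msg.result));
    }
    out
}

/// Spawn `n` workers that each sum 1..=1000 and report back by message.
fn run_workers(n: usize) -> (Receiver<WorkerMsg>, Vec<thread::JoinHandle<()>>) {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<thread::JoinHandle<()>> = (0..n)
        .map(|worker| {
            let tx = tx.clone();
            thread::spawn(move || {
                let result: u64 = (1..=1000u64).sum();
                tx.send(WorkerMsg { worker, result }).unwrap();
            })
        })
        .collect();
    (rx, handles)
}
```

In a real page the polling would be driven by the event loop delivering messages rather than by calling drain_messages by hand, but the constraint is the same: the main side reacts to messages and never waits on shared memory.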
Coming out of this are a few questions:
- Is there any way this restriction can be lifted?
- Failing that, can it be partially lifted somehow?
- Failing that, is it expected that this is simply a pattern that's not used in the wild? Would shared memory wasm modules basically entirely live in workers and main thread wasm modules would never use shared memory?
Ideally these problems could be solved by simply saying "atomic.wait is ok on the main thread", but that of course brings back the jank problem. Some of the possible solutions (like for the memory allocator problem) could be "just use a spin lock if it's short", but I'm not sure how that's better than just allowing atomic.wait on the main thread? Maybe there's recourse for something like "you can use atomic.wait on the main thread only if you specify a small timeout". For example, it takes Firefox N seconds to say "your script is slowing the page down"; could that be the maximum timeout for atomic.wait?
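For reference, the "just use a spin lock if it's short" option looks roughly like the sketch below: a lock that busy-waits for a bounded number of attempts and then gives up, which is loosely the same idea as a small timeout. BoundedSpinLock and try_lock_spinning are hypothetical names, the spin budget is arbitrary, and the sketch only guards a flag rather than wrapping any data.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical spin lock that gives up after a bounded number of attempts
/// instead of parking the thread.
pub struct BoundedSpinLock {
    locked: AtomicBool,
}

impl BoundedSpinLock {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    /// Try to take the lock, spinning at most `max_spins` times.
    /// Returns `true` on success, `false` if the budget ran out.
    pub fn try_lock_spinning(&self, max_spins: u32) -> bool {
        for _ in 0..max_spins {
            if self
                .locked
                .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                return true;
            }
            // Tell the CPU we are busy-waiting.
            hint::spin_loop();
        }
        false
    }

    /// Release the lock. The caller must currently hold it.
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```

Note that while it spins, the main thread is just as unresponsive as it would be in a wait with the same duration, which is why I'm not sure this is actually better than a bounded atomic.wait.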
In general I'm curious to hear others' thoughts on this as well. Is sharing a wasm module on the main thread with worker threads just a pipe dream? Are there other ways to work around this issue?