| 1 | +# Multi-threading on a browser |
| 2 | + |
| 3 | +**Owner** [Pavel Savara](https://github.com/pavelsavara) | |
| 4 | + |
| 5 | +## Table of contents |
| 6 | +- [Goals](#goals) |
| 7 | +- [Key ideas](#key-ideas) |
| 8 | +- [State April 2024](#state-2024-april) |
| 9 | +- [Design details](#design-details) |
| 10 | +- [State September 2023](#state-2023-sep) |
| 11 | +- [Alternatives](#alternatives-and-details---as-considered-2023-sep) |
| 12 | + |
| 13 | +# Goals |
| 14 | +- CPU intensive workloads on dotnet thread pool. |
| 15 | +- Allow the user to start new managed threads using `new Thread` and join them. |
| 16 | +- Add new C# API for creating web workers with JS interop. Allow JS async/promises via external event loop. |
| 17 | +- enable blocking `Task.Wait` and `lock()` like APIs from C# user code on all threads |
| 18 | + - Current public API throws `PlatformNotSupportedException` (PNSE) for it |
| 19 | + - This is a core part of the MT value proposition. |
| 20 | + - If people want to use existing MT code-bases, most of the time, the code is full of locks. |
| 21 | + - People want to use existing desktop/server multi-threaded code as is. |
| 22 | +- allow HTTP and WS C# APIs to be used from any thread despite underlying JS object affinity. |
| 23 | +- Blazor `BeginInvokeDotNet`/`EndInvokeDotNetAfterTask` APIs work correctly in multithreaded apps. |
| 24 | +- JSImport/JSExport interop to the maximum possible extent. |
| 25 | +- don't change/break single threaded build. † |
| 26 | + |
| 27 | +## Lower priority goals |
| 28 | +- try to make it debugging friendly |
| 29 | +- sync C# to async JS |
| 30 | + - dynamic creation of new pthread |
| 31 | + - implement crypto via `subtle` browser API |
| 32 | + - allow MonoVM to lazily download DLLs from the server, instead of during startup. |
| 33 | + - implement synchronous APIs of the HTTP and WS clients. At the moment they throw PNSE. |
| 34 | +- sync JS to async JS to sync C# |
| 35 | + - allow calls to synchronous JSExport from UI thread (callback) |
| 36 | +- don't prevent future marshaling of JS [transferable objects](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Transferable_objects), like streams and canvas. |
| 37 | +- offload CPU intensive part of WASM startup to WebWorker, so that the pre-rendered (blazor) UI could stay responsive during Mono VM startup. |
| 38 | + |
| 39 | +## Non-goals |
| 40 | +- interact with JS state on `WebWorker` of managed threads other than UI thread or dedicated `JSWebWorker` |
| 41 | + |
| 42 | +<sub><sup>† Note: all the text below discusses MT build only, unless explicit about ST build.</sup></sub> |
| 43 | + |
| 44 | +# Key ideas |
| 45 | + |
| 46 | +Move all managed user code out of UI/DOM thread, so that it becomes consistent with all other threads. |
| 47 | + |
| 48 | +## Context - Problems |
| 49 | +**1)** If you have multithreading, any thread might need to block while waiting for any other to release a lock. |
| 50 | +- locks are in the user code, in nuget packages, in Mono VM itself |
| 51 | +- there are managed and un-managed locks |
| 52 | +- in single-threaded build of the runtime, all of this is NOOP. That's why it works on UI thread. |
| 53 | + |
| 54 | +**2)** UI thread in the browser can't synchronously block |
| 55 | +- that means "you cannot block" the UI thread, not just the usual "you should not block" the UI |
| 56 | + - `Atomics.wait()` throws `TypeError` on UI thread |
| 57 | +- you can spin-wait but it's a bad idea. |
| 58 | + - Deadlock: when you spin-block, the JS timer loop and any messages are not pumping. |
| 59 | + - But code in other threads may be waiting for some such event to resolve. |
| 60 | + - async/await doesn't work |
| 61 | + - all networking doesn't work |
| 62 | + - you can't create or join another web worker |
| 63 | + - browser dev tools UI freezes |
| 64 | + - It eats your battery |
| 65 | + - Browser will kill your tab at a random point (Aw, snap). |
| 66 | + - It's not deterministic and you can't really test your app to prove it harmless. |
| 67 | +- all the other threads/workers could synchronously block |
| 68 | + - `Atomics.wait()` works as expected |
| 69 | +- if we have a managed thread on the UI thread, any `lock` or Mono GC barrier could cause a spin-wait |
| 70 | + - in case of Mono code, we at least know it's short duration |
| 71 | + - we should prevent it from blocking in user code |
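
Whether the current thread is allowed to block can be probed at run time. The following is a minimal sketch, assuming `SharedArrayBuffer` is available (browsers require cross-origin isolation for it); the helper name `canBlockCurrentThread` is hypothetical and not part of the runtime.

```javascript
// Hypothetical helper: reports whether the current JS thread may block in Atomics.wait().
function canBlockCurrentThread() {
  try {
    const cell = new Int32Array(new SharedArrayBuffer(4));
    // The value at index 0 is 0, the expected value is 0 and the timeout is 0 ms,
    // so on a thread that is allowed to block this returns "timed-out" immediately.
    Atomics.wait(cell, 0, 0, 0);
    return true;
  } catch (e) {
    // Browsers throw TypeError when Atomics.wait() is called on the UI thread.
    if (e instanceof TypeError) return false;
    throw e; // anything else (e.g. SharedArrayBuffer is missing) is not ours to hide
  }
}
```

On a web worker this is expected to return `true` and on the browser UI thread `false`. Node.js permits blocking on its main thread, so it returns `true` there as well.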
| 72 | + |
| 73 | +**3)** JavaScript engine APIs and objects have thread affinity. |
| 74 | +- The DOM and a few other browser APIs are only available on the main UI "thread" |
| 75 | + - and so, you need to have C# interop with UI, but you can't block there. |
| 76 | +- HTTP & WS objects have affinity, but we would like to consume them (via Streams) from any managed thread |
| 77 | +- Any `JSObject`, `JSException` and `Promise`->`Task` have thread affinity |
| 78 | + - they need to be disposed on the correct thread, but GC runs on a random thread |
| 79 | + |
| 80 | +**4)** State management of JS context `self` of the worker. |
| 81 | +- emscripten pre-allocates a pool of web workers to be used as pthreads. |
| 82 | + - Because they could only be created asynchronously, but `pthread_create` is a synchronous call |
| 83 | + - Because they are slow to start |
| 84 | +- those pthreads have stateful JS context `self`, which is re-used when mapped to C# thread pool |
| 85 | +- when we allow JS interop on a managed thread, we need a way to clean up the JS state |
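
The state problem can be modelled without real workers. This is only an illustrative sketch: `PooledWorker`, `WorkerPool` and the `resetState` flag are hypothetical names, and plain objects stand in for the `self` context of a web worker. It does not reflect how emscripten actually implements its pool.

```javascript
// Hypothetical stand-in for a pre-allocated web worker; `self` models its JS global state.
class PooledWorker {
  constructor(id) {
    this.id = id;
    this.self = {};
  }
}

class WorkerPool {
  constructor(size) {
    // Workers are created up front, because real creation is asynchronous and slow.
    this.idle = Array.from({ length: size }, (_, i) => new PooledWorker(i));
  }

  // Synchronous, like pthread_create: succeeds only if a worker is already waiting.
  acquire() {
    const worker = this.idle.pop();
    if (!worker) throw new Error("pool exhausted");
    return worker;
  }

  // Without `resetState`, whatever the previous thread left on `self` leaks to the next one.
  release(worker, resetState) {
    if (resetState) worker.self = {};
    this.idle.push(worker);
  }
}
```

If one managed thread stores something on `worker.self` and the worker is released with `resetState` set to `false`, the next managed thread mapped to that worker observes the stale value. That is the cleanup problem described above.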
| 86 | + |
| 87 | +**5)** Blazor's `renderBatch` is using direct memory access |
| 88 | + |
| 89 | +**6)** Dynamic creation of new WebWorker requires async operations on emscripten main thread. |
| 90 | +- we could pre-allocate fixed size pthread pool. But one size doesn't fit all and it's expensive to create too large pool. |
| 91 | + |
| 92 | +**7)** There could be a pending HTTP promise (which needs the browser event loop to resolve) and a blocking `.Wait` on the same thread and the same task/chain, which leads to a deadlock. |
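
The underlying mechanism can be observed in plain JavaScript. The sketch below uses the hypothetical function name `continuationRanWhileBlocked` and must run on a thread that is allowed to block (a web worker, or Node.js). It schedules a promise continuation and then blocks the same thread. The continuation cannot run until the blocking call returns, so a wait that depends on that continuation would never finish.

```javascript
// Hypothetical demo: reports whether a promise continuation ran while the thread was blocked.
function continuationRanWhileBlocked(blockMs) {
  let ran = false;
  // The continuation can only run after the current synchronous code yields to the event loop.
  Promise.resolve().then(() => { ran = true; });
  const cell = new Int32Array(new SharedArrayBuffer(4));
  // Block this thread; nobody calls Atomics.notify(), so the wait ends by timeout.
  Atomics.wait(cell, 0, 0, blockMs);
  return ran;
}
```

The function returns `false`: the already-resolved promise still had no chance to run its continuation while the thread was blocked.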
| 93 | + |
| 94 | +# State 2024 April |
| 95 | + |
| 96 | +## What was implemented in Net9 - Deputy thread design |
| 97 | + |
| 98 | +For other possible design options we considered [see below](#alternatives-and-details---as-considered-2023-sep). |
| 99 | + |
| 100 | +- Introduce dedicated web worker called "deputy thread" |
| 101 | + - managed `Main()` is dispatched onto deputy thread |
| 102 | +- MonoVM startup on deputy thread |
| 103 | + - non-GC C functions of mono are still available |
| 104 | +- Emscripten startup stays on UI thread |
| 105 | + - C functions of emscripten |
| 106 | + - download of assets and loading them into WASM memory |
| 107 | +- UI/DOM thread |
| 108 | + - because the UI thread would be mostly idling, it could: |
| 109 | + - render UI, keep debugger working |
| 110 | + - dynamically create pthreads |
| 111 | + - UI thread stays attached to Mono VM for Blazor's reasons (for Net9) |
| 112 | + - it keeps `renderBatch` working as is, but it's far from ideal |
| 113 | + - there is a risk that the UI could be suspended by a pending GC |
| 114 | + - It would be ideal to change Blazor so that it doesn't touch managed objects via naked pointers during render. |
| 115 | + - we strive to detach the UI thread from Mono |
| 116 | +- I/O thread |
| 117 | + - is a helper thread which allows `Task` to be resolved by UI's `Promise` even when the deputy thread is blocked in `.Wait` |
| 118 | +- JS interop from any thread is marshaled to UI thread's JavaScript |
| 119 | +- HTTP and WS clients are implemented in JS of UI thread |
| 120 | +- There is draft of `JSWebWorker` API |
| 121 | + - it allows C# users to create dedicated JS thread |
| 122 | + - the `JSImport` calls are dispatched to it if you are on that thread |
| 123 | + - or if you pass `JSObject` proxy with affinity to that thread as `JSImport` parameter. |
| 124 | + - The API was not made public in Net9 yet |
| 125 | +- calling synchronous `JSExports` is not supported on UI thread |
| 126 | + - this could be changed by configuration option but it's dangerous. |
| 127 | +- calling asynchronous `JSExports` is supported |
| 128 | +- calling asynchronous `JSImport` is supported |
| 129 | +- calling synchronous `JSImport` is supported without synchronous callback to C# |
| 130 | +- Strings are marshaled by value |
| 131 | + - as opposed to by reference optimization we have in single-threaded build |
| 132 | +- Emscripten VFS and other syscalls |
| 133 | + - file system operations are single-threaded and always marshaled to UI thread |
| 134 | +- Emscripten pool of pthreads |
| 135 | + - browser threads are expensive (as compared to normal OS) |
| 136 | + - creation of `WebWorker` requires UI thread to do it |
| 137 | + - there is quite complex and slow setup for `WebWorker` to become pthread and then to attach as Mono thread. |
| 138 | + - that's why Emscripten pre-allocates pthreads |
| 139 | + - this allows `pthread_create` to be synchronous and faster |
| 140 | + |
| 141 | +# Design details |
| 142 | + |
| 143 | +## Define terms |
| 144 | +- UI thread |
| 145 | + - this is the main browser "thread", the one with DOM on it |
| 146 | + - it can't block-wait, only spin-wait |
| 147 | +- "sidecar" thread - possible design |
| 148 | + - is a web worker with emscripten and mono VM started on it |
| 149 | + - there is no emscripten on UI thread |
| 150 | + - for Blazor rendering MAUI/BlazorWebView use the same concept |
| 151 | + - doing this allows all managed threads to use blocking wait |
| 152 | +- "deputy" thread - possible design |
| 153 | + - is a web worker and pthread with C# `Main` entrypoint |
| 154 | + - emscripten startup stays on UI thread |
| 155 | + - doing this allows all managed threads to use blocking wait |
| 156 | +- "managed thread" |
| 157 | + - is a thread with emscripten pthread and Mono VM attached thread and GC barriers |
| 158 | +- "main managed thread" |
| 159 | + - is a thread with C# `Main` entrypoint running on it |
| 160 | + - if this is UI thread, it means that one managed thread is special |
| 161 | + - see problems **1,2** |
| 162 | +- "managed thread pool thread" |
| 163 | + - pthread dedicated to serving Mono thread pool |
| 164 | +- "comlink" |
| 165 | + - in this document it stands for the pattern |
| 166 | + - dispatch to another worker via pure JS means |
| 167 | + - create JS proxies for types which can't be serialized, like `Function` |
| 168 | + - actual [comlink](https://github.com/GoogleChromeLabs/comlink) |
| 169 | + - doesn't implement spin-wait |
| 170 | + - we already have a prototype of similar functionality |
| 171 | + - which can spin-wait |
| 172 | + |
| 173 | +## Proxies - thread affinity |
| 174 | +- all proxies of JS objects have thread affinity |
| 175 | +- all of them need to be used and disposed on correct thread |
| 176 | + - how to dispatch to correct thread is one of the questions here |
| 177 | +- all of them are registered to 2 GCs |
| 178 | + - `Dispose` needs to be scheduled asynchronously instead of blocking Mono GC |
| 179 | + - proxies have thread affinity, but the target thread is suspended during GC, so we cannot dispatch to it at that time. |
| 180 | + - the JS handles need to be freed only after both sides unregistered it (at the same time). |
| 181 | +- `JSObject` |
| 182 | + - have thread ID on them, so we know which thread owns them |
| 183 | +- `JSException` |
| 184 | + - they are a proxy because stack trace is lazy |
| 185 | + - we could eval stack trace eagerly, so they could become "value type" |
| 186 | + - but it would be expensive |
| 187 | +- `Task` |
| 188 | + - continuations need to be dispatched onto correct JS thread |
| 189 | + - they can't be passed back to wrong JS thread |
| 190 | + - resolving `Task` could be async |
| 191 | +- `Func`/`Action`/`JSImport` |
| 192 | + - callbacks need to be dispatched onto correct JS thread |
| 193 | + - they can't be passed back to wrong JS thread |
| 194 | + - calling functions which return `Task` could be aggressively async |
| 195 | + - unless the synchronous part of the implementation could throw exception |
| 196 | + - which maybe our HTTP/WS could do ? |
| 197 | + - could this difference be ignored ? |
| 198 | +- `JSExport`/`Function` |
| 199 | + - we already are on correct thread in JS, unless this is UI thread |
| 200 | + - would anything improve if we tried to be more async ? |
| 201 | +- `MonoString` |
| 202 | + - we have optimization for interned strings, that we marshal them only once by value. Subsequent calls in both directions are just a pinned pointer. |
| 203 | + - in deputy design we could create `MonoString` instance on the UI thread, but it involves GC barrier |
| 204 | + |
| 205 | +## JSWebWorker with JS interop |
| 206 | +- is a proposed concept to let the user manage JS state of the worker explicitly |
| 207 | + - because of problem **4** |
| 208 | +- is a C# thread created and disposed by a new API for it |
| 209 | +- could block on synchronization primitives |
| 210 | +- could do full JSImport/JSExport to its own JS `self` context |
| 211 | +- there is `JSSynchronizationContext` installed on it |
| 212 | + - so that user code could dispatch back to it, in case that it needs to call `JSObject` proxy (with thread affinity) |
| 213 | +- this thread needs to throw on any `.Wait` because of the problem **7** |
| 214 | + |
| 215 | +## HTTP and WS clients |
| 216 | +- are implemented in terms of `JSObject` and `Promise` proxies |
| 217 | +- they have thread affinity, see above |
| 218 | + - typically to the `JSWebWorker` of the creator |
| 219 | +- but are consumed via their C# Streams from any thread. |
| 220 | + - therefore need to solve the dispatch to correct thread. |
| 221 | + - such dispatch will come with overhead |
| 222 | + - especially when called with small buffer in tight loop |
| 223 | + - or we could throw PNSE, but it may be difficult for user code to |
| 224 | + - know what thread created the client |
| 225 | + - have a means to dispatch the call there |
| 226 | + - other unknowing users are `XmlUrlResolver`, `XmlDownloadManager`, `X509ResourceClient`, ... |
| 227 | +- because we could have blocking wait now, we could also implement synchronous APIs of HTTP/WS |
| 228 | + - so that existing user code bases would just work without change |
| 229 | + - this would also require separate thread, doing the async job |
| 230 | + - we could use I/O thread for it |
| 231 | + |
| 232 | +## Performance |
| 233 | +As compared to ST build for dotnet wasm: |
| 234 | +- the dispatch between threads (caused by JS object thread affinity) will have negative performance impact on the JS interop |
| 235 | +- in case of HTTP/WS clients used via Streams, it could be surprising |
| 236 | +- browser performance is lower when working with SharedArrayBuffer |
| 237 | +- Mono performance is lower because there are GC safe-points and locks in the VM code |
| 238 | +- startup is slower because creation of WebWorker instances is slow |
| 239 | +- VFS access is slow because it's dispatched to UI thread |
| 240 | +- console output is slow because its POSIX stream is dispatched to UI thread, call per line |
| 241 | + |
| 242 | +# Alternatives and details - as considered 2023 Sep |
| 243 | +See https://gist.github.com/pavelsavara/c81ef3a9e4000d67f49ddb0f1b1c2284 |