Worker-based Async API with await/Promise support (or plain callbacks) #6

tchakabam · 2020-07-29T10:37:43Z

I would like to explain a bit how I thought about the pros/cons of doing it like this or another way, why I think this is necessary, and what I would envision in terms of usage possibilities and trade-offs for this Node.js binding to libSRT in the future.

Problem

The current N-API binding layer to libSRT is such that every native call, which are all blocking I/O, runs synchroneously with the wrapping JS function call.

This specific making design of the bindings certainly has advantages, but comes with some usage issues on the JS side as these functions are called from the Node.js proc main-thread / event loop.

As we are trying to achieve higher througputs reading/writing to SRT file-descriptors takes a while. For example, if we read/write 16kbytes from a socket per one JS main thread tick every 20ms, that’s just about 6,4 Mbit/s throughput. That means that we only have this window of 20ms between every call to do other tasks, which is smaller as the r/w operation takes time itself.

Awaiting incoming connections in accept mode to implement a server is "tricky" with this sort of blocking API from a JS application point of view (see the current ReadableStream and Server implementations).

Also any other blocking SRT operation done will generally impact application performance in an unpredictable way, as other tasks our Node.js service may run in production will backlog waiting for our sync'd binding to return. Generally, running blocking I/O directly on the JS main loop is a source of issues and considered bad practice for reasons better explained in a "theory" section I wrote up below (but feel free to TLDR that if you agree so far).

Solution

To solve this problem, we run the native calls into libSRT i.e the blocking binding methods on a JS Worker thread. The architecture used here is doing generic RPC'ing from a "proxy" class. To do this, the proxy class reflects the method names & signatures of the bindings JS API that we are calling in the Worker message-queue iterations. The proxy class calls a generic function to pass all argument as an array and the method name itself as litteral string value in a message to the worker (like in an RPC, where the remote end here is the other thread running the actual SRT API bindings).

The herewith introduced AsyncSRT class which implements this proxy has also the same API as the current version, with a callback function argument added on every method, and each returns a Promise of the result value that the sync API would return.

A proxy class instance owns a worker thread instance each and allows therefore to manage queued RPC's and respective results/callbacks to one "SRT thread" each. Concurrent access to underlying native sockets/file-descriptors via libSRT is thus possible in a managed way (for example one thread may accept new connections, while another worker can perform read/write ops on FDs).
This is relatively simple to implement here as you can see, as since the SRT calls are all blocking and no other thread does anything event-loopish and it is all polling etc, the queuing/dequeuing order of RPCs/callbacks between main<->worker threads is fully maintained.
You can also run the SRT async API i.e spawn the worker thread from another parent worker btw, that's all the same to it.
Potentially out-of-order RPC-back/result-dequeuing is possible with a type of call-ID generator and callback-map that we prototyped (but not used atm since the worker is only doing sync/blocking internally with the current SRT lib binding).
Any user passed callback is wraped in the new Promise executor, in which the actual worker message-posting is called. The Promise resolver is the corresponding result message from the worker, and it will call both Promise.resolve and the plain callback, if any was passed at all (that is optional). By design choice it will get called after the resolve function, but that only plays a role if you want to use both callbacks and Promises- which is a rather rare case.
Every promise returned has a timeout which defaults to 3000 ms. We could make this configurable at runtime for all the bindings module, or even allow to pass in an optional timeout value on each async API method in principle. If the timeout is exceeded, the promise gets rejected.
The accept method is the only one which does not use/need a timeout, for obvious reasons. Important to note here, see "Worker Termination" in the issues section ...

Issues (new or remaining)

Zero-copy

Atm we are doing a copy of Buffer objects across (for read/write methods), as sharing/message-passing buffer refs is otherwise non trivial or illegal with Workers. This is far from ideal way to handle this. In principle, there are several possible solutions to make this zero-copy (see SharedArrayBuffer or Transferable objects). But these lay some constraints onto the application using them. So we want to make copy the default mode, and allow to enable others by configuration (in further patches).

`Worker` overhead

The JS Worker as a light-weight thread-abstraction makes this solution very easy to implement. However, it certainly will have a different performance overhead (and other constraints in handling dataflow- see zero-copy) due to being in JS land as opposed to C/N-API land.

`accept` & Worker Termination

There is atm no way to "cancel" a once posted "accept job". This SRT function call effectively blocks the worker thread until any client connects on a listened socket. Addings such a helper as a feature might be interesting, as one would want to terminate a worker thread properly. It would obviously imply closing the socket from any other concurrent thread (since the thread running accept() can't do anything else)- and that can be the main thread too in principle. This issue here is of course as much a relevant one with the plain sync bindings run in any thread. It only remains as much, even with async execution support.

Challenging ideas

"Binding layer should do this"

We could have certainly solved this with napi_create_async_work.

But that would have meant to basically rewrite the whole binding layer. The solution here is interesting by how little or nothing it adds to enable powerful usage.

To provide a natively asynchroneous I/O N-API binding layer for libSRT here is certainly a larger task we should envision for the future.

It would in that case still be interesting to keep both native bindings (sync vs async at N-API layer) around, to see if there are pros/cons to implementing async I/O in either the way we do here via Worker, or via napi_create_async_work?

"It would be simpler in main thread"

We can (actually should) keep the current sync'd binding layer around imo, with all the currently existing JS classes that are written directly on top of it to do scheduling tricks of the blocking I/O on the main thread (like the SRT-Readable/Writable classes here). It should be an interesting topic to see if we can optimize these for other cases too. It can be an interesting challenge as well to get the scheduling of these blocking tasks onto main loop execution timeframes right in order to keep it unblocked as much as possible. Both blocking vs async running bindings can exist side-by-side in this codebase and present different topics of research.

A bit of JavaScript Theory

A bit about the theory here why this is needed (mostly have been writing this up to give myself a clear head about it):

Conceptually for one, it is as far as "not a viable concept" for integrating with the JS ecosystem to run any blocking I/O on the main thread, where this environment specifically expects native/system-specific I/O calls to happen in background threads and leverage all advantages from that "async I/O".

-On the other hand, JS does not allow to construct any main-loop-like concept (as in while(true) { /* process some I/O event/message queue here */ }); for the simple reason that it does not expose OS-level threads (as do Java or Python or other high-level (dynamic) runtimes).

In JS there is just no way to do that, without causing congestion of internal event queuing of the very runtime by doing it (mostly as individual execution ticks take too long for upcoming other events to be processed). That very concept of an event-loop is built into the language or runtime this language specifies and thus expects.

-So far for the theory :)

…rence */

+ potentially out-of-order RPC-back/result-dequeuing is possible with a type of call-ID generator and callback-map that we prototyped (but not used atm since the worker is only doing sync/blocking internally with the current SRT lib binding).

…wait)

tchakabam · 2020-07-29T20:38:15Z

src/async.js

+   * @param msTimeOut
+   */
+  epollUWait(epid, msTimeOut, callback) {
+    return this._createAsyncWorkPromise("epollAddUsock", [epid, msTimeOut], callback);


"epollUWait"

…alse), and a custom timeout value option (default to default timeout), which can potentially be set differently than the general default timeout. + make the timeout value a static class property so that it can be user-defined module-load wide. defaults to constant in module top scope.

+ add a custom timeout value argument to _createAsyncWorkPromise

birme · 2020-08-03T07:45:27Z

Thank you for this PR. Lots of thoughts and work put behind this one. Will have a look but I think your approach sounds like the right one for now.

birme

Awesome contribution!

Eyevinn#6) * fix lint errors in eslint config :) * index.d.s: add /* eslint-disable @typescript-eslint/triple-slash-reference */ * improve eslint calling script * add async-worker version of the SRT JS API with await/Promise support + potentially out-of-order RPC-back/result-dequeuing is possible with a type of call-ID generator and callback-map that we prototyped (but not used atm since the worker is only doing sync/blocking internally with the current SRT lib binding). * add various examples for async API ("classic" callbacks / Promise / await) * add .eslintignore (should go with commit where we just call "eslint .") * index.js: add missing semi * async.js: fix method name litteral in epollUWait * async.js: allow for accept method to use a timeout opt (defaults to false), and a custom timeout value option (default to default timeout), which can potentially be set differently than the general default timeout. + make the timeout value a static class property so that it can be user-defined module-load wide. defaults to constant in module top scope. * async.js: fix for a rejected promise, make sure we don't resolve anymore + add a custom timeout value argument to _createAsyncWorkPromise * async.js: rm an experiment * fix a lint error in example * add types for async api and fix some details on binding API types * export async API to index * async.js: add some missing docs for private methods * include async types in index

tchakabam added 7 commits July 28, 2020 14:07

fix lint errors in eslint config :)

1317f19

index.d.s: add /* eslint-disable @typescript-eslint/triple-slash-refe…

27bf1be

…rence */

improve eslint calling script

f55bd66

add various examples for async API ("classic" callbacks / Promise / a…

8b1715c

…wait)

add .eslintignore (should go with commit where we just call "eslint .")

977b020

index.js: add missing semi

95b8a81

tchakabam commented Jul 29, 2020

View reviewed changes

tchakabam added 9 commits July 29, 2020 23:17

async.js: fix method name litteral in epollUWait

3d052fc

async.js: fix for a rejected promise, make sure we don't resolve anymore

92e99f1

+ add a custom timeout value argument to _createAsyncWorkPromise

async.js: rm an experiment

435188b

fix a lint error in example

3762822

add types for async api and fix some details on binding API types

4006b8b

export async API to index

31abb6a

async.js: add some missing docs for private methods

b96f2c5

include async types in index

fc6ce14

birme approved these changes Aug 3, 2020

View reviewed changes

birme merged commit 074da05 into Eyevinn:master Aug 3, 2020

tchakabam deleted the feat/async-worker-promise-api branch August 4, 2020 03:59

tchakabam restored the feat/async-worker-promise-api branch August 4, 2020 04:00

tchakabam deleted the feat/async-worker-promise-api branch August 4, 2020 04:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Worker-based Async API with await/Promise support (or plain callbacks) #6

Worker-based Async API with await/Promise support (or plain callbacks) #6

tchakabam commented Jul 29, 2020 •

edited

Loading

tchakabam Jul 29, 2020

birme commented Aug 3, 2020

birme left a comment

Worker-based Async API with await/Promise support (or plain callbacks) #6

Worker-based Async API with await/Promise support (or plain callbacks) #6

Conversation

tchakabam commented Jul 29, 2020 • edited Loading

Problem

Solution

Issues (new or remaining)

Zero-copy

Worker overhead

accept & Worker Termination

Challenging ideas

"Binding layer should do this"

"It would be simpler in main thread"

A bit of JavaScript Theory

tchakabam Jul 29, 2020

Choose a reason for hiding this comment

birme commented Aug 3, 2020

birme left a comment

Choose a reason for hiding this comment

tchakabam commented Jul 29, 2020 •

edited

Loading

`Worker` overhead

`accept` & Worker Termination