Concurrency and parallelism in JavaScript #4
Description
JavaScript platforms today (browsers, Node.js) don't expose anything like a thread or a coroutine to the user. There are special cases, such as Web Workers in browsers and background worker threads in Node.js, but for most use cases JS is treated as single-threaded.
For I/O-heavy workloads (web servers, talking to databases, networking and file sharing), a non-blocking I/O approach works well for many problems: because the JS thread is never blocked, a program can keep a large number of concurrent network connections open even without any application code for parallelism.
There are certain classes of problems where parallel processing of data is necessary for speed, such as compression and number crunching. Many scientific simulations and models run for a very long time, and being able to cut that time in half (or more) by parallelizing can make a big difference.
Here are different approaches to concurrency and/or parallelism and their status:
Nonblocking I/O
Concurrent I/O operations are provided by the platform (usually implemented in C++ or another similarly low-level language), but user-defined functions cannot be executed concurrently.
This is widely used in JS today: browsers and Node.js implement networking and file system operations in C++ and let users supply JS callbacks as a way to continue execution after some I/O has completed.
The downside of this approach is that it only works for a known set of I/O problems; users have no access to the actual concurrency primitives from their own code.
Data parallelism
The same piece of code is executed several times in parallel.
There was a project called parallel.js (good overview here) which got as far as being implemented in Firefox Nightly, but it is being removed due to lack of interest: https://bugzilla.mozilla.org/show_bug.cgi?id=1117724
One aspect of data parallelism is SIMD: functionality implemented at the CPU level to process multiple pieces of data with a single instruction. There is a proposal to expose SIMD functionality to JavaScript: https://docs.google.com/presentation/d/1MY9NHrHmL7ma7C8dyNXvmYNNGgVmmxXk8ZIiQtPlfH4/edit#slide=id.p19, and the proposal is actually quite far along.
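The proposal adds fixed-width vector types such as SIMD.Float32x4. Since current engines don't ship it, here is a plain-JS sketch of the semantics of a 4-lane add (float32x4Add is a hypothetical stand-in for illustration, not the proposal's actual API):

```javascript
// Emulates what a 4-lane SIMD add would compute: with hardware SIMD support,
// the four lane-wise additions below would execute as one CPU instruction.
function float32x4Add(a, b) {
  const out = new Float32Array(4);
  for (let i = 0; i < 4; i++) {
    out[i] = a[i] + b[i]; // one lane per iteration
  }
  return out;
}

const sum = float32x4Add(
  new Float32Array([1, 2, 3, 4]),
  new Float32Array([10, 20, 30, 40])
);
console.log(Array.from(sum)); // [11, 22, 33, 44]
```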
Task parallelism
Different pieces of code are executed in parallel. You can do this today with Web Workers and Transferable Objects (the ability to transfer a Typed Array between two Web Workers without having to copy it), but you don't have fine-grained control over how the Web Workers are created and prioritized.
There was a proposal back in 2012 that used a scheduler to execute JS functions in parallel in a fork-and-join style: http://smallcultfollowing.com/babysteps/blog/2012/01/09/parallel-javascript/. However, it seems this approach is also not currently being pursued.
Lock-free algorithms using atomic memory operations
Another approach to concurrency is to use so-called lock-free algorithms. An important building block for these is a class of CPU instructions called atomic operations. Here's a good primer: http://preshing.com/20120612/an-introduction-to-lock-free-programming/
The general idea of atomics is to be able to perform compound operations on memory in a way that prevents the memory from being corrupted when it's accessed from concurrent threads.
There's a proposal for shared memory and atomics in JS: https://github.com/lars-t-hansen/ecmascript_sharedmem
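The key primitive for lock-free code is compare-and-swap. Using the API from that proposal (since standardized in modern engines as SharedArrayBuffer and the Atomics object), a minimal sketch of its semantics:

```javascript
// A SharedArrayBuffer can be shared between workers; Atomics operations on
// it are guaranteed to be indivisible even under concurrent access.
const sab = new SharedArrayBuffer(4);
const view = new Int32Array(sab);

Atomics.store(view, 0, 5);

// compareExchange(array, index, expected, replacement): swap only if the
// current value still equals `expected`; returns the value that was seen.
const first = Atomics.compareExchange(view, 0, 5, 10);
console.log(first, Atomics.load(view, 0)); // 5 10 (swap succeeded)

// A CAS with a stale expected value fails and leaves memory untouched: this
// is how lock-free algorithms detect that another thread got there first.
const second = Atomics.compareExchange(view, 0, 5, 99);
console.log(second, Atomics.load(view, 0)); // 10 10 (swap failed)
```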
What can we do to help?
Here I've tried to summarize the different approaches to concurrency and parallelism, and the progress made on each.
However, not all are equally important for numeric computing and/or data science. Here are some ideas on where we can start:
- Find out which scientific use cases would be most impacted by which of the above approaches. Maybe they're all worth pursuing, maybe one is much more useful than the others.
- Make sure the standards bodies are aware of the potential positive impact on the use cases we discover.