Enhancement: pass data from node::Buffer into threadpool

## Context

The primary goal of many node C++ addons is to provide efficient and concurrent operations on data such that the total latency under high load will be reduced compared to a pure JS implementation.

This goal is achieved when the time taken to pass data into the threadpool, compute the operation, and return data/results from the threadpool is either:

 - 🍍 Faster than pure JS compute (when we compare 1 JS operation with 1 C++ operation); or
 - 🍊 Less costly than blocking the JS main loop by running the compute in pure JS, and therefore lower in latency overall. In other words, the main bottleneck of the program was event loop blocking: an available CPU could have done more work, but the synchronous nature of the event loop left it underutilized.

The 🍊 scenario is the most common reason to move processing to C++. This means that 🍊 makes sense as a key way to reduce your program latency even if the time of a single computation is slower in C++. The highest aim is to achieve both 🍍 and 🍊, but the latter is the most important.

To recap:

Port to C++ when:

  - you have evidence your event loop is blocked on computation
  - you have evidence your CPU(s) are underutilized

Deploy your C++ when:

 - your C++ port of the compute runs faster (comparing one C++ call to one JS call); or
 - your latency drops because the event loop is unblocked and is able to do other work while your C++ compute runs async;
 - and ideally both.

_Note: Another goal of some node C++ addons is simply to wrap an existing C++ library and make it available to nodejs users. In this case performance and concurrency are often not motivating factors in the design. An example of this is node-gdal, which [currently only supports sync functions](https://github.com/naturalatlas/node-gdal/issues/18). An example of a node module that both wraps an existing C++ library to expose functionality and aims for optimally efficient and async functions is [node-mapnik](https://github.com/mapnik/node-mapnik)._

## Improvements to skel

Currently node-cpp-skel examples only show code passing a `std::string` into the threadpool that represents a "name" (or a short string of user provided input). The C++ `std::string` can be a good transport format for passing data into the threadpool. And of course you might have an addon that explicitly needs to do compute on strings. But the more common use case I've seen and that I think we should target is using `node::Buffer`s.

The reason is that, using `node::Buffer`, we can pass data directly from JS into the C++ threadpool in a zero-copy way (no copies of the data inside the `node::Buffer` need to be made to access it). We can't do this with `std::string` because we don't get a `std::string` from the JS side.

You might ask: why can't we use a `v8::String` (which is what we get from the JS side if a string is passed by the user)? Well, we could, but a `v8::String` still needs to be copied before entering the threadpool and reading out the data for this copy can be costly. This is because `v8::String` is implemented in a somewhat lazy (do work only when needed) kind of way in V8. For example, a given string might actually be a combination of a bunch of strings (aka a `ConsString` in v8), and to use its data and know the length v8 needs to "flatten" the string first. This takes time and allocations, while the `node::Buffer` is able to tell us its length and give us access to its raw data more efficiently. 

## Recommended changes

- [ ] Rework the async examples to take a `node::Buffer` object
- [ ] Add structures to safely pass its data into the threadpool
- [ ] Change the `do_expensive_work` to do work on the buffer's data

The node::Buffer object can be safely used in the threadpool by:

 - Adding a `Nan::Persistent<v8::Object> buffer;` to the "baton" (aka the `class` that inherits from `Nan::AsyncWorker`). This prevents the buffer from being garbage collected while in use. This is needed because the JS code passing the buffer might not use the buffer again and then `v8` would not otherwise know to keep it alive.
 - Adding `const char * data;` to the "baton" which takes a pointer to the raw data inside the `node::Buffer`
 - Adding a `std::size_t data_length;` to the "baton" which stores a copy of the length (to be used to know its length later on, if needed).
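
The three baton members listed above can be sketched without any node or Nan headers. This is only a stand-in under stated assumptions: `FakeBuffer` is a hypothetical substitute for the `node::Buffer` backing store, a `std::shared_ptr` plays the keep-alive role that `Nan::Persistent<v8::Object>` would play in a real addon, and `Execute()` is called directly here rather than by the threadpool. The body of `do_expensive_work` (a byte sum) is also made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the memory owned by a node::Buffer.
using FakeBuffer = std::vector<char>;

// Illustrative work function: sums the bytes in place, without copying them.
std::size_t do_expensive_work(const char* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    std::size_t sum = 0;
    for (std::size_t i = 0; i < length; ++i) {
        sum += static_cast<unsigned char>(data[i]);
    }
    return sum;
}

// Hypothetical "baton". In a real addon this would inherit from
// Nan::AsyncWorker and `buffer` would be a Nan::Persistent<v8::Object>.
struct Baton {
    std::shared_ptr<const FakeBuffer> buffer;  // keeps the bytes alive
    const char* data;                          // raw pointer, no copy
    std::size_t data_length;                   // copy of the length
    std::size_t result = 0;

    explicit Baton(std::shared_ptr<const FakeBuffer> buf)
        : buffer(std::move(buf)),
          data(buffer->data()),
          data_length(buffer->size()) {}

    // In a real addon this would run on a threadpool thread.
    void Execute() { result = do_expensive_work(data, data_length); }
};
```

The point of the keep-alive member is that the caller can drop its own reference to the buffer before `Execute()` runs and `data` still points at valid memory.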

_Note: the reader may notice that passing both a `data` and a `data_length` is clumsy and the two could get out of sync. This is true. Ideally there would be a nice container that stores both without copying the `data` the way creating a `std::string` would. Just such a container is coming in C++17 as `string_view`; until then, [this class from protozero](https://github.com/mapbox/protozero/blob/5ad35fd13a067b3b3e113399dd44dbf986624749/include/protozero/types.hpp#L69-L199) can be used, or [boost string_ref](http://www.boost.org/doc/libs/1_59_0/libs/utility/doc/html/string_ref.html) for boost users._
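
As a rough sketch of the container idea in the note above, and assuming a C++17 compiler is available, a single `std::string_view` member can replace the pointer/length pair. `ViewBaton` and `count_newlines` are hypothetical names used only for illustration. The view is non-owning, so a real baton would still need the persistent handle to keep the buffer alive.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical baton variant: one non-owning view replaces `data` and
// `data_length`, so the pointer and the length cannot drift apart.
struct ViewBaton {
    std::string_view view;

    ViewBaton(const char* data, std::size_t length) : view(data, length) {
        assert(data != nullptr || length == 0);
    }
};

// Illustrative work function: counts newline bytes without copying.
// The view carries an explicit length, so embedded '\0' bytes are fine.
std::size_t count_newlines(std::string_view bytes) {
    std::size_t count = 0;
    for (char c : bytes) {
        if (c == '\n') {
            ++count;
        }
    }
    return count;
}
```

Passing the view by value is cheap: it copies only a pointer and a size, never the underlying bytes.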

