crypto hash, hmac and cipher ops could be 2x-3x faster for "one-shot" small buffers #26748
Description
@ronomon/crypto-async recently added support for synchronous hash, hmac and cipher methods.
In the process, we noticed that Node's crypto equivalents are significantly slower than expected, partly because of calling back and forth into C++ multiple times, i.e. for initialization, updating and finalizing.
These calls add up to considerable overhead, on the order of a few hundred ns per call, which is especially noticeable for small buffers of less than 1 KB.
For use-cases which only need to hash a single small buffer, i.e. in "one shot", it should be possible to improve performance by 2x or even by as much as 3x, by making a single call into C++ (and by removing other overhead, more on this below).
Of course, not everyone calls update() only once, or hashes or encrypts only small buffers, but I think this is probably representative of a large proportion of use-cases, especially for hashing:
crypto.createHash('sha256').update(buffer).digest()
If we could make this use-case 2x to 3x faster (as in less latency) that should be worth doing.
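To get a rough feel for the per-call overhead on a given machine, a micro-benchmark along these lines may help. This is only a sketch: measureHashLatency is a hypothetical helper (not part of Node or @ronomon/crypto-async), the buffer sizes and iteration counts are arbitrary, and the numbers will vary with Node version and hardware.

```javascript
'use strict';
const crypto = require('crypto');

// Hypothetical helper: returns the average time in nanoseconds for one
// createHash/update/digest sequence on a random buffer of `size` bytes.
function measureHashLatency(algorithm, size, iterations) {
  const buffer = crypto.randomBytes(size);

  // Warm up so that the timed loop is less affected by JIT compilation.
  for (let i = 0; i < 1000; i++) {
    crypto.createHash(algorithm).update(buffer).digest();
  }

  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) {
    crypto.createHash(algorithm).update(buffer).digest();
  }
  const elapsed = process.hrtime.bigint() - start;

  return Number(elapsed) / iterations;
}

// If the fixed cost of the three calls dominates, the time per hash should
// stay roughly flat across the small sizes instead of scaling with size.
for (const size of [64, 256, 1024, 16384]) {
  const ns = measureHashLatency('sha256', size, 10000);
  console.log(`${size} bytes: ${ns.toFixed(0)} ns per hash`);
}
```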
How could this be done?
I think there are at least two approaches:
- Firstly, while the current interface must support the existing streams interface, Node could possibly do something transparently under the hood, at least for hashes, by batching calls to C++ for initialize, update and finalize. For example, for a hash, there's no need for Node to do anything when the hash is instantiated (except for error handling, checking for algorithm support etc.), and no need to do anything for updates. Node could effectively wait until update() is called a second time, or until digest() is called, before calling into C++. More generally, buffers could be batched until a high-watermark is reached, to amortize calls into C++. For ciphers, of course, this won't be possible, because the user expects update() to return something (and we need to support AEAD ciphers, which are more complex in terms of interface). This approach won't entirely close the gap with @ronomon/crypto-async, because of the overhead of streams for small buffers.
- Secondly, it might be simpler and more optimal to introduce lightweight one-shot methods, which would avoid the expense of streams (significant for small buffers) in addition to the multiple round-trips into C++. For example: hash(algorithm, buffer, [offset, size], [callback]). Introducing one-shot methods would also be a natural opportunity to add async multi-core support for large buffers (> 64KB), which would give a huge concurrent throughput boost for large buffers, and eliminate blocking in the event loop.
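The two approaches can be sketched in userland to show their shape. Both LazyHash and hashOneShot are hypothetical names, not Node APIs. The sketch only illustrates the control flow and the proposed interface: it still goes through crypto.createHash() internally, so it does not deliver the speedup itself, which would need a single native call. The callback parameter from the proposed signature is left out.

```javascript
'use strict';
const crypto = require('crypto');

// Approach 1 (hypothetical): defer all work until a second update() or
// digest(), so the common update-once case could be served by one native call.
class LazyHash {
  constructor(algorithm) {
    // Check algorithm support eagerly so errors still surface at creation.
    // Assumption: an exact match against getHashes() is good enough here.
    if (!crypto.getHashes().includes(algorithm)) {
      throw new Error(`Unsupported hash algorithm: ${algorithm}`);
    }
    this.algorithm = algorithm;
    this.pending = null; // first chunk, held back (caller must not mutate it)
    this.hash = null;    // real Hash object, created lazily
  }

  update(chunk) {
    if (this.hash === null && this.pending === null) {
      this.pending = chunk; // first update: remember the chunk, do nothing
      return this;
    }
    if (this.hash === null) {
      // Second update: fall back to the normal incremental path.
      this.hash = crypto.createHash(this.algorithm);
      this.hash.update(this.pending);
      this.pending = null;
    }
    this.hash.update(chunk);
    return this;
  }

  digest(encoding) {
    if (this.hash === null) {
      // Zero or one update so far: this is where a real implementation
      // could make its single combined call into C++.
      this.hash = crypto.createHash(this.algorithm);
      if (this.pending !== null) this.hash.update(this.pending);
      this.pending = null;
    }
    return this.hash.digest(encoding);
  }
}

// Approach 2 (hypothetical): a synchronous one-shot function following the
// proposed hash(algorithm, buffer, [offset, size]) shape.
function hashOneShot(algorithm, buffer, offset = 0, size = buffer.length - offset) {
  // subarray() creates a view over the same memory, without copying.
  const view = buffer.subarray(offset, offset + size);
  return crypto.createHash(algorithm).update(view).digest();
}
```

One trade-off of the deferred approach is visible in the sketch: because the first chunk is held by reference, a caller that reuses or mutates the buffer between update() and digest() would get a different digest than today, unless the chunk is copied, which adds its own cost.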
These are just some ideas, in case anyone is interested to run with this.