wip: crypto: use cppgc to manage Hash #51017

joyeecheung · 2023-12-02T21:15:14Z

While working on #40786 I happened to notice that the current BaseObject management comes with a significant overhead from global handle creation and incidentally migrating these objects to Oilpan gives us faster creation of objects (in my local testing, creating a cppgc-managed object is about 2.5x faster than creating a weak BaseObject).

This is only a WIP and not yet ready for review. It's mostly open as a reference for nodejs/performance#136 and for running benchmark CI. The primary blocker is that I'll need to figure out how to support embedder object book-keeping in the heap snapshot with cppgc (right now it doesn't work and embedder objects become missing in the heap snapshots) - EDIT: this is currently being worked on in https://chromium-review.googlesource.com/c/v8/v8/+/5630497

Refs: nodejs/performance#136
Refs: #40786

On macOS + M2 where OpenSSL 3 performs much better

                               confidence improvement accuracy (*)   (**)  (***)
crypto/create-hash.js n=100000        ***     16.29 %       ±1.83% ±2.44% ±3.18%

                                                                              confidence improvement accuracy (*)   (**)  (***)
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=10 sync='createHash'         ***      9.52 %       ±1.41% ±1.88% ±2.46%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=100 sync='createHash'        ***      9.96 %       ±1.71% ±2.27% ±2.97%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=20 sync='createHash'         ***      9.85 %       ±1.08% ±1.44% ±1.87%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=50 sync='createHash'         ***      9.80 %       ±1.91% ±2.55% ±3.35%

On Ubuntu + Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz, where OpenSSL 3 does not perform as well and becomes the bottleneck in digest computation:

                               confidence improvement accuracy (*)   (**)  (***)
crypto/create-hash.js n=100000        ***     17.17 %       ±1.53% ±2.03% ±2.65%

                                                                              confidence improvement accuracy (*)   (**)  (***)
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=10 sync='createHash'                  0.43 %       ±0.91% ±1.21% ±1.57%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=100 sync='createHash'                 0.55 %       ±0.98% ±1.31% ±1.70%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=20 sync='createHash'                  0.18 %       ±1.07% ±1.42% ±1.86%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=50 sync='createHash'                  1.19 %       ±1.71% ±2.28% ±2.98%

nodejs-github-bot · 2023-12-02T21:15:20Z

Review requested:

@nodejs/crypto
@nodejs/gyp

anonrig · 2023-12-02T21:23:56Z

src/cppgc_helpers.h

+
+namespace node {
+
+#define ASSIGN_OR_RETURN_UNWRAP_CPPGC(ptr, obj, ...)                           \


Would you mind adding a small documentation/comment around what cppgc is, some important links etc to this document?

Some documentation should probably go into src/README.md.

anonrig · 2023-12-02T21:24:15Z

src/cppgc_helpers.h

+  Environment* env_;                                                           \
+  v8::TracedReference<v8::Object> traced_reference_;
+
+class CppgcMixin {


Can you add a comment for the purpose of this class?

anonrig · 2023-12-02T21:24:58Z

src/crypto/crypto_hash.cc

  }

-  Hash* hash = new Hash(env, args.This());
+  Hash* hash = cppgc::MakeGarbageCollected<Hash>(


Would you mind adding a comment to here why we don't call new Hash but prefer the recommended change to here.

anonrig · 2023-12-02T21:25:50Z

src/crypto/crypto_util.h

+    ASSIGN_OR_RETURN_UNWRAP(&ctx, args.Holder());
+  } else {
+    ctx = CppgcMixin::Unwrap<T>(args.Holder());
+    if (ctx == nullptr) return;


In what scenario can ctx be nullptr?

joyeecheung · 2023-12-02T22:01:02Z

@anonrig

This is only a WIP and not yet ready for review.

I am mostly opening it to run the benchmark CI.

lemire · 2023-12-02T23:12:34Z

@joyeecheung I ran local benchmarks earlier today (using a commit you had added) and it did solve the performance issues for me. However, I did not investigate further and I wasn’t very careful.

However I got interested in your commit because it is attacking a performance bottleneck.

joyeecheung · 2023-12-02T23:59:40Z

Results from the CI https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1476/console

00:02:11                                                                                 confidence improvement accuracy (*)   (**)  (***)
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-1' data=10 sync='createHash'            **      7.92 %       ±5.59% ±7.44% ±9.68%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-1' data=10 sync='subtle'                        0.76 %       ±2.14% ±2.85% ±3.70%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-1' data=100 sync='createHash'          ***      9.56 %       ±3.62% ±4.82% ±6.29%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-1' data=100 sync='subtle'                       1.73 %       ±2.66% ±3.54% ±4.61%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-1' data=20 sync='createHash'           ***      8.95 %       ±4.04% ±5.38% ±7.02%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-1' data=20 sync='subtle'                        2.32 %       ±2.92% ±3.89% ±5.07%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-1' data=50 sync='createHash'             *      5.19 %       ±4.34% ±5.80% ±7.59%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-1' data=50 sync='subtle'                        2.05 %       ±2.71% ±3.61% ±4.70%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-256' data=10 sync='createHash'          **      7.96 %       ±4.75% ±6.32% ±8.24%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-256' data=10 sync='subtle'              **      3.90 %       ±2.37% ±3.15% ±4.11%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-256' data=100 sync='createHash'        ***      7.01 %       ±3.74% ±4.98% ±6.49%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-256' data=100 sync='subtle'                     0.79 %       ±2.86% ±3.81% ±4.96%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-256' data=20 sync='createHash'           *      4.94 %       ±3.82% ±5.11% ±6.68%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-256' data=20 sync='subtle'                     -0.97 %       ±2.59% ±3.46% ±4.51%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-256' data=50 sync='createHash'         ***      7.86 %       ±4.28% ±5.70% ±7.43%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-256' data=50 sync='subtle'                      1.59 %       ±2.62% ±3.49% ±4.54%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-384' data=10 sync='createHash'         ***      8.68 %       ±4.17% ±5.55% ±7.23%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-384' data=10 sync='subtle'                      2.92 %       ±3.18% ±4.23% ±5.50%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-384' data=100 sync='createHash'          *      3.08 %       ±3.05% ±4.08% ±5.37%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-384' data=100 sync='subtle'              *      3.04 %       ±2.55% ±3.40% ±4.42%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-384' data=20 sync='createHash'         ***      5.73 %       ±3.24% ±4.31% ±5.62%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-384' data=20 sync='subtle'               *      2.86 %       ±2.55% ±3.40% ±4.43%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-384' data=50 sync='createHash'         ***      6.76 %       ±3.47% ±4.62% ±6.01%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-384' data=50 sync='subtle'               *      3.42 %       ±2.74% ±3.65% ±4.75%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-512' data=10 sync='createHash'           *      5.22 %       ±4.30% ±5.72% ±7.46%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-512' data=10 sync='subtle'                      0.00 %       ±2.91% ±3.87% ±5.04%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-512' data=100 sync='createHash'         **      5.90 %       ±3.49% ±4.65% ±6.05%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-512' data=100 sync='subtle'                     0.78 %       ±2.71% ±3.61% ±4.70%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-512' data=20 sync='createHash'                  3.56 %       ±3.88% ±5.18% ±6.76%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-512' data=20 sync='subtle'                      1.67 %       ±3.01% ±4.01% ±5.21%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-512' data=50 sync='createHash'                  1.94 %       ±3.19% ±4.26% ±5.58%
00:02:11 crypto/webcrypto-digest.js n=100000 method='SHA-512' data=50 sync='subtle'                      2.49 %       ±2.87% ±3.82% ±4.98%

And https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1475/console - looks like n is too small for the CI machine and needs to be increased. It seems interesting how in the CI machine the digest benchmarks get a more stable speedup (though we also know that this largely depends on the CPU and how big of a bottleneck OpenSSL is on that CPU).

23:20:03                                confidence improvement accuracy (*)    (**)   (***)
23:20:03 crypto/create-hash.js n=100000                 4.18 %       ±7.75% ±10.31% ±13.42%

joyeecheung · 2024-03-07T18:31:37Z

Updated a bit to use cppgc::GarbageCollectedMixin and cppgc::NameProvider. Startup snapshot integration is still unsolved but I don't think it's needed for Hash. Though it would be nice to be able to customize the memory tracking so we can track OpenSSL memory in the heap snapshot. Still looking into it.

Somehow numbers are even better now on macOS + M2 Max

                               confidence improvement accuracy (*)   (**)  (***)
crypto/create-hash.js n=100000        ***     27.16 %       ±1.12% ±1.49% ±1.94%

                                                                                confidence improvement accuracy (*)   (**)  (***)
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=10 sync='createHash'           ***      9.44 %       ±1.53% ±2.04% ±2.66%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=100 sync='createHash'          ***      9.19 %       ±1.49% ±1.99% ±2.60%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=20 sync='createHash'           ***     11.12 %       ±3.46% ±4.65% ±6.13%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=50 sync='createHash'           ***      8.01 %       ±1.71% ±2.27% ±2.97%
crypto/webcrypto-digest.js n=100000 method='SHA-256' data=10 sync='createHash'         ***      9.66 %       ±2.89% ±3.86% ±5.05%
crypto/webcrypto-digest.js n=100000 method='SHA-256' data=100 sync='createHash'        ***      9.49 %       ±1.26% ±1.68% ±2.19%
crypto/webcrypto-digest.js n=100000 method='SHA-256' data=20 sync='createHash'         ***     11.40 %       ±3.15% ±4.23% ±5.59%
crypto/webcrypto-digest.js n=100000 method='SHA-256' data=50 sync='createHash'         ***      9.98 %       ±1.42% ±1.89% ±2.47%
crypto/webcrypto-digest.js n=100000 method='SHA-384' data=10 sync='createHash'         ***      8.42 %       ±1.44% ±1.91% ±2.49%
crypto/webcrypto-digest.js n=100000 method='SHA-384' data=100 sync='createHash'        ***      8.88 %       ±1.43% ±1.91% ±2.49%
crypto/webcrypto-digest.js n=100000 method='SHA-384' data=20 sync='createHash'         ***      8.12 %       ±0.94% ±1.26% ±1.63%
crypto/webcrypto-digest.js n=100000 method='SHA-384' data=50 sync='createHash'         ***      8.81 %       ±1.38% ±1.83% ±2.39%
crypto/webcrypto-digest.js n=100000 method='SHA-512' data=10 sync='createHash'         ***      7.36 %       ±2.96% ±3.96% ±5.20%
crypto/webcrypto-digest.js n=100000 method='SHA-512' data=100 sync='createHash'        ***      7.14 %       ±1.52% ±2.03% ±2.66%
crypto/webcrypto-digest.js n=100000 method='SHA-512' data=20 sync='createHash'         ***      7.70 %       ±3.24% ±4.35% ±5.76%
crypto/webcrypto-digest.js n=100000 method='SHA-512' data=50 sync='createHash'         ***      5.71 %       ±3.20% ±4.29% ±5.63%

joyeecheung · 2024-03-27T12:57:46Z

From discussions in https://docs.google.com/document/d/1ny2Qz_EsUnXGKJRGxoA-FXIE2xpLgaMAN6jD7eAkqFQ/edit looks like we need to upstream an API to V8 to track external memory in the heap snapshot. Looking into an API - it probably wouldn't be too different from what we already have in MemoryRetainer...

joyeecheung · 2024-06-13T18:04:32Z

Started working on the external memory tracking API: https://chromium-review.googlesource.com/c/v8/v8/+/5630497 it may need to take care of the GC tuning part too, from the discussions in the doc

joyeecheung · 2024-08-18T18:52:14Z

Rebased after v8 12.8 landed on main, and used updated version of https://chromium-review.googlesource.com/c/v8/v8/+/5630497 as well as the helpers in #52295 - the other PR will be ready after I finish the docs, whereas this PR depends on the mentioned V8 CL that needs to be hashed out (especially the GC scheduling integration part) since unlike ContextifyScript, crypto::Hash does have external memory, so this won't be unblocked until the V8 CL lands.

With the WIP V8 CL the heap snapshot after migration looks like this:

(Currently cppgc-managed objects cannot specify edge names in the heap snapshot, that is being worked on in https://docs.google.com/document/d/1PQQHhT0MLlStoiqNmji2-GcX62xtAsXPrihXi403ib4/edit#heading=h.n1atlriavj6v)

joyeecheung · 2025-02-14T17:03:08Z

Following the discussions in #56534 it's probably worth a retry to be based on the approach designed there instead.

joyeecheung · 2025-03-07T16:01:41Z

Updated #56534 to deduplicate the memory retainer node and added a test. The heap snapshot locally looks like this:

joyeecheung · 2025-05-27T14:44:13Z

I rebased after #56534 and I noticed that:

With the extra list on the side, there is a performance regression, so that the benchmarks no longer show improvement once there is a list (similar to what it was before)
And, with the extra list using weak persistents, there is another performance regression, so that e.g. we get this now

                               confidence improvement accuracy (*)   (**)  (***)
crypto/create-hash.js n=100000         **     -5.71 %       ±4.21% ±5.67% ±7.49%

But if I remove the list the performance improvement is still there, so the regression specifically comes from the list we use for heap snapshot annotations.

I am wondering if we should, then, make the external tracking opt-in.

When the users do need external memory tracking for heap snapshots, they could pass a CLI flag like --track-external-memory or call a JS API v8.startTrackingExternal, at an overhead to the memory management for better annotations in the heap snapshots.
When the users do not need diagnostics (most cases), we don't track the external memory and keep the performance boost from cppgc management. In addition we could also use v8::ExternalMemoryAccounter for better coordination with V8 GC (this is scheduling only, not for diagnostics).

jasnell · 2025-05-27T15:04:02Z

When the users do need external memory tracking for heap snapshots, they could pass a CLI flag like --track-external-memory or call a JS API v8.startTrackingExternal, at an overhead to the memory management for better annotations in the heap snapshots.

I may be misunderstanding but isn't the list also for realm cleanup and not just tracking the external memory? Don't we still need the list there to ensure that Realm cleanup happens correctly? Or am I missing something? (I'm most likely missing something ;-) ...).

That said, I think I would actually prefer the opposite... that is, have the tracking on by default with a flag to disable it. That makes things easier for doing diagnostics in local dev and then deployments to production can disable the tracking to boost performance.

When the users do not need diagnostics (most cases), we don't track the external memory and keep the performance boost from cppgc management. In addition we could also use v8::ExternalMemoryAccounter for better coordination with V8 GC (this is scheduling only, not for diagnostics).

We started using the v8::ExternalMemoryAccounter in workers and noticed that it had a non-trivial overhead also. Necessary, but still has a cost.

jasnell · 2025-05-27T15:08:12Z

src/crypto/crypto_util.h

                             const char*, size_t)) {
  T* ctx;
-  ASSIGN_OR_RETURN_UNWRAP(&ctx, args.This());
+  if constexpr (std::is_base_of_v<BaseObject, T>) {


This pattern is likely to become fairly common assuming we get more use of the new mixins... likely a good idea to separate this unwrapping of T* out into a separate utility method.

joyeecheung · 2025-05-27T15:19:13Z

I may be misunderstanding but isn't the list also for realm cleanup and not just tracking the external memory? Don't we still need the list there to ensure that Realm cleanup happens correctly? Or am I missing something? (I'm most likely missing something ;-) ...).

For that we can make it opt-in, so that objects that don't need special cleanup don't put themselves in the list. It remains to be seen whether this or PreFinalizer would be faster (I think the weak-persistent-based approach might still be faster since it's more targeted), but we could use either of them.

That makes things easier for doing diagnostics in local dev and then deployments to production can disable the tracking to boost performance.

I think the question is, how many people are actually going to need external memory diagnostics for heap snapshots? Note that cppgc objects are still going to show up in heap snapshots, it's just the external memory part (e.g. they are going to see Node / Hash, just not Node / EVP_MD_CTX). I suspect that would only be <1%, and they are likely to proactively research/ask how to get more details when they do need more details. On the other hand, for the other 99% who use these objects in production, we will be effectively asking to them to modify their scripts to include the new option to get optimal performance, provided that they are aware that they are being affected (the more likely scenario, I think, is that they won't know and will just take the perf hit without ever knowing this one trick to make it optimal).

Also, for use cases like CLI tools, this just always affects local performance as there's no production environment. So the users need to be aware of this option if they want better perf of their CLI tools run locally, at that point we are likely to be asked why we are not disabling it by default.

jasnell · 2025-05-27T15:29:17Z

That's fair. Hmm... then yeah, an option to opt-in would be good. I think in parallel it would be good to try to see if we can cut down that performance penalty when using the list tho as I do believe it's better to have that additional detail tracked.

nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Dec 2, 2023

anonrig reviewed Dec 2, 2023

View reviewed changes

joyeecheung mentioned this pull request Dec 2, 2023

Optimizing hashing performance nodejs/performance#136

Closed

joyeecheung mentioned this pull request Dec 2, 2023

crypto: improve OpenSSL initialization #51019

Closed

joyeecheung mentioned this pull request Dec 3, 2023

Usage of Oilpan / V8 C++ Garbage Collector #40786

Open

tniessen added the wip Issues and PRs that are still a work in progress. label Dec 3, 2023

joyeecheung force-pushed the cppgc-hash branch from b11499c to 86862d9 Compare March 7, 2024 18:24

joyeecheung force-pushed the cppgc-hash branch from 86862d9 to 0067a0b Compare August 18, 2024 18:44

joyeecheung added the blocked PRs that are blocked by other issues or PRs. label Aug 18, 2024

joyeecheung mentioned this pull request Jan 14, 2025

MD5 SIMD #56558

Closed

joyeecheung force-pushed the cppgc-hash branch from 0067a0b to c285e02 Compare February 14, 2025 19:37

joyeecheung mentioned this pull request Feb 14, 2025

src: track cppgc wrappers with a list in Realm #56534

Merged

joyeecheung force-pushed the cppgc-hash branch 2 times, most recently from 3094b35 to cc49694 Compare March 7, 2025 16:00

joyeecheung force-pushed the cppgc-hash branch 2 times, most recently from 9ef7a8b to b405688 Compare March 7, 2025 16:57

crypto: use cppgc to manage Hash

465f903

joyeecheung force-pushed the cppgc-hash branch from b405688 to 465f903 Compare May 22, 2025 14:50

jasnell reviewed May 27, 2025

View reviewed changes


		namespace node {

		#define ASSIGN_OR_RETURN_UNWRAP_CPPGC(ptr, obj, ...) \

Uh oh!

wip: crypto: use cppgc to manage Hash #51017

Are you sure you want to change the base?

wip: crypto: use cppgc to manage Hash #51017

Uh oh!

Conversation

joyeecheung commented Dec 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nodejs-github-bot commented Dec 2, 2023

Uh oh!

anonrig Dec 2, 2023

Choose a reason for hiding this comment

Uh oh!

tniessen Dec 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anonrig Dec 2, 2023

Choose a reason for hiding this comment

Uh oh!

anonrig Dec 2, 2023

Choose a reason for hiding this comment

Uh oh!

anonrig Dec 2, 2023

Choose a reason for hiding this comment

Uh oh!

joyeecheung commented Dec 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lemire commented Dec 2, 2023

Uh oh!

joyeecheung commented Dec 2, 2023

Uh oh!

joyeecheung commented Mar 7, 2024

Uh oh!

joyeecheung commented Mar 27, 2024

Uh oh!

joyeecheung commented Jun 13, 2024

Uh oh!

joyeecheung commented Aug 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joyeecheung commented Feb 14, 2025

Uh oh!

joyeecheung commented Mar 7, 2025

Uh oh!

joyeecheung commented May 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jasnell commented May 27, 2025

Uh oh!

jasnell May 27, 2025

Choose a reason for hiding this comment

Uh oh!

joyeecheung commented May 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jasnell commented May 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

joyeecheung commented Dec 2, 2023 •

edited

Loading

tniessen Dec 3, 2023 •

edited

Loading

joyeecheung commented Dec 2, 2023 •

edited

Loading

joyeecheung commented Aug 18, 2024 •

edited

Loading

joyeecheung commented May 27, 2025 •

edited

Loading

joyeecheung commented May 27, 2025 •

edited

Loading