wip: crypto: use cppgc to manage Hash #51017
base: main
src/cppgc_helpers.h (outdated)
namespace node {

#define ASSIGN_OR_RETURN_UNWRAP_CPPGC(ptr, obj, ...) \
Would you mind adding a short comment to this file explaining what cppgc is, with some important links?
Some documentation should probably go into src/README.md.
src/cppgc_helpers.h (outdated)
  Environment* env_; \
  v8::TracedReference<v8::Object> traced_reference_;

class CppgcMixin {
Can you add a comment for the purpose of this class?
@@ -103,7 +109,8 @@ void Hash::New(const FunctionCallbackInfo<Value>& args) {
     xof_md_len = Just<unsigned int>(args[1].As<Uint32>()->Value());
   }

-  Hash* hash = new Hash(env, args.This());
+  Hash* hash = cppgc::MakeGarbageCollected<Hash>(
Would you mind adding a comment here explaining why we don't call new Hash and use cppgc::MakeGarbageCollected instead?
    ASSIGN_OR_RETURN_UNWRAP(&ctx, args.Holder());
  } else {
    ctx = CppgcMixin::Unwrap<T>(args.Holder());
    if (ctx == nullptr) return;
In what scenario can ctx be nullptr?
I am mostly opening it to run the benchmark CI.
@joyeecheung I ran local benchmarks earlier today (using a commit you had added) and it did solve the performance issues for me. I did not investigate further and wasn't very careful, but I got interested in your commit because it attacks a performance bottleneck.
Results from the CI https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1476/console
And https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1475/console - looks like n is too small for the CI machine and needs to be increased. It seems interesting how in the CI machine the digest benchmarks get a more stable speedup (though we also know that this largely depends on the CPU and how big of a bottleneck OpenSSL is on that CPU).
Updated a bit to use cppgc::GarbageCollectedMixin and cppgc::NameProvider. Startup snapshot integration is still unsolved but I don't think it's needed for Hash. Though it would be nice to be able to customize the memory tracking so we can track OpenSSL memory in the heap snapshot. Still looking into it. Somehow numbers are even better now on macOS + M2 Max
From the discussions in https://docs.google.com/document/d/1ny2Qz_EsUnXGKJRGxoA-FXIE2xpLgaMAN6jD7eAkqFQ/edit, it looks like we need to upstream an API to V8 to track external memory in the heap snapshot. Looking into an API - it probably wouldn't be too different from what we already have in
Started working on the external memory tracking API: https://chromium-review.googlesource.com/c/v8/v8/+/5630497. It may need to take care of the GC tuning part too, based on the discussions in the doc.
86862d9 to 0067a0b
Rebased after V8 12.8 landed on main, and used the updated version of https://chromium-review.googlesource.com/c/v8/v8/+/5630497 as well as the helpers in #52295. The other PR will be ready after I finish the docs, whereas this PR depends on the mentioned V8 CL, which still needs to be hashed out (especially the GC scheduling integration part): unlike ContextifyScript, crypto::Hash does have external memory, so this won't be unblocked until the V8 CL lands. With the WIP V8 CL the heap snapshot after migration looks like this: [screenshot: heap snapshot]
(Currently cppgc-managed objects cannot specify edge names in the heap snapshot; that is being worked on in https://docs.google.com/document/d/1PQQHhT0MLlStoiqNmji2-GcX62xtAsXPrihXi403ib4/edit#heading=h.n1atlriavj6v)
Following the discussions in #56534, it's probably worth retrying this based on the approach designed there instead.
0067a0b to c285e02
3094b35 to cc49694
Updated #56534 to deduplicate the memory retainer node and added a test. The heap snapshot locally looks like this: [screenshot: heap snapshot]
9ef7a8b to b405688
I rebased after #56534 and I noticed that:
But if I remove the list the performance improvement is still there, so the regression specifically comes from the list we use for heap snapshot annotations. I am wondering if we should, then, make the external tracking opt-in.
I may be misunderstanding but isn't the list also for realm cleanup and not just tracking the external memory? Don't we still need the list there to ensure that Realm cleanup happens correctly? Or am I missing something? (I'm most likely missing something ;-) ...). That said, I think I would actually prefer the opposite... that is, have the tracking on by default with a flag to disable it. That makes things easier for doing diagnostics in local dev and then deployments to production can disable the tracking to boost performance.
We started using the
@@ -57,7 +58,12 @@ void Decode(const v8::FunctionCallbackInfo<v8::Value>& args,
             void (*callback)(T*, const v8::FunctionCallbackInfo<v8::Value>&,
                              const char*, size_t)) {
   T* ctx;
-  ASSIGN_OR_RETURN_UNWRAP(&ctx, args.This());
+  if constexpr (std::is_base_of_v<BaseObject, T>) {
This pattern is likely to become fairly common assuming we get more use of the new mixins... likely a good idea to separate this unwrapping of T* out into a separate utility method.
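A minimal sketch of what such a utility might look like, using stand-in types rather than the real Node.js classes. `UnwrapContext`, `Wrap`, `FakeHolder`, `LegacyHash`, `CppgcHash`, and the mock `BaseObject`/`CppgcMixin` below are all hypothetical; they only illustrate factoring the `if constexpr (std::is_base_of_v<BaseObject, T>)` dispatch from the hunk above into one place. The real helper would presumably wrap `ASSIGN_OR_RETURN_UNWRAP` and `CppgcMixin::Unwrap<T>` on a `v8::Object`.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for the JS wrapper object (hypothetical, not a V8 type).
struct FakeHolder {
  void* native = nullptr;  // plays the role of an internal field
};

// Simplified mocks of the two management styles (assumptions, not Node.js code).
struct BaseObject { virtual ~BaseObject() = default; };
struct CppgcMixin { virtual ~CppgcMixin() = default; };

// Store the object in the holder through its management base class.
template <typename T>
FakeHolder Wrap(T* obj) {
  FakeHolder holder;
  if constexpr (std::is_base_of_v<BaseObject, T>) {
    holder.native = static_cast<BaseObject*>(obj);
  } else {
    static_assert(std::is_base_of_v<CppgcMixin, T>,
                  "T must derive from BaseObject or CppgcMixin");
    holder.native = static_cast<CppgcMixin*>(obj);
  }
  return holder;
}

// Single entry point: callers no longer repeat the if-constexpr branch.
// Returns nullptr when the holder carries no native object.
template <typename T>
T* UnwrapContext(const FakeHolder& holder) {
  if (holder.native == nullptr) return nullptr;
  if constexpr (std::is_base_of_v<BaseObject, T>) {
    return static_cast<T*>(static_cast<BaseObject*>(holder.native));
  } else {
    static_assert(std::is_base_of_v<CppgcMixin, T>,
                  "T must derive from BaseObject or CppgcMixin");
    return static_cast<T*>(static_cast<CppgcMixin*>(holder.native));
  }
}

// Two hypothetical wrappers, one per management style.
struct LegacyHash : BaseObject { int id = 1; };
struct CppgcHash : CppgcMixin { int id = 2; };
```

With a helper of this shape, a callback such as `Decode` would call `UnwrapContext<T>(...)` once and return early on `nullptr`, regardless of how `T` is managed.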
For that we can make it opt-in, so that objects that don't need special cleanup don't put themselves in the list. It remains to be seen whether this or PreFinalizer would be faster (I think the weak-persistent-based approach might still be faster since it's more targeted), but we could use either of them.
I think the question is, how many people are actually going to need external memory diagnostics for heap snapshots? Note that cppgc objects are still going to show up in heap snapshots, it's just the external memory part (e.g. they are going to see
Also, for use cases like CLI tools, this just always affects local performance as there's no production environment. So the users need to be aware of this option if they want better perf of their CLI tools run locally, at that point we are likely to be asked why we are not disabling it by default.
That's fair. Hmm... then yeah, an option to opt-in would be good. I think in parallel it would be good to try to see if we can cut down that performance penalty when using the list tho as I do believe it's better to have that additional detail tracked.
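To make the opt-in idea concrete, here is a rough sketch under simplifying assumptions. `Registry`, `Trackable`, and `Tracking` are hypothetical names, not Node.js or cppgc APIs, and the objects here are ordinary C++ objects rather than garbage-collected ones. It only shows the shape of the trade-off discussed above: objects that need cleanup or tracking pay for a list insertion, while the rest skip the list entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

class Trackable;

// Hypothetical stand-in for the per-realm list of tracked objects.
class Registry {
 public:
  using Handle = std::list<Trackable*>::iterator;
  Handle Add(Trackable* obj) { return list_.insert(list_.end(), obj); }
  void Remove(Handle handle) { list_.erase(handle); }
  std::size_t size() const { return list_.size(); }
  // Visit every registered object, e.g. at teardown.
  void CleanupAll();

 private:
  std::list<Trackable*> list_;
};

enum class Tracking { kNone, kOptIn };

class Trackable {
 public:
  Trackable(Registry* registry, Tracking mode) : registry_(registry) {
    // Only objects that ask for it are inserted into the list.
    if (mode == Tracking::kOptIn) {
      handle_ = registry_->Add(this);
      tracked_ = true;
    }
  }
  Trackable(const Trackable&) = delete;
  Trackable& operator=(const Trackable&) = delete;
  virtual ~Trackable() {
    if (tracked_) registry_->Remove(handle_);
  }
  virtual void Clean() { cleaned_ = true; }
  bool cleaned() const { return cleaned_; }

 private:
  Registry* registry_;
  Registry::Handle handle_{};
  bool tracked_ = false;
  bool cleaned_ = false;
};

inline void Registry::CleanupAll() {
  for (Trackable* obj : list_) obj->Clean();
}
```

In this sketch an object constructed with `Tracking::kNone` never touches the list, so its creation cost does not include the insertion; whether that matches the overhead measured in this PR is something the benchmarks would have to confirm.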
While working on #40786 I happened to notice that the current BaseObject management comes with a significant overhead from global handle creation and incidentally migrating these objects to Oilpan gives us faster creation of objects (in my local testing, creating a cppgc-managed object is about 2.5x faster than creating a weak BaseObject).
This is only a WIP and not yet ready for review. It's mostly open as a reference for nodejs/performance#136 and for running benchmark CI. The primary blocker is that I'll need to figure out how to support embedder object book-keeping in the heap snapshot with cppgc (right now it doesn't work and embedder objects become missing in the heap snapshots) - EDIT: this is currently being worked on in https://chromium-review.googlesource.com/c/v8/v8/+/5630497
Refs: nodejs/performance#136
Refs: #40786
On macOS + M2 where OpenSSL 3 performs much better
On Ubuntu + Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz, where OpenSSL 3 does not perform as well and becomes the bottleneck in digest computation: