esm: add `deregister` method #49159

izaakschroeder · 2023-08-14T06:52:30Z

TBD

refactor hooks internals – remove chains in favor of loaderInstances; there is a tiny performance penalty when going through any given chain (since there's a conditional branch now and possibly extra iterations up to the total number of loaders) but i think this is negligible compared to the usability increase – there is now a default loader called default_loader that is registered like any other as node's internal one and only one array instead of several. the logic for adding/removing loaders is significantly simpler now.
change the return type of register – previously it returned whatever initialize did. @aduh95 noted that we should return a "loader handle" or a symbol to the loader instead of the initialize result which is what i've done here. i managed to make returning a symbol work across the thread gap. i am not 100% sold on symbol vs some kind of loader object which was also suggested (a thing that had a deregister method); i am still concerned about loader authors being able to pass initialization data back if it's somehow useful to them. N.B. this seems kind of like an 11th hour change now that 20.6.0 is imminent and the loaders API is being marked somewhat stable.
allow hook methods to receive a fourth state argument – so instead of returning the result of initialize and discard it, we now keep the result attached to the loader instance. this value is then fed back to hook invocations. after thinking about the comments in Replace globalPreload loaders#147 this actually seems necessary if there can be an arbitrary number of "copies" of any given loader; how can loaders differentiate their internal state from one another? there isn't a good way without something like this.
add deregister – this builds on the first two changes; the first one makes it sensible/trivial to remove existing loaders, and the second makes it possible for the user to actually be able to reference an individual loader which is necessary for removal

Adds deregister method on node:module that looks like this:

type Deregister = (id: string) => boolean;

Modifies the initialize hook to look like this:

type Initialize = (data: any, meta: {id: opaque}) => Promise<any>;

Internally registered instances of hooks are now tracked. This is so they can be removed later. The id of the registered instance is now passed to the initialize hook which can then be passed back to the caller of register.

// Loader
export const initialize = (_data, meta) => {
  return meta.id;
}

// Caller
import {register, deregister} from "node:module";

const id = register(...);

// ...

deregister(id);

nodejs-github-bot · 2023-08-14T06:52:35Z

Review requested:

@nodejs/loaders
@nodejs/modules

ljharb · 2023-08-14T06:54:16Z

lib/internal/modules/esm/hooks.js

+  #removeFromChain(chain, target) {
+    for (let i = 0; i < chain.length; ++i) {
+      if (target === chain[i+1]?.fn) {
+        chain.splice(i+1, 1);


Suggested change

chain.splice(i+1, 1);

ArrayPrototypeSplice(chain, i + 1, 1);

ljharb · 2023-08-14T06:54:48Z

lib/internal/modules/esm/hooks.js

+        return x.fn === instance.globalPreload;
+      });
+      if (index >= 0) {
+        this.#chains.globalPreload.splice(index, 1);


Suggested change

this.#chains.globalPreload.splice(index, 1);

ArrayPrototypeSplice(this.#chains.globalPreload, index, 1);

ljharb · 2023-08-14T06:54:58Z

lib/internal/modules/esm/hooks.js

+    if (instance.load) {
+      this.#removeFromChain(this.#chains.load, instance.load);
+    }
+    this.#loaderInstances.splice(index, 1);


Suggested change

this.#loaderInstances.splice(index, 1);

ArrayPrototypeSplice(this.#loaderInstances, index, 1);

test/es-module/test-esm-loader-hooks.mjs

aduh95 · 2023-08-14T07:01:50Z

We would need some docs.

Modifies the initialize hook to look like this:
type Initialize = (data: any, meta: {id: opaque}) => Promise<any>;

IMO register() should always return the ID, that eliminates the problem we had that there was no way to specify a transferList.

test/es-module/test-esm-loader-hooks.mjs

anonrig · 2023-08-14T15:11:35Z

lib/internal/modules/esm/hooks.js

+    const index = ArrayPrototypeFindIndex(this.#loaderInstances, (target) => {
+      return target.id === id;
+    });


Can we use a different data structure that doesn't require us to traverse through the whole array to find what we are looking for?

anonrig · 2023-08-14T15:12:24Z

lib/internal/modules/esm/hooks.js

+    if (index < 0) {
+      return false;
+    }
+    const instance = this.#loaderInstances[index];


Since we are using index to just get the element, we can use ArrayPrototypeFind instead of index.

anonrig · 2023-08-14T15:12:41Z

lib/internal/modules/esm/hooks.js

+      const index = ArrayPrototypeFindIndex(this.#chains.globalPreload, (x) => {
+        return x.fn === instance.globalPreload;
+      });
+      if (index >= 0) {


Suggested change

if (index >= 0) {

if (index !== -1) {

lib/internal/modules/esm/loader.js

joyeecheung · 2023-08-15T16:34:34Z

While deregister might be a correct word, I can't seem to find any use of it in the code base, there are many references to unregister in the code base though, so switching to unregister would make it more consistent.

GeoffreyBooth · 2023-08-15T17:00:10Z

While deregister might be a correct word, I can’t seem to find any use of it in the code base, there are many references to unregister in the code base though, so switching to unregister would make it more consistent.

https://english.stackexchange.com/a/40095/250362

TLDR: “deregister” is the verb, “unregistered” is the state once it’s been deregistered. So “deregister” is correct in this case, because the API is the action being taken.

GeoffreyBooth · 2023-08-15T17:15:32Z

IMO register() should always return the ID, that eliminates the problem we had that there was no way to specify a transferList.

What is the current signature for register? I don’t see it in https://github.com/nodejs/node/blob/main/doc/api/module.md#moduleregister. We should add that (either in its own PR or I would be fine with it getting added in this one.)

Assuming it’s currently sync and has no return value, I agree with @joyeecheung that we should change its return value to be the ID, similar to setTimeout in browsers. But we need to ensure that register remains sync, so that it stays easy to call from CommonJS.

ljharb · 2023-08-15T17:42:34Z

https://english.stackexchange.com/questions/25931/unregister-vs-deregister/40095#comment1140785_40095 states that in the verb form, both prefixes are synonyms.

JakobJingleheimer · 2023-08-15T18:24:39Z

I think deregister and unregister are similar enough that if there's already precedent, we should go with precedent. Otherwise, I believe deregister is more correct: there are more forms of the word based on de- (ex deregistration not unregistration). Also, very nuanced: un- generally has the sense of "revert" (like undo), whereas de- generally has the sense of "remove" and "cease" (like decelerate, deflate, desist).

ljharb · 2023-08-15T18:51:21Z

Right, and in this case it's reverting/undoing a registration :-)

GeoffreyBooth · 2023-08-15T19:21:02Z

if there’s already precedent, we should go with precedent

As far as I can tell the only precedent is internal. We have no public-facing APIs that are called “unregister,” per https://www.google.com/search?q=site%3Anodejs.org+%22unregister%22. I feel pretty strongly that “deregister” is correct English, so the public API should have that name, and we can clean up/rename the internal APIs whenever we get a chance. If you read the full page at https://english.stackexchange.com/a/40095 it seems pretty clear that most people consider “deregister” more correct, with some arguing that they’re equivalent, but no one thinks that “unregister” is more correct English.

This is a bikeshed and wasting attention. We already have a name. If you want to discuss it further please open a separate issue.

ljharb · 2023-08-15T19:23:40Z

The web and the language disagree with you:

joyeecheung · 2023-08-15T19:24:30Z

We have no public-facing APIs that are called “unregister,”

UnregisterIsolate() is part of the public embedder/addon API in node.h though (e.g. if we rename it, we break electron), and it's used in a few places in the API docs. (and as @ljharb mentioned, it's also present in some of the Web API/JS APIs, which you can also see being used in lib/ and test/ ...)

JakobJingleheimer · 2023-08-15T18:44:38Z

lib/internal/modules/esm/hooks.js

@@ -160,6 +168,15 @@ class Hooks {
      load,
    } = pluckHooks(exports);

+    const instance = {


I think this is maybe not a good name: instance usually means instance of a class, but here I think it's being used as "occurrence".

Suggested change

const instance = {

const registrationId = //

I would use something like uid(3) (it looks like crypto.randomUUID() is the only thing already available in node, but that would be massively overkill) or perhaps the loader's url. Chatting with Geoffrey, he pointed out a loader can now be registered multiple times, so if the loader's url, it would need to have something else appended, like a generation id.

I think we should use symbols for the public facing API, and the index for the internal API; using random string or number would be a mistake.

Ooow! That's a good idea!

What’s the value of a symbol as opposed to a string like file:///path/to/loader.js?1? At least the latter provides some useful information, such as for debugging, and then we don’t need to have different data structures internally and externally.

A symbol is guaranteed to be unique, so then there is no concern for a generational id on the same url:

Symbol(url) ≠ Symbol(url)

Also, the label of a symbol is visible, so it still provides almost identical debugging value 🙂

And bonus: it precludes the possibility of a user getting overly clever and foot-gunning a deregistration.

We could also return a handle, like:

const handle = register(...) handle.deregister()

Then the handle could potentially get other methods in the future. It could also contain the return value of initialize as a property, like handle.initializationResult.

Not sure I prefer this over other options, just adding it to the mix. We still need to deal with the problem that register currently returns the return value of initialize, so either we add this symbol into initialize for the user to return, and then somehow we get symbols across the thread barrier; or we change the return signature of register.

JakobJingleheimer · 2023-08-15T19:03:42Z

lib/internal/modules/esm/hooks.js

+  #lastInstanceId = 0;
+  #loaderInstances = [];


I think we should not use a simple number like this for id. Instead, we should use a Map and some string-friendly id (because maps are significantly more efficient with string-based indexes than numeric that gets cast back and forth). It also makes mutations far more efficient, as well as being inherently better on memory (because an array must be pre-allocated space, whereas a Map takes up only exactly the amount of space it needs).

Suggested change

#lastInstanceId = 0;

#loaderInstances = [];

#customLoaders = new SafeMap();

We need to store the order of the loaders, a Map doesn't seem adequate. Also, it's the first time I hear that Map would be more memory efficient than arrays, I'd be very surprised if that was the case.

Maps are ordered, but I agree

Yes, the sequence of Map items is guaranteed (insertion order): MDN: Map

The Map object holds key-value pairs and remembers the original insertion order of the keys.

Also, it's the first time I hear that Map would be more memory efficient than arrays, I'd be very surprised if that was the case.

Maps are better than Arrays in terms of both space and time efficiency:

Arrays are pre-sized. V8 pre-allocates a rather significant size¹ for an array in order to avoid having to move it (because the entire array is stored sequentially in memory) which is a very expensive operation.

Maps are not stored sequentially in memory, so there is no "move" operation when they grow (nor is there a pre-allocation of space).

¹ unless a size is specifically provided like new Array(2) (then the array will be exactly sized); however this can be a de-op due to data type of the items themselves (under the hood, V8 can use a better-suited type of array, but it doesn't do that optimisation when you strong-arm it, so unless the items in the array are all the same data-type, generally a primitive, it's better to not force a size).

Sets are similarly better for space and time efficiency (and are also not stored sequentially in memory).

Also worth noting: Maps and Sets are significantly better for rapid mutation than Arrays, and that's exactly what we expect to happen here.

Also worth noting: Maps and Sets are significantly better for rapid mutation than Arrays, and that's exactly what we expect to happen here.

I don't think we expect that, I would expect that unregister to not be useful to most applications – I've never seen anyone request it, but maybe there are some use cases I'm not seeing.

There are 2 sides to this coin ;) deletions from de/un-register, but primarily additions

Sets are similarly better for space and time efficiency (and are also not stored sequentially in memory).

Also worth noting: Maps and Sets are significantly better for rapid mutation than Arrays, and that's exactly what we expect to happen here.

How did you come to that conclusion? Is there a way to check if this is true?

From my understanding, and with some pointers from @targos, I don't see how this could be true.

In V8, Set and Map instances have a hash table associated with them which seems to be stored sequentially in memory:

node/deps/v8/src/builtins/builtins-collections-gen.cc

Line 1736 in f226350

TF_BUILTIN(SetPrototypeAdd, CollectionsBuiltinsAssembler) {

node/deps/v8/src/builtins/builtins-collections-gen.cc

Lines 1789 to 1791 in f226350

// We do not have enough space, grow the table and reload the relevant

// fields.

CallRuntime(Runtime::kSetGrow, context, receiver);

node/deps/v8/src/runtime/runtime-collections.cc

Lines 20 to 26 in edd537c

RUNTIME_FUNCTION(Runtime_SetGrow) {

HandleScope scope(isolate);

DCHECK_EQ(1, args.length());

Handle<JSSet> holder = args.at<JSSet>(0);

Handle<OrderedHashSet> table(OrderedHashSet::cast(holder->table()), isolate);

MaybeHandle<OrderedHashSet> table_candidate =

OrderedHashSet::EnsureGrowable(isolate, table);

node/deps/v8/src/objects/ordered-hash-table.cc

Lines 69 to 93 in f226350

template <class Derived, int entrysize>

MaybeHandle<Derived> OrderedHashTable<Derived, entrysize>::EnsureGrowable(

Isolate* isolate, Handle<Derived> table) {

DCHECK(!table->IsObsolete());

int nof = table->NumberOfElements();

int nod = table->NumberOfDeletedElements();

int capacity = table->Capacity();

if ((nof + nod) < capacity) return table;

int new_capacity;

if (capacity == 0) {

// step from empty to minimum proper size

new_capacity = kInitialCapacity;

} else if (nod >= (capacity >> 1)) {

// Don't need to grow if we can simply clear out deleted entries instead.

// Note that we can't compact in place, though, so we always allocate

// a new table.

new_capacity = capacity;

} else {

new_capacity = capacity << 1;

}

return Derived::Rehash(isolate, table, new_capacity);

}

From the above, I understand the Set or Map hash table capacity is doubled everytime it reaches its capacity, so not so memory efficient. Now that Rehash function has to recompute the hash for each elements for the new table:

node/deps/v8/src/objects/ordered-hash-table.cc

Lines 259 to 261 in f226350

template <class Derived, int entrysize>

MaybeHandle<Derived> OrderedHashTable<Derived, entrysize>::Rehash(

Isolate* isolate, Handle<Derived> table, int new_capacity) {

node/deps/v8/src/objects/ordered-hash-table.cc

Lines 278 to 298 in f226350

for (InternalIndex old_entry : table->IterateEntries()) {

int old_entry_raw = old_entry.as_int();

Object key = table->KeyAt(old_entry);

if (key.IsTheHole(isolate)) {

table->SetRemovedIndexAt(removed_holes_index++, old_entry_raw);

continue;

}

Object hash = key.GetHash();

int bucket = Smi::ToInt(hash) & (new_buckets - 1);

Object chain_entry = new_table->get(HashTableStartIndex() + bucket);

new_table->set(HashTableStartIndex() + bucket, Smi::FromInt(new_entry));

int new_index = new_table->EntryToIndexRaw(new_entry);

int old_index = table->EntryToIndexRaw(old_entry_raw);

for (int i = 0; i < entrysize; ++i) {

Object value = table->get(old_index + i);

new_table->set(new_index + i, value);

}

new_table->set(new_index + kChainOffset, chain_entry);

++new_entry;

}

In comparison, %Array.prototype.push% has a fast path, and seems also simpler (because no hashing involved), and also has optimizations based on what type of object you add to the array. The object path:

node/deps/v8/src/builtins/builtins-array-gen.cc

Line 340 in f226350

TF_BUILTIN(ArrayPrototypePush, CodeStubAssembler) {

node/deps/v8/src/builtins/builtins-array-gen.cc

Lines 362 to 374 in f226350

BIND(&fast);

{

array_receiver = CAST(receiver);

arg_index = IntPtrConstant(0);

kind = EnsureArrayPushable(context, LoadMap(array_receiver), &runtime);

GotoIf(IsElementsKindGreaterThan(kind, HOLEY_SMI_ELEMENTS),

&object_push_pre);

TNode<Smi> new_length =

BuildAppendJSArray(PACKED_SMI_ELEMENTS, array_receiver, &args,

&arg_index, &smi_transition);

args.PopAndReturn(new_length);

}

node/deps/v8/src/builtins/builtins-array-gen.cc

Lines 405 to 410 in f226350

BIND(&object_push);

{

TNode<Smi> new_length = BuildAppendJSArray(

PACKED_ELEMENTS, array_receiver, &args, &arg_index, &default_label);

args.PopAndReturn(new_length);

}

node/deps/v8/src/codegen/code-stub-assembler.cc

Lines 3446 to 3450 in f226350

TNode<Smi> CodeStubAssembler::BuildAppendJSArray(ElementsKind kind,

TNode<JSArray> array,

CodeStubArguments* args,

TVariable<IntPtrT>* arg_index,

Label* bailout) {

node/deps/v8/src/codegen/code-stub-assembler.cc

Lines 3458 to 3463 in f226350

// Resize the capacity of the fixed array if it doesn't fit.

TNode<IntPtrT> first = arg_index->value();

TNode<BInt> growth =

IntPtrToBInt(IntPtrSub(args->GetLengthWithoutReceiver(), first));

PossiblyGrowElementsCapacity(kind, array, var_length.value(), &var_elements,

growth, &pre_bailout);

node/deps/v8/src/codegen/code-stub-assembler.cc

Lines 3429 to 3444 in f226350

void CodeStubAssembler::PossiblyGrowElementsCapacity(

ElementsKind kind, TNode<HeapObject> array, TNode<BInt> length,

TVariable<FixedArrayBase>* var_elements, TNode<BInt> growth,

Label* bailout) {

Label fits(this, var_elements);

TNode<BInt> capacity =

TaggedToParameter<BInt>(LoadFixedArrayBaseLength(var_elements->value()));

TNode<BInt> new_length = IntPtrOrSmiAdd(growth, length);

GotoIfNot(IntPtrOrSmiGreaterThan(new_length, capacity), &fits);

TNode<BInt> new_capacity = CalculateNewElementsCapacity(new_length);

*var_elements = GrowElementsCapacity(array, var_elements->value(), kind, kind,

capacity, new_capacity, bailout);

Goto(&fits);

BIND(&fits);

}

node/deps/v8/src/codegen/code-stub-assembler.cc

Lines 5594 to 5598 in f226350

template <typename TIndex>

TNode<FixedArrayBase> CodeStubAssembler::GrowElementsCapacity(

TNode<HeapObject> object, TNode<FixedArrayBase> elements,

ElementsKind from_kind, ElementsKind to_kind, TNode<TIndex> capacity,

TNode<TIndex> new_capacity, Label* bailout) {

node/deps/v8/src/codegen/code-stub-assembler.cc

Lines 5612 to 5620 in f226350

// Allocate the new backing store.

TNode<FixedArrayBase> new_elements =

AllocateFixedArray(to_kind, new_capacity);

// Copy the elements from the old elements store to the new.

// The size-check above guarantees that the |new_elements| is allocated

// in new space so we can skip the write barrier.

CopyFixedArrayElements(from_kind, elements, to_kind, new_elements, capacity,

new_capacity, SKIP_WRITE_BARRIER);

So if the preallocated space is not enough, the algorithm to copy the array elements is a bit tedious, so I'll stop there:

node/deps/v8/src/codegen/code-stub-assembler.cc

Lines 5249 to 5255 in f226350

template <typename TIndex>

void CodeStubAssembler::CopyFixedArrayElements(

ElementsKind from_kind, TNode<FixedArrayBase> from_array,

ElementsKind to_kind, TNode<FixedArrayBase> to_array,

TNode<TIndex> first_element, TNode<TIndex> element_count,

TNode<TIndex> capacity, WriteBarrierMode barrier_mode,

HoleConversionMode convert_holes, TVariable<BoolT>* var_holes_converted) {

It seems to me that when you are storing objects, they won't be copied over wether you use an array or a Set (I assume the same is true for object/Map), the engine will only store a reference (either in its hash table, or straight in the memory). In both cases, the engine pre-allocates some space, and when it's not enough, has to find another memory address where it can fit, and has to copy data over. In addition, for Map and Set, it has to recalculate the hashes.

TL;DR, Set is not more memory nor performant efficient when it comes it changing data size, it's more efficient to ensure uniqueness and Set.prototype.has is much more efficient than Array.prototype.includes, otherwise you're better off with an array.

How did you come to that conclusion? Is there a way to check if this is true?

A couple years ago I did a benchmark hammering each with adds & deletes, and the results were like 50:1 Map vs plain object and Map vs Array, and like 20–10:1 Set vs Array (presumably Set was less good than Map because of the overhead enforcing uniqueness?); the performance gap widened non-linearly as size increased (mutations to the Map remained about the same cost, but mutations to Array became more and more expensive as the content within the Array got bigger).

I don't imagine that since then Map and Set got less performant or for Arrays to have changed much.

TL;DR, Set is not more memory nor performant efficient when it comes it changing data

I can't find the article anymore; it was written by one of the V8 people (or maybe Jake Archibald?) and specifically talked about how Map and Set are stored non-sequentially. Based on your investigation, it sounds like Map and Set internal hashtable (that remember meta information) is stored sequentially (which makes sense), but that the items are not necessarily (because the hashtable contains references/pointers, which can point anywhere). Moving a hashtable would be significantly less costly than moving a huge array with huge data.

But I think we can ignore Array vs Set for this PR because we want direct access, so it would be Array vs Map.

lib/internal/modules/esm/hooks.js

JakobJingleheimer · 2023-08-15T19:13:36Z

lib/internal/modules/esm/hooks.js

-    return initialize?.(data);
+
+    ArrayPrototypePush(this.#loaderInstances, instance);
+    return initialize?.(data, { __proto__: null, id: instance.id });


Suggested change

return initialize?.(data, { __proto__: null, id: instance.id });

return initialize?.(data, { __proto__: null, id: registrationId });

izaakschroeder · 2023-08-16T05:39:31Z

IMO register() should always return the ID, that eliminates the problem we had that there was no way to specify a transferList.

Assuming it’s currently sync and has no return value, I agree with @joyeecheung that we should change its return value to be the ID, similar to setTimeout in browsers. But we need to ensure that register remains sync, so that it stays easy to call from CommonJS.

@aduh95 how does this obviate the need for a transfer list? How does the initialize hook now return data to the caller of register(...)?

@GeoffreyBooth How does the initialize hook now return data to the caller of register(...) if register(...) now returns the loader id?

izaakschroeder · 2023-08-16T05:42:27Z

We need to store the order of the loaders, a Map doesn't seem adequate. Also, it's the first time I hear that Map would be more memory efficient than arrays, I'd be very surprised if that was the case.

I think we should not use a simple number like this for id. Instead, we should use a Map and some string-friendly id (because maps are significantly more efficient with string-based indexes than numeric that gets cast back and forth). It also makes mutations far more efficient, as well as being inherently better on memory (because an array must be pre-allocated space, whereas a Map takes up only exactly the amount of space it needs).

@aduh95 @GeoffreyBooth @ljharb do you want me to run some benchmarks to actually test the performance of the different approaches? I expect the frequency with which register and de/unregister is called is not at the level where any of this particularly matters?

izaakschroeder · 2023-08-16T06:03:55Z

I think we should use symbols for the public facing API, and the index for the internal API; using random string or number would be a mistake.

@aduh95 Symbol is not useable with structuredClone which means it cannot be used with worker threads and thus loader internals unless we do weird things like keep a symbol map specifically in the main thread just for this. Do you have an alternative recommendation?

GeoffreyBooth · 2023-08-16T06:11:49Z

do you want me to run some benchmarks to actually test the performance of the different approaches? I expect the frequency with which register and de/unregister is called is not at the level where any of this particularly matters?

No. Benchmarks aren't necessary but in general we always try to write performant code. I can imagine a worst case scenario where someone implements a bundler around this API, and registers and deregisters leaders over and over again for each imported module (because they're a poor programmer, probably, but still). So I think it's reasonable to lazy load things on first use, and to use a data structure where we can look up the loader by ID directly (like a map) rather than iterating through each element of an array.

For the ID, I was imagining something like the resolved absolute URL of the module plus an integer like 1, 2, 3 to cover the case of the same loader being registered multiple times. So something like file:///path/to/loader.js?1 say.

ljharb · 2023-08-16T06:21:11Z

It's a good point that using the URL of the resolved module, plus a counter, would be a reliable choice for a Map (or null object) key - we'd just need to make sure there was a fast lookup from URL to "last used counter" for new registrations.

izaakschroeder · 2023-08-16T06:24:24Z

It's a good point that using the URL of the resolved module, plus a counter, would be a reliable choice for a Map (or null object) key - we'd just need to make sure there was a fast lookup from URL to "last used counter" for new registrations.

@ljharb Why not just a counter? Is there a need to correlate URL with individual counts of loaders? It might be an interesting metric but a single monotonic number would suffice for an ID as far as I can tell…?

GeoffreyBooth · 2023-08-16T16:55:36Z

@GeoffreyBooth How does the initialize hook now return data to the caller of register(...) if register(...) now returns the loader id?

This is what I was getting at with my question about how the docs don’t currently specify register‘s signature. I think the suggestion that register returned the ID was probably made with the assumption that register currently returns nothing; but I guess it returns whatever initialize returns, right? So your original design is probably best, where we pass the ID into initialize and it’s on the user to return it from initialize if they want to eventually call deregister. If a loader doesn’t define initialize, we should perhaps have a defaultInitialize that just returns the ID. We should clarify all of this in the docs for both register and initialize.

Alternatively register could always return the ID in addition to whatever gets returned by initialize, either by merging objects or via something like const { id, initializationResult } = register(...), but that feels unnecessary to me. I don’t see deregister being used all that often, so we shouldn’t optimize the API for its usage.

aduh95 · 2023-08-16T18:56:04Z

how does this obviate the need for a transfer list? How does the initialize hook now return data to the caller of register(...)?

That's the idea: it would no longer return data to the caller – hence, no need for a transferList anymore.

izaakschroeder · 2023-08-16T19:02:21Z

That's the idea: it would no longer return data to the caller – hence, no need for a transferList anymore.

Doesn't this… break one of the feature requests initialize was originally requested for? To be able to pass a MessagePort around to communicate between userland and loaderland? Very specifically: nodejs/loaders#147 (comment). And if so, then are we coming up with another approach? Or are we just telling them they can't do this? I'm pretty sure this was a relatively important requirement for some library authors, like those building mocking systems.

aduh95 · 2023-08-16T19:14:10Z

You can still pass a MessagePort and get the data from there, how would that break this? It's already impossible to pass a MessagePort via the return value (because no transferList), so I don't think it would break anything.

GeoffreyBooth · 2023-08-16T19:18:01Z

You can still pass a MessagePort and get the data from there, how would that break this? It’s already impossible to pass a MessagePort via the return value (because no transferList), so I don’t think it would break anything.

Is the idea that the MessagePort is defined on the main thread and passed in for initialize to use? And then initialize can pass anything back to the main thread via the MessagePort.

We should get this clearly documented. If it’s confusing even us…

aduh95 · 2023-08-16T19:22:52Z

We should get this clearly documented. If it’s confusing even us…

I think it's already documented:

node/doc/api/esm.md

Lines 780 to 815 in ee8b7f1

    
           Loader code: 
        
           ```js 
        
           // In the below example this file is referenced as 
        
           // '/path-to-my-loader.js' 
        
           export async function initialize({ number, port }) { 
        
             port.postMessage(`increment: ${number + 1}`); 
        
             return 'ok'; 
        
           } 
        
           ``` 
        
           Caller code: 
        
           ```js 
        
           import assert from 'node:assert'; 
        
           import { register } from 'node:module'; 
        
           import { MessageChannel } from 'node:worker_threads'; 
        
           // This example showcases how a message channel can be used to 
        
           // communicate between the main (application) thread and the loader 
        
           // running on the loaders thread, by sending `port2` to the loader. 
        
           const { port1, port2 } = new MessageChannel(); 
        
           port1.on('message', (msg) => { 
        
             assert.strictEqual(msg, 'increment: 2'); 
        
           }); 
        
           const result = register('/path-to-my-loader.js', { 
        
             parentURL: import.meta.url, 
        
             data: { number: 1, port: port2 }, 
        
             transferList: [port2], 
        
           }); 
        
           assert.strictEqual(result, 'ok'); 
        
           ```

@GeoffreyBooth

Suggestion from @GeoffreyBooth. Adds `deregister` method on `node:module` that looks like this: ```ts type Deregister = (id: string) => boolean; ``` Modifies the initialize hook to look like this: ```ts type Initialize = (data: any, meta: {id: opaque}) => Promise<any>; ``` Internally registered instances of hooks are now tracked. This is so they can be removed later. The id of the registered instance is now passed to the `initialize` hook. The return value of register is now the loader id. This is mapped to a symbol outside of the loader thread to prevent abuse. ```js // Caller import {register, deregister} from "node:module"; const id = register(...); // ... deregister(id); ```

nodejs-github-bot added esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem. needs-ci PRs that need a full CI run. labels Aug 14, 2023

ljharb requested changes Aug 14, 2023

View reviewed changes

aduh95 reviewed Aug 14, 2023

View reviewed changes

test/es-module/test-esm-loader-hooks.mjs Outdated Show resolved Hide resolved

aduh95 reviewed Aug 14, 2023

View reviewed changes

test/es-module/test-esm-loader-hooks.mjs Outdated Show resolved Hide resolved

anonrig reviewed Aug 14, 2023

View reviewed changes

JakobJingleheimer reviewed Aug 15, 2023

View reviewed changes

izaakschroeder force-pushed the deregister branch from 2c2bb6c to 84be528 Compare August 23, 2023 01:00

izaakschroeder force-pushed the deregister branch from 84be528 to 2774269 Compare August 23, 2023 04:24

izaakschroeder mentioned this pull request Aug 24, 2023

esm: refactor hooks – add loader instances #49315

Open

GeoffreyBooth mentioned this pull request Oct 4, 2023

module: initialize hook returns load, resolve #50044

Open

2 tasks

GeoffreyBooth mentioned this pull request Jun 22, 2024

process: port on-exit-leak-free to core #53239

Merged

4 tasks

	this.#chains.globalPreload.splice(index, 1);
	ArrayPrototypeSplice(this.#chains.globalPreload, index, 1);

	this.#loaderInstances.splice(index, 1);
	ArrayPrototypeSplice(this.#loaderInstances, index, 1);

	#lastInstanceId = 0;
	#loaderInstances = [];
	#customLoaders = new SafeMap();

	// We do not have enough space, grow the table and reload the relevant
	// fields.
	CallRuntime(Runtime::kSetGrow, context, receiver);

	RUNTIME_FUNCTION(Runtime_SetGrow) {
	HandleScope scope(isolate);
	DCHECK_EQ(1, args.length());
	Handle<JSSet> holder = args.at<JSSet>(0);
	Handle<OrderedHashSet> table(OrderedHashSet::cast(holder->table()), isolate);
	MaybeHandle<OrderedHashSet> table_candidate =
	OrderedHashSet::EnsureGrowable(isolate, table);

	template <class Derived, int entrysize>
	MaybeHandle<Derived> OrderedHashTable<Derived, entrysize>::EnsureGrowable(
	Isolate* isolate, Handle<Derived> table) {
	DCHECK(!table->IsObsolete());

	int nof = table->NumberOfElements();
	int nod = table->NumberOfDeletedElements();
	int capacity = table->Capacity();
	if ((nof + nod) < capacity) return table;

	int new_capacity;
	if (capacity == 0) {
	// step from empty to minimum proper size
	new_capacity = kInitialCapacity;
	} else if (nod >= (capacity >> 1)) {
	// Don't need to grow if we can simply clear out deleted entries instead.
	// Note that we can't compact in place, though, so we always allocate
	// a new table.
	new_capacity = capacity;
	} else {
	new_capacity = capacity << 1;
	}

	return Derived::Rehash(isolate, table, new_capacity);
	}

	template <class Derived, int entrysize>
	MaybeHandle<Derived> OrderedHashTable<Derived, entrysize>::Rehash(
	Isolate* isolate, Handle<Derived> table, int new_capacity) {

	for (InternalIndex old_entry : table->IterateEntries()) {
	int old_entry_raw = old_entry.as_int();
	Object key = table->KeyAt(old_entry);
	if (key.IsTheHole(isolate)) {
	table->SetRemovedIndexAt(removed_holes_index++, old_entry_raw);
	continue;
	}

	Object hash = key.GetHash();
	int bucket = Smi::ToInt(hash) & (new_buckets - 1);
	Object chain_entry = new_table->get(HashTableStartIndex() + bucket);
	new_table->set(HashTableStartIndex() + bucket, Smi::FromInt(new_entry));
	int new_index = new_table->EntryToIndexRaw(new_entry);
	int old_index = table->EntryToIndexRaw(old_entry_raw);
	for (int i = 0; i < entrysize; ++i) {
	Object value = table->get(old_index + i);
	new_table->set(new_index + i, value);
	}
	new_table->set(new_index + kChainOffset, chain_entry);
	++new_entry;
	}

	BIND(&fast);
	{
	array_receiver = CAST(receiver);
	arg_index = IntPtrConstant(0);
	kind = EnsureArrayPushable(context, LoadMap(array_receiver), &runtime);
	GotoIf(IsElementsKindGreaterThan(kind, HOLEY_SMI_ELEMENTS),
	&object_push_pre);

	TNode<Smi> new_length =
	BuildAppendJSArray(PACKED_SMI_ELEMENTS, array_receiver, &args,
	&arg_index, &smi_transition);
	args.PopAndReturn(new_length);
	}

	BIND(&object_push);
	{
	TNode<Smi> new_length = BuildAppendJSArray(
	PACKED_ELEMENTS, array_receiver, &args, &arg_index, &default_label);
	args.PopAndReturn(new_length);
	}

	TNode<Smi> CodeStubAssembler::BuildAppendJSArray(ElementsKind kind,
	TNode<JSArray> array,
	CodeStubArguments* args,
	TVariable<IntPtrT>* arg_index,
	Label* bailout) {

	// Resize the capacity of the fixed array if it doesn't fit.
	TNode<IntPtrT> first = arg_index->value();
	TNode<BInt> growth =
	IntPtrToBInt(IntPtrSub(args->GetLengthWithoutReceiver(), first));
	PossiblyGrowElementsCapacity(kind, array, var_length.value(), &var_elements,
	growth, &pre_bailout);

	void CodeStubAssembler::PossiblyGrowElementsCapacity(
	ElementsKind kind, TNode<HeapObject> array, TNode<BInt> length,
	TVariable<FixedArrayBase>* var_elements, TNode<BInt> growth,
	Label* bailout) {
	Label fits(this, var_elements);
	TNode<BInt> capacity =
	TaggedToParameter<BInt>(LoadFixedArrayBaseLength(var_elements->value()));

	TNode<BInt> new_length = IntPtrOrSmiAdd(growth, length);
	GotoIfNot(IntPtrOrSmiGreaterThan(new_length, capacity), &fits);
	TNode<BInt> new_capacity = CalculateNewElementsCapacity(new_length);
	*var_elements = GrowElementsCapacity(array, var_elements->value(), kind, kind,
	capacity, new_capacity, bailout);
	Goto(&fits);
	BIND(&fits);
	}

	template <typename TIndex>
	TNode<FixedArrayBase> CodeStubAssembler::GrowElementsCapacity(
	TNode<HeapObject> object, TNode<FixedArrayBase> elements,
	ElementsKind from_kind, ElementsKind to_kind, TNode<TIndex> capacity,
	TNode<TIndex> new_capacity, Label* bailout) {

	// Allocate the new backing store.
	TNode<FixedArrayBase> new_elements =
	AllocateFixedArray(to_kind, new_capacity);

	// Copy the elements from the old elements store to the new.
	// The size-check above guarantees that the \|new_elements\| is allocated
	// in new space so we can skip the write barrier.
	CopyFixedArrayElements(from_kind, elements, to_kind, new_elements, capacity,
	new_capacity, SKIP_WRITE_BARRIER);

	template <typename TIndex>
	void CodeStubAssembler::CopyFixedArrayElements(
	ElementsKind from_kind, TNode<FixedArrayBase> from_array,
	ElementsKind to_kind, TNode<FixedArrayBase> to_array,
	TNode<TIndex> first_element, TNode<TIndex> element_count,
	TNode<TIndex> capacity, WriteBarrierMode barrier_mode,
	HoleConversionMode convert_holes, TVariable<BoolT>* var_holes_converted) {

	return initialize?.(data, { __proto__: null, id: instance.id });
	return initialize?.(data, { __proto__: null, id: registrationId });

Uh oh!

esm: add deregister method #49159

Are you sure you want to change the base?

esm: add deregister method #49159

Uh oh!

Conversation

izaakschroeder commented Aug 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TBD

Uh oh!

nodejs-github-bot commented Aug 14, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

aduh95 commented Aug 14, 2023

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

joyeecheung commented Aug 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

GeoffreyBooth commented Aug 15, 2023

Uh oh!

GeoffreyBooth commented Aug 15, 2023

Uh oh!

ljharb commented Aug 15, 2023

Uh oh!

JakobJingleheimer commented Aug 15, 2023

Uh oh!

ljharb commented Aug 15, 2023

Uh oh!

GeoffreyBooth commented Aug 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ljharb commented Aug 15, 2023

Uh oh!

joyeecheung commented Aug 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JakobJingleheimer Aug 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aduh95 Aug 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JakobJingleheimer Aug 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JakobJingleheimer Aug 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

esm: add `deregister` method #49159

esm: add `deregister` method #49159

izaakschroeder commented Aug 14, 2023 •

edited

Loading

joyeecheung commented Aug 15, 2023 •

edited

Loading

GeoffreyBooth commented Aug 15, 2023 •

edited

Loading

joyeecheung commented Aug 15, 2023 •

edited

Loading

JakobJingleheimer Aug 15, 2023 •

edited

Loading

aduh95 Aug 15, 2023 •

edited

Loading

JakobJingleheimer Aug 16, 2023 •

edited

Loading

JakobJingleheimer Aug 16, 2023 •

edited

Loading

GeoffreyBooth commented Aug 16, 2023 •

edited

Loading

izaakschroeder commented Aug 16, 2023 •

edited

Loading

izaakschroeder commented Aug 16, 2023 •

edited

Loading