Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esm: add deregister method #49159

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

izaakschroeder
Copy link
Contributor

@izaakschroeder izaakschroeder commented Aug 14, 2023

TBD

  • refactor hooks internals – remove chains in favor of loaderInstances; there is a tiny performance penalty when going through any given chain (since there's a conditional branch now and possibly extra iterations up to the total number of loaders) but i think this is negligible compared to the usability increase – there is now a default loader called default_loader that is registered like any other as node's internal one and only one array instead of several. the logic for adding/removing loaders is significantly simpler now.
  • change the return type of register – previously it returned whatever initialize did. @aduh95 noted that we should return a "loader handle" or a symbol to the loader instead of the initialize result which is what i've done here. i managed to make returning a symbol work across the thread gap. i am not 100% sold on symbol vs some kind of loader object which was also suggested (a thing that had a deregister method); i am still concerned about loader authors being able to pass initialization data back if it's somehow useful to them. N.B. this seems kind of like an 11th hour change now that 20.6.0 is imminent and the loaders API is being marked somewhat stable.
  • allow hook methods to receive a fourth state argument – so instead of returning the result of initialize and discard it, we now keep the result attached to the loader instance. this value is then fed back to hook invocations. after thinking about the comments in Replace globalPreload loaders#147 this actually seems necessary if there can be an arbitrary number of "copies" of any given loader; how can loaders differentiate their internal state from one another? there isn't a good way without something like this.
  • add deregister – this builds on the first two changes; the first one makes it sensible/trivial to remove existing loaders, and the second makes it possible for the user to actually be able to reference an individual loader which is necessary for removal

Suggestion from @GeoffreyBooth.

Adds deregister method on node:module that looks like this:

type Deregister = (id: string) => boolean;

Modifies the initialize hook to look like this:

type Initialize = (data: any, meta: {id: opaque}) => Promise<any>;

Internally registered instances of hooks are now tracked. This is so they can be removed later. The id of the registered instance is now passed to the initialize hook which can then be passed back to the caller of register.

// Loader
export const initialize = (_data, meta) => {
  return meta.id;
}
// Caller
import {register, deregister} from "node:module";

const id = register(...);

// ...

deregister(id);

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders
  • @nodejs/modules

@nodejs-github-bot nodejs-github-bot added esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem. needs-ci PRs that need a full CI run. labels Aug 14, 2023
#removeFromChain(chain, target) {
for (let i = 0; i < chain.length; ++i) {
if (target === chain[i+1]?.fn) {
chain.splice(i+1, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
chain.splice(i+1, 1);
ArrayPrototypeSplice(chain, i + 1, 1);

return x.fn === instance.globalPreload;
});
if (index >= 0) {
this.#chains.globalPreload.splice(index, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.#chains.globalPreload.splice(index, 1);
ArrayPrototypeSplice(this.#chains.globalPreload, index, 1);

if (instance.load) {
this.#removeFromChain(this.#chains.load, instance.load);
}
this.#loaderInstances.splice(index, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.#loaderInstances.splice(index, 1);
ArrayPrototypeSplice(this.#loaderInstances, index, 1);

@aduh95
Copy link
Contributor

aduh95 commented Aug 14, 2023

We would need some docs.

Modifies the initialize hook to look like this:

type Initialize = (data: any, meta: {id: opaque}) => Promise<any>;

IMO register() should always return the ID, that eliminates the problem we had that there was no way to specify a transferList.

Comment on lines 211 to 213
const index = ArrayPrototypeFindIndex(this.#loaderInstances, (target) => {
return target.id === id;
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a different data structure that doesn't require us to traverse through the whole array to find what we are looking for?

if (index < 0) {
return false;
}
const instance = this.#loaderInstances[index];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are using index to just get the element, we can use ArrayPrototypeFind instead of index.

const index = ArrayPrototypeFindIndex(this.#chains.globalPreload, (x) => {
return x.fn === instance.globalPreload;
});
if (index >= 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (index >= 0) {
if (index !== -1) {

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

lib/internal/modules/esm/loader.js Show resolved Hide resolved
@joyeecheung
Copy link
Member

joyeecheung commented Aug 15, 2023

While deregister might be a correct word, I can't seem to find any use of it in the code base, there are many references to unregister in the code base though, so switching to unregister would make it more consistent.

@GeoffreyBooth
Copy link
Member

While deregister might be a correct word, I can’t seem to find any use of it in the code base, there are many references to unregister in the code base though, so switching to unregister would make it more consistent.

https://english.stackexchange.com/a/40095/250362

TLDR: “deregister” is the verb, “unregistered” is the state once it’s been deregistered. So “deregister” is correct in this case, because the API is the action being taken.

@GeoffreyBooth
Copy link
Member

IMO register() should always return the ID, that eliminates the problem we had that there was no way to specify a transferList.

What is the current signature for register? I don’t see it in https://github.com/nodejs/node/blob/main/doc/api/module.md#moduleregister. We should add that (either in its own PR or I would be fine with it getting added in this one.)

Assuming it’s currently sync and has no return value, I agree with @joyeecheung that we should change its return value to be the ID, similar to setTimeout in browsers. But we need to ensure that register remains sync, so that it stays easy to call from CommonJS.

@ljharb
Copy link
Member

ljharb commented Aug 15, 2023

https://english.stackexchange.com/questions/25931/unregister-vs-deregister/40095#comment1140785_40095 states that in the verb form, both prefixes are synonyms.

@JakobJingleheimer
Copy link
Contributor

I think deregister and unregister are similar enough that if there's already precedent, we should go with precedent. Otherwise, I believe deregister is more correct: there are more forms of the word based on de- (ex deregistration not unregistration). Also, very nuanced: un- generally has the sense of "revert" (like undo), whereas de- generally has the sense of "remove" and "cease" (like decelerate, deflate, desist).

@ljharb
Copy link
Member

ljharb commented Aug 15, 2023

Right, and in this case it's reverting/undoing a registration :-)

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Aug 15, 2023

if there’s already precedent, we should go with precedent

As far as I can tell the only precedent is internal. We have no public-facing APIs that are called “unregister,” per https://www.google.com/search?q=site%3Anodejs.org+%22unregister%22. I feel pretty strongly that “deregister” is correct English, so the public API should have that name, and we can clean up/rename the internal APIs whenever we get a chance. If you read the full page at https://english.stackexchange.com/a/40095 it seems pretty clear that most people consider “deregister” more correct, with some arguing that they’re equivalent, but no one thinks that “unregister” is more correct English.

This is a bikeshed and wasting attention. We already have a name. If you want to discuss it further please open a separate issue.

@ljharb
Copy link
Member

ljharb commented Aug 15, 2023

@joyeecheung
Copy link
Member

joyeecheung commented Aug 15, 2023

We have no public-facing APIs that are called “unregister,”

UnregisterIsolate() is part of the public embedder/addon API in node.h though (e.g. if we rename it, we break electron), and it's used in a few places in the API docs. (and as @ljharb mentioned, it's also present in some of the Web API/JS APIs, which you can also see being used in lib/ and test/ ...)

@@ -160,6 +168,15 @@ class Hooks {
load,
} = pluckHooks(exports);

const instance = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is maybe not a good name: instance usually means instance of a class, but here I think it's being used as "occurrence".

Suggested change
const instance = {
const registrationId = //

Copy link
Contributor

@JakobJingleheimer JakobJingleheimer Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use something like uid(3) (it looks like crypto.randomUUID() is the only thing already available in node, but that would be massively overkill) or perhaps the loader's url. Chatting with Geoffrey, he pointed out a loader can now be registered multiple times, so if the loader's url, it would need to have something else appended, like a generation id.

Copy link
Contributor

@aduh95 aduh95 Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use symbols for the public facing API, and the index for the internal API; using random string or number would be a mistake.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooow! That's a good idea!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What’s the value of a symbol as opposed to a string like file:///path/to/loader.js?1? At least the latter provides some useful information, such as for debugging, and then we don’t need to have different data structures internally and externally.

Copy link
Contributor

@JakobJingleheimer JakobJingleheimer Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A symbol is guaranteed to be unique, so then there is no concern for a generational id on the same url:

Symbol(url) ≠ Symbol(url)

Also, the label of a symbol is visible, so it still provides almost identical debugging value 🙂

And bonus: it precludes the possibility of a user getting overly clever and foot-gunning a deregistration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also return a handle, like:

const handle = register(...)
handle.deregister()

Then the handle could potentially get other methods in the future. It could also contain the return value of initialize as a property, like handle.initializationResult.

Not sure I prefer this over other options, just adding it to the mix. We still need to deal with the problem that register currently returns the return value of initialize, so either we add this symbol into initialize for the user to return, and then somehow we get symbols across the thread barrier; or we change the return signature of register.

Comment on lines 125 to 89
#lastInstanceId = 0;
#loaderInstances = [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not use a simple number like this for id. Instead, we should use a Map and some string-friendly id (because maps are significantly more efficient with string-based indexes than numeric that gets cast back and forth). It also makes mutations far more efficient, as well as being inherently better on memory (because an array must be pre-allocated space, whereas a Map takes up only exactly the amount of space it needs).

Suggested change
#lastInstanceId = 0;
#loaderInstances = [];
#customLoaders = new SafeMap();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to store the order of the loaders, a Map doesn't seem adequate. Also, it's the first time I hear that Map would be more memory efficient than arrays, I'd be very surprised if that was the case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maps are ordered, but I agree

Copy link
Contributor

@JakobJingleheimer JakobJingleheimer Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the sequence of Map items is guaranteed (insertion order): MDN: Map

The Map object holds key-value pairs and remembers the original insertion order of the keys.


Also, it's the first time I hear that Map would be more memory efficient than arrays, I'd be very surprised if that was the case.

Maps are better than Arrays in terms of both space and time efficiency:

Arrays are pre-sized. V8 pre-allocates a rather significant size¹ for an array in order to avoid having to move it (because the entire array is stored sequentially in memory) which is a very expensive operation.

Maps are not stored sequentially in memory, so there is no "move" operation when they grow (nor is there a pre-allocation of space).

¹ unless a size is specifically provided like new Array(2) (then the array will be exactly sized); however this can be a de-op due to data type of the items themselves (under the hood, V8 can use a better-suited type of array, but it doesn't do that optimisation when you strong-arm it, so unless the items in the array are all the same data-type, generally a primitive, it's better to not force a size).

Sets are similarly better for space and time efficiency (and are also not stored sequentially in memory).

Also worth noting: Maps and Sets are significantly better for rapid mutation than Arrays, and that's exactly what we expect to happen here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth noting: Maps and Sets are significantly better for rapid mutation than Arrays, and that's exactly what we expect to happen here.

I don't think we expect that, I would expect that unregister to not be useful to most applications – I've never seen anyone request it, but maybe there are some use cases I'm not seeing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 sides to this coin ;) deletions from de/un-register, but primarily additions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sets are similarly better for space and time efficiency (and are also not stored sequentially in memory).

Also worth noting: Maps and Sets are significantly better for rapid mutation than Arrays, and that's exactly what we expect to happen here.

How did you come to that conclusion? Is there a way to check if this is true?

From my understanding, and with some pointers from @targos, I don't see how this could be true.

In V8, Set and Map instances have a hash table associated with them which seems to be stored sequentially in memory:

TF_BUILTIN(SetPrototypeAdd, CollectionsBuiltinsAssembler) {

// We do not have enough space, grow the table and reload the relevant
// fields.
CallRuntime(Runtime::kSetGrow, context, receiver);

RUNTIME_FUNCTION(Runtime_SetGrow) {
HandleScope scope(isolate);
DCHECK_EQ(1, args.length());
Handle<JSSet> holder = args.at<JSSet>(0);
Handle<OrderedHashSet> table(OrderedHashSet::cast(holder->table()), isolate);
MaybeHandle<OrderedHashSet> table_candidate =
OrderedHashSet::EnsureGrowable(isolate, table);

template <class Derived, int entrysize>
MaybeHandle<Derived> OrderedHashTable<Derived, entrysize>::EnsureGrowable(
Isolate* isolate, Handle<Derived> table) {
DCHECK(!table->IsObsolete());
int nof = table->NumberOfElements();
int nod = table->NumberOfDeletedElements();
int capacity = table->Capacity();
if ((nof + nod) < capacity) return table;
int new_capacity;
if (capacity == 0) {
// step from empty to minimum proper size
new_capacity = kInitialCapacity;
} else if (nod >= (capacity >> 1)) {
// Don't need to grow if we can simply clear out deleted entries instead.
// Note that we can't compact in place, though, so we always allocate
// a new table.
new_capacity = capacity;
} else {
new_capacity = capacity << 1;
}
return Derived::Rehash(isolate, table, new_capacity);
}

From the above, I understand the Set or Map hash table capacity is doubled everytime it reaches its capacity, so not so memory efficient. Now that Rehash function has to recompute the hash for each elements for the new table:

template <class Derived, int entrysize>
MaybeHandle<Derived> OrderedHashTable<Derived, entrysize>::Rehash(
Isolate* isolate, Handle<Derived> table, int new_capacity) {

for (InternalIndex old_entry : table->IterateEntries()) {
int old_entry_raw = old_entry.as_int();
Object key = table->KeyAt(old_entry);
if (key.IsTheHole(isolate)) {
table->SetRemovedIndexAt(removed_holes_index++, old_entry_raw);
continue;
}
Object hash = key.GetHash();
int bucket = Smi::ToInt(hash) & (new_buckets - 1);
Object chain_entry = new_table->get(HashTableStartIndex() + bucket);
new_table->set(HashTableStartIndex() + bucket, Smi::FromInt(new_entry));
int new_index = new_table->EntryToIndexRaw(new_entry);
int old_index = table->EntryToIndexRaw(old_entry_raw);
for (int i = 0; i < entrysize; ++i) {
Object value = table->get(old_index + i);
new_table->set(new_index + i, value);
}
new_table->set(new_index + kChainOffset, chain_entry);
++new_entry;
}

In comparison, %Array.prototype.push% has a fast path, and seems also simpler (because no hashing involved), and also has optimizations based on what type of object you add to the array. The object path:

TF_BUILTIN(ArrayPrototypePush, CodeStubAssembler) {

BIND(&fast);
{
array_receiver = CAST(receiver);
arg_index = IntPtrConstant(0);
kind = EnsureArrayPushable(context, LoadMap(array_receiver), &runtime);
GotoIf(IsElementsKindGreaterThan(kind, HOLEY_SMI_ELEMENTS),
&object_push_pre);
TNode<Smi> new_length =
BuildAppendJSArray(PACKED_SMI_ELEMENTS, array_receiver, &args,
&arg_index, &smi_transition);
args.PopAndReturn(new_length);
}

BIND(&object_push);
{
TNode<Smi> new_length = BuildAppendJSArray(
PACKED_ELEMENTS, array_receiver, &args, &arg_index, &default_label);
args.PopAndReturn(new_length);
}

TNode<Smi> CodeStubAssembler::BuildAppendJSArray(ElementsKind kind,
TNode<JSArray> array,
CodeStubArguments* args,
TVariable<IntPtrT>* arg_index,
Label* bailout) {

// Resize the capacity of the fixed array if it doesn't fit.
TNode<IntPtrT> first = arg_index->value();
TNode<BInt> growth =
IntPtrToBInt(IntPtrSub(args->GetLengthWithoutReceiver(), first));
PossiblyGrowElementsCapacity(kind, array, var_length.value(), &var_elements,
growth, &pre_bailout);

void CodeStubAssembler::PossiblyGrowElementsCapacity(
ElementsKind kind, TNode<HeapObject> array, TNode<BInt> length,
TVariable<FixedArrayBase>* var_elements, TNode<BInt> growth,
Label* bailout) {
Label fits(this, var_elements);
TNode<BInt> capacity =
TaggedToParameter<BInt>(LoadFixedArrayBaseLength(var_elements->value()));
TNode<BInt> new_length = IntPtrOrSmiAdd(growth, length);
GotoIfNot(IntPtrOrSmiGreaterThan(new_length, capacity), &fits);
TNode<BInt> new_capacity = CalculateNewElementsCapacity(new_length);
*var_elements = GrowElementsCapacity(array, var_elements->value(), kind, kind,
capacity, new_capacity, bailout);
Goto(&fits);
BIND(&fits);
}

template <typename TIndex>
TNode<FixedArrayBase> CodeStubAssembler::GrowElementsCapacity(
TNode<HeapObject> object, TNode<FixedArrayBase> elements,
ElementsKind from_kind, ElementsKind to_kind, TNode<TIndex> capacity,
TNode<TIndex> new_capacity, Label* bailout) {

// Allocate the new backing store.
TNode<FixedArrayBase> new_elements =
AllocateFixedArray(to_kind, new_capacity);
// Copy the elements from the old elements store to the new.
// The size-check above guarantees that the |new_elements| is allocated
// in new space so we can skip the write barrier.
CopyFixedArrayElements(from_kind, elements, to_kind, new_elements, capacity,
new_capacity, SKIP_WRITE_BARRIER);

So if the preallocated space is not enough, the algorithm to copy the array elements is a bit tedious, so I'll stop there:

template <typename TIndex>
void CodeStubAssembler::CopyFixedArrayElements(
ElementsKind from_kind, TNode<FixedArrayBase> from_array,
ElementsKind to_kind, TNode<FixedArrayBase> to_array,
TNode<TIndex> first_element, TNode<TIndex> element_count,
TNode<TIndex> capacity, WriteBarrierMode barrier_mode,
HoleConversionMode convert_holes, TVariable<BoolT>* var_holes_converted) {

It seems to me that when you are storing objects, they won't be copied over wether you use an array or a Set (I assume the same is true for object/Map), the engine will only store a reference (either in its hash table, or straight in the memory). In both cases, the engine pre-allocates some space, and when it's not enough, has to find another memory address where it can fit, and has to copy data over. In addition, for Map and Set, it has to recalculate the hashes.

TL;DR, Set is not more memory nor performant efficient when it comes it changing data size, it's more efficient to ensure uniqueness and Set.prototype.has is much more efficient than Array.prototype.includes, otherwise you're better off with an array.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you come to that conclusion? Is there a way to check if this is true?

A couple years ago I did a benchmark hammering each with adds & deletes, and the results were like 50:1 Map vs plain object and Map vs Array, and like 20–10:1 Set vs Array (presumably Set was less good than Map because of the overhead enforcing uniqueness?); the performance gap widened non-linearly as size increased (mutations to the Map remained about the same cost, but mutations to Array became more and more expensive as the content within the Array got bigger).

I don't imagine that since then Map and Set got less performant or for Arrays to have changed much.

TL;DR, Set is not more memory nor performant efficient when it comes it changing data

I can't find the article anymore; it was written by one of the V8 people (or maybe Jake Archibald?) and specifically talked about how Map and Set are stored non-sequentially. Based on your investigation, it sounds like Map and Set internal hashtable (that remember meta information) is stored sequentially (which makes sense), but that the items are not necessarily (because the hashtable contains references/pointers, which can point anywhere). Moving a hashtable would be significantly less costly than moving a huge array with huge data.

But I think we can ignore Array vs Set for this PR because we want direct access, so it would be Array vs Map.

lib/internal/modules/esm/hooks.js Show resolved Hide resolved
return initialize?.(data);

ArrayPrototypePush(this.#loaderInstances, instance);
return initialize?.(data, { __proto__: null, id: instance.id });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return initialize?.(data, { __proto__: null, id: instance.id });
return initialize?.(data, { __proto__: null, id: registrationId });

@izaakschroeder
Copy link
Contributor Author

IMO register() should always return the ID, that eliminates the problem we had that there was no way to specify a transferList.

Assuming it’s currently sync and has no return value, I agree with @joyeecheung that we should change its return value to be the ID, similar to setTimeout in browsers. But we need to ensure that register remains sync, so that it stays easy to call from CommonJS.

@aduh95 how does this obviate the need for a transfer list? How does the initialize hook now return data to the caller of register(...)?

@GeoffreyBooth How does the initialize hook now return data to the caller of register(...) if register(...) now returns the loader id?

@izaakschroeder
Copy link
Contributor Author

We need to store the order of the loaders, a Map doesn't seem adequate. Also, it's the first time I hear that Map would be more memory efficient than arrays, I'd be very surprised if that was the case.

I think we should not use a simple number like this for id. Instead, we should use a Map and some string-friendly id (because maps are significantly more efficient with string-based indexes than numeric that gets cast back and forth). It also makes mutations far more efficient, as well as being inherently better on memory (because an array must be pre-allocated space, whereas a Map takes up only exactly the amount of space it needs).

@aduh95 @GeoffreyBooth @ljharb do you want me to run some benchmarks to actually test the performance of the different approaches? I expect the frequency with which register and de/unregister is called is not at the level where any of this particularly matters?

@izaakschroeder
Copy link
Contributor Author

I think we should use symbols for the public facing API, and the index for the internal API; using random string or number would be a mistake.

@aduh95 Symbol is not useable with structuredClone which means it cannot be used with worker threads and thus loader internals unless we do weird things like keep a symbol map specifically in the main thread just for this. Do you have an alternative recommendation?

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Aug 16, 2023

do you want me to run some benchmarks to actually test the performance of the different approaches? I expect the frequency with which register and de/unregister is called is not at the level where any of this particularly matters?

No. Benchmarks aren't necessary but in general we always try to write performant code. I can imagine a worst case scenario where someone implements a bundler around this API, and registers and deregisters leaders over and over again for each imported module (because they're a poor programmer, probably, but still). So I think it's reasonable to lazy load things on first use, and to use a data structure where we can look up the loader by ID directly (like a map) rather than iterating through each element of an array.

For the ID, I was imagining something like the resolved absolute URL of the module plus an integer like 1, 2, 3 to cover the case of the same loader being registered multiple times. So something like file:///path/to/loader.js?1 say.

@ljharb
Copy link
Member

ljharb commented Aug 16, 2023

It's a good point that using the URL of the resolved module, plus a counter, would be a reliable choice for a Map (or null object) key - we'd just need to make sure there was a fast lookup from URL to "last used counter" for new registrations.

@izaakschroeder
Copy link
Contributor Author

izaakschroeder commented Aug 16, 2023

It's a good point that using the URL of the resolved module, plus a counter, would be a reliable choice for a Map (or null object) key - we'd just need to make sure there was a fast lookup from URL to "last used counter" for new registrations.

@ljharb Why not just a counter? Is there a need to correlate URL with individual counts of loaders? It might be an interesting metric but a single monotonic number would suffice for an ID as far as I can tell…?

@GeoffreyBooth
Copy link
Member

@GeoffreyBooth How does the initialize hook now return data to the caller of register(...) if register(...) now returns the loader id?

This is what I was getting at with my question about how the docs don’t currently specify register‘s signature. I think the suggestion that register returned the ID was probably made with the assumption that register currently returns nothing; but I guess it returns whatever initialize returns, right? So your original design is probably best, where we pass the ID into initialize and it’s on the user to return it from initialize if they want to eventually call deregister. If a loader doesn’t define initialize, we should perhaps have a defaultInitialize that just returns the ID. We should clarify all of this in the docs for both register and initialize.

Alternatively register could always return the ID in addition to whatever gets returned by initialize, either by merging objects or via something like const { id, initializationResult } = register(...), but that feels unnecessary to me. I don’t see deregister being used all that often, so we shouldn’t optimize the API for its usage.

@aduh95
Copy link
Contributor

aduh95 commented Aug 16, 2023

how does this obviate the need for a transfer list? How does the initialize hook now return data to the caller of register(...)?

That's the idea: it would no longer return data to the caller – hence, no need for a transferList anymore.

@izaakschroeder
Copy link
Contributor Author

izaakschroeder commented Aug 16, 2023

That's the idea: it would no longer return data to the caller – hence, no need for a transferList anymore.

Doesn't this… break one of the feature requests initialize was originally requested for? To be able to pass a MessagePort around to communicate between userland and loaderland? Very specifically: nodejs/loaders#147 (comment). And if so, then are we coming up with another approach? Or are we just telling them they can't do this? I'm pretty sure this was a relatively important requirement for some library authors, like those building mocking systems.

@aduh95
Copy link
Contributor

aduh95 commented Aug 16, 2023

You can still pass a MessagePort and get the data from there, how would that break this? It's already impossible to pass a MessagePort via the return value (because no transferList), so I don't think it would break anything.

@GeoffreyBooth
Copy link
Member

You can still pass a MessagePort and get the data from there, how would that break this? It’s already impossible to pass a MessagePort via the return value (because no transferList), so I don’t think it would break anything.

Is the idea that the MessagePort is defined on the main thread and passed in for initialize to use? And then initialize can pass anything back to the main thread via the MessagePort.

We should get this clearly documented. If it’s confusing even us…

@aduh95
Copy link
Contributor

aduh95 commented Aug 16, 2023

We should get this clearly documented. If it’s confusing even us…

I think it's already documented:

node/doc/api/esm.md

Lines 780 to 815 in ee8b7f1

Loader code:
```js
// In the below example this file is referenced as
// '/path-to-my-loader.js'
export async function initialize({ number, port }) {
port.postMessage(`increment: ${number + 1}`);
return 'ok';
}
```
Caller code:
```js
import assert from 'node:assert';
import { register } from 'node:module';
import { MessageChannel } from 'node:worker_threads';
// This example showcases how a message channel can be used to
// communicate between the main (application) thread and the loader
// running on the loaders thread, by sending `port2` to the loader.
const { port1, port2 } = new MessageChannel();
port1.on('message', (msg) => {
assert.strictEqual(msg, 'increment: 2');
});
const result = register('/path-to-my-loader.js', {
parentURL: import.meta.url,
data: { number: 1, port: port2 },
transferList: [port2],
});
assert.strictEqual(result, 'ok');
```

Suggestion from @GeoffreyBooth.

Adds `deregister` method on `node:module` that looks like this:

```ts
type Deregister = (id: string) => boolean;
```

Modifies the initialize hook to look like this:

```ts
type Initialize = (data: any, meta: {id: opaque}) => Promise<any>;
```

Internally registered instances of hooks are now tracked. This is so
they can be removed later. The id of the registered instance is now
passed to the `initialize` hook.

The return value of register is now the loader id. This is mapped to
a symbol outside of the loader thread to prevent abuse.

```js
// Caller
import {register, deregister} from "node:module";

const id = register(...);

// ...

deregister(id);
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. module Issues and PRs related to the module subsystem. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants