Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esm: refactor hooks – add loader instances #49315

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

izaakschroeder
Copy link
Contributor

@izaakschroeder izaakschroeder commented Aug 24, 2023

Previously we had one array for each kind of chain in the variable #chains; this is probably the most efficient, but it decouples the connection between the original register call and the hook functions themselves.

By adding #loaderInstances we can keep track of each time register is called. This allows, for example, trivially removing the loader later on, or associating shared context with a particular register call. The tradeoff here if we just did this would be that during hook traversal we may iterate over a slightly larger number of objects. Instead we keep #chains but it is now a derived property; every time #loaderInstances changes we simply rebuild #chains.

This sets the stage for doing other things, like adding deregister: #49159 or adding the ability to attach local state to individual loader instances.

N.B. I should probably add back the documentation about the entries in #chains somehow.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or

(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders
  • @nodejs/modules

@nodejs-github-bot nodejs-github-bot added esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run. labels Aug 24, 2023
const { defaultResolve } = require('internal/modules/esm/resolve');

exports.resolve = defaultResolve;
exports.load = defaultLoad;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we create a new file we might need to add it to the startup snapshot (see #45849). Alternatively is there a way to stick this in an existing file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I put this in its own file is so that the url property actually led to a valid file that represented the base "loader".

It can live in literally any file that can be treated as a loader (exports load, resolve, etc.) This code is the relevant block:

const defaultLoader = 'internal/modules/esm/default_loader';
this.addCustomLoader(`node:${defaultLoader}`, require(defaultLoader));

So that if we ever expose a list loaders or anything like that the url property is what you see above.

}

#rebuildChains() {
this.#rebuildChain('globalPreload');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s wait to land this after #49144.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me 🎉

Previously we had one array for each kind of chain in the variable
`#chains`; this is probably the most efficient, but it decouples the
connection between the original `register` call and the hook functions
themselves.

By adding `#loaderInstances` we can keep track of each time `register`
is called. This allows, for example, trivially removing the loader later
on, or associating shared context with a particular `register` call. The
tradeoff here if we just did this would be that during hook traversal we
may iterate over a slightly larger number of objects. Instead we keep
`#hooks` but it is now a derived property; every time `#loaderInstances`
changes we simply rebuild `#hooks`.
Copy link
Contributor

@JakobJingleheimer JakobJingleheimer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the code here is less readable and more convoluted in execution than it was before. I think my proposals will help, but I'd need to see them all together in the diff.

Note that you basically cannot use for…of or native object methods at all due to the risk from prototype pollution. See https://github.com/nodejs/node/blob/main/doc/contributing/primordials.md

#rebuildChain(name) {
const chain = this.#chains[name] = [];
let i = 0;
for (const instance of this.#loaderInstances) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't use for…of :( gotta use a plain ol' for

Copy link
Contributor

@JakobJingleheimer JakobJingleheimer Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frowning about no for…of

💯 😢 😆

@@ -87,45 +86,40 @@ let importMetaInitializer;
// [2] `validate...()`s throw the wrong error

class Hooks {
#chains = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove these? If you need them to start out empty, you can still leave the fields (resolve etc) and then also the code docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that the built-in hooks are not treated as any kind of "special" object; they now work just like any other loader – you can see them being added in the constructor. This ensures that if we attach any data to a loader instance that the default node loader gets it too – right now it's just url.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes. I figured that :) I think you can still leave everything else here and just initialise the lists as empty 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes true 😄 I can do that. Part of me considered that hard-coding the hook names in a bunch of places might be less clean… but it's more clear and that works for me 🎉

Comment on lines 784 to -796
const { hookName } = meta;
const {
fn: hook,
url: hookFilePath,
next,
} = current;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did anything change here? It looks like it's now just less readable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The previous structure of chain:

type Fn = (arg0: any, context: any, next: Fn) => any;

interface ChainItem {
  fn: Fn
  url: string;
}

type Chain = ChainItem[];

The new structure of chain:

type Fn = (arg0: any, context: any, next: Fn) => any;

interface LoaderInstance {
  load?: Fn;
  resolve?: Fn;
  initialize?: Fn;
  url: string;
}

interface ChainItem {
  fn: Fn
  loader: LoaderInstance;
}

type Chain = ChainItem[];

if (typeof instance[name] !== 'function') {
continue;
}
chain.push({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotta use primordials for these sorts of things ArrayPrototypePush.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derp thanks 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
chain.push({
ArrayPrototypePush(chain, {
__proto__: null,

}
}

#rebuildChains() {
Copy link
Contributor

@JakobJingleheimer JakobJingleheimer Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a performance loss. I think it would be better for there to be only rebuildChains and for it to accept a list of chain names to rebuild, like

#rebuildChains(...names) {
  if (!names.length) {
    names = new Array('initialize','load','resolve'); // [1]
  }

  const shouldRebuild = { __proto__: null };

  for (let n = names.length - 1; n > -1; n--) {
    const name = names[n];
    shouldRebuild[name]: true;
    this.#chains[name].length = 0; // [2] Might need a primordial?
  }

  for (
    let i = 0,
        l = this.#loaderInstances.length - 1;
    i < l;
    i++
  ) {
    const {
      initialize,
      load,
      resolve,
    } = this.#loaderInstances[i];

    if (shouldRebuild.initialize) { /* … */ }
    if (shouldRebuild.load) { /* … */ }
    if (shouldRebuild.resolve) { /* … */ }
  }
}
this.#rebuildChains('resolve'); // Rebuild only `resolve` chain
this.#rebuildChains('load','resolve'); // Rebuild `load` and `resolve`
this.#rebuildChains(); // Rebuild all chains

[1] I believe this (new Array(…) instead of […]) will invoke an optimisation in V8 to choose the most appropriate specialised C++ array from the start and also to pre-allocation only exactly the space for these 3 items. If it doesn't, no need for the otherwise uncommon use of new Array() instead of [].

[2] We don't need to re-create the array, only empty it. For that, setting its length to 0 is far better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make this happen 😄 Probably a bit of a µ-optimization considering how often #rebuildChains is called and the size of N… but it's not a problem to make it a little bit faster. If the actual chain execution was affected I would be more concerned, considering that is called on every import/require.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm, it likely won't break the performance bank. But it's easy to avoid the de-op, so why not 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants