Proposal for a simple, universal module loader hooks API to replace require() monkey-patching #52219
Comments
cc @nodejs/loaders |
Having synchronous hooks could also be interesting for 3rd-party "Node.js Resolver" implementations, for which switching their API from sync to async just for loaders is a challenge (ex wooorm/import-meta-resolve#10 (comment)). |
APM vendors would very much like this. We do unspeakable things with import-in-the-middle to make it look similar to require-in-the-middle, and all because ESM loaders were unwilling at the time to give us a way to just patch actual objects rather than only rewriting code text. |
Another use case for the universal hooks: users can create packages to e.g. track module information (like #52180) especially in the context of debugging dual packages, or develop tooling for other sorts of module diagnostics e.g. finding duplicate dependencies, cycles. These all require hooks that are supported by both loaders. |
If I’m remembering correctly, part of the theory of the current hooks design was the expectation that the ESM loader would eventually become the only loader, as it can handle everything including CommonJS: #50356. And the hooks would need to be off-thread because they would need the Atomics syncification stuff in order to support the sync I guess my question is, what’s the longer-term vision for maintaining the modules code? I find the current modules codebase borderline incomprehensible, and they’re not two separate loaders; the CommonJS loader calls into the ESM loader for And if so, how does this proposal fit into that plan? I don’t see these hooks as necessarily contradicting such a goal, but I think we should design them in such a way that they’re complementary: they should apply to ESM loader-managed modules as well as CommonJS loader-managed modules, for example. And we should probably add this |
Probably related is that I had previously attempted to add diagnostics_channel events to the module loading lifecycle, which could have been patchable through subscribers. That ended up stalling though as loaders was just at the start of the off-threading change so too much was changing for me to be able to keep up with my PR at the time. I think we don't actually need a new and complicated system, we just need some diagnostics_channel hooks at the right time to intercept the exports object and rebuild it. |
I think that can still be a long term thing, but deprecating
Writing in ESM syntax is not the same as running ESM syntax. Many of the dependencies of
I think technically the synchronous hooks are just lower level, more powerful hooks that
We could but then being off-thread makes it somewhat useless. For example if the module exports a function, then you just can't send a function (a closure) over to a different thread and let the other thread wrap the actual closure somehow, because closures are not serializable. I think that was what #52219 (comment) was talking about. They would then just need to hack their way by modifying the source code that will be compiled and invoked to generate these closures on a different thread (which has nothing to do with |
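To make the serialization limit above concrete, here is a minimal sketch. It assumes a Node.js version that exposes the global structuredClone, which applies the same structured clone algorithm that postMessage uses between threads. The helper name canCrossThreads is made up for illustration.

```js
// Hypothetical helper: reports whether a value survives the structured clone
// algorithm, which is what worker postMessage() relies on.
function canCrossThreads(value) {
  try {
    structuredClone(value);
    return true;
  } catch {
    return false; // functions cannot be cloned and make structuredClone throw
  }
}

const plainExports = { version: 1 };
const closureExports = { greet: (name) => `hello ${name}` };

console.log(canCrossThreads(plainExports));   // plain data can be cloned
console.log(canCrossThreads(closureExports)); // the closure cannot
```

This is why an off-thread hook can rewrite source text but cannot receive a live exported function and wrap it.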
I don't think that's true, I think the design of ESM (the specification, not the Node.js ESM loader) provides a lot of room for an implementation with optimal performance. It doesn't need to unconditionally do more things. All the extra things it needs to do are subject to what the module graph looks like and it's possible to provide a fast happy path that can be hit by maybe >80% of the pure ESM packages you can find on npm. And I think synchronous loader hooks can be an important piece in this as it also allows user-land hooks to participate in a fast happy path, instead of having no way to opt out of a slow abstraction that they don't necessarily need or could do better themselves. It also paves the way to fully deprecate |
Is the “slow abstraction” the fact that the hooks are async, or that they’re off-thread? If it’s that they’re async, I guess the question then is how do you handle when hooks do need to be async. If it’s that they’re off-thread, I don’t think we should assume that the hooks thread necessarily makes things slower. In nodejs/modules#351 (comment) it was estimated that spawning the thread incurs about a 10ms cost, but that once the thread exists the overall loading might actually be faster than a fully main thread approach because many CPU-intensive tasks like transpilation would benefit from running in a separate thread concurrent with the application code running in the main thread. And users might be able to achieve similar goals themselves by having their hooks spawn a separate thread, but it’s much more difficult to orchestrate and multiple customization libraries wouldn’t be likely to share the same hooks thread, so there are benefits to Node setting this up for them. That’s all not to say that we can’t have both, but of course shipping both then causes the UX complexity of essentially two APIs for doing the same thing. |
I think I've repeated this many times in the OP - hooks can spawn their own workers. They can and they already do (e.g.
I would say it's a double-edged sword: by trying to take this over we are introducing footguns like #50948 or #47615. It's fine if they can work with the footgun. But I don't think it's fine to make that the only option we provide to our users, and when they come back to us, we provide no actionable mitigations other than documenting the footgun or telling them to not do what they need to do or their dependencies need to do, even though the use case can be simply solved by us not trying to be clever and just giving them an option with fewer abstractions.
The complexity already exists, as I've explained in the OP. Users still need to provide both CJS loader hooks via monkey patching and ESM loader hooks via |
They shouldn’t, though, when the ESM loader handles the entry point, because the current hooks handle This still leaves the question of whether the current hooks are somehow incomplete in terms of whether they can do everything that CommonJS monkey-patching can do, but that’s not something that necessarily needs to be solved by creating a whole alternate approach to hooks. And this all isn’t to argue that there aren’t advantages to the proposal, as you list many in the top post and many are still valid. But I don’t think that this point is necessarily one of the benefits. |
Or libraries that haven't started implementing the I also think the switch only does more harm than good with the current shape of the ESM loader, and that shouldn't be used as an existing or imminent condition. The existing condition is that the ecosystem still relies on |
If I'm understanding the spec right, we still have the problem of import bindings being immutable, so patching the exports object would not work? |
One solution would be to create a module facade via vm.SyntheticModule (currently behind --experimental-vm-modules), then you can re-export them with wrapped implementations. That’s what we do to export the builtins as ESM (with an internal version of it) with an added default export. |
Also, presumably the hook would run before the module is linked, right? So the exports could get patched by the hook before they’re registered in V8. They would be immutable after that point, aside from techniques like @joyeecheung mentions, but this pre-linking customization should cover a lot of use cases? |
Would pre-link work? Basically we need to be able to get a reference to a current exported function or class and replace that export with another that calls the original internally. If we aren't linked yet, can we even get the references? |
If you are talking about export hooks, then it needs to happen after evaluation (that's the point when we have the actual namespace after executing the module code with V8). Linking is the phase where |
I like the general idea proposed here 🙂 Jotting down a few points, which may be invalid or a pipe-dream:
I think that is not completely accurate: I believe it's not that we rejected the idea, but merely that it is not yet possible/feasible (more work would be needed to facilitate it), so it was out of scope of the cited PR. My vague assumption when I imagine modules in the future is that it actually would be this way (but perhaps that's naïve of me). Also, I'm not married to the off-threading design (but I will cry a lot a lot if it's ripped out after so much blood, sweat, and tears). Off-threading is undeniably a huge complication and maintenance burden. If there's an alternative that achieves the crucial bits that it provided, let's hear it! |
I would say without this proposal doing any substantial changes to the CJS loader would be unrealistic. We cannot get rid of the CJS loader while tens/hundreds of millions of downloads per week on npm rely on patching its underscored methods. We should at least find a way to reduce that to about <1 million of downloads per week before breaking the user-accessible part of the CJS loader, and this proposal hopefully does that.
I think that's a nice to have though I suspect it could just make maintainability worse, not better, due to the existing complexity of the hooks.
I suppose by lands you mean stage 4? That seems pretty far-fetched. ECMA262 proposals don't usually go to stage 4 within a year, especially when they are not a simple helper function. I would estimate something like module mocks to take 2-3 years to go to stage 4 (similar to how long it took for TLA), and leaving CJS monkey-patching in the ecosystem for 2-3 more years doesn't sound like a win for anyone. On a side note, I think we should learn our lessons in "why
I think it still serves an important use case for the users (being able to do async work in the loader out of the box), and I don't think it needs to be ripped out. The proposal here is complementary and just provides a lower-level API that allows more customization and more optimization in the module loading process. The off-thread hooks can just be a helper that saves users the trouble of spawning their own workers if they do need them but also don't need control over them. |
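For readers unfamiliar with the "spawn your own worker" approach mentioned above, here is a rough sketch of how a synchronous hook could block on asynchronous work done in a worker, using Atomics.wait and receiveMessageOnPort from node:worker_threads. The names createSyncTransform, transform and close are hypothetical, and the uppercase "transpiler" is only a stand-in for real async work. It illustrates the general technique and is not how Node.js core or any particular package implements it.

```js
const {
  Worker,
  MessageChannel,
  receiveMessageOnPort,
} = require('node:worker_threads');

// Worker body, inlined as a string so the sketch is self-contained.
const workerSource = `
  const { workerData } = require('node:worker_threads');
  const { port, signal } = workerData;
  const flag = new Int32Array(signal);
  port.on('message', async ({ source }) => {
    // Stand-in for real asynchronous work, e.g. an async transpiler.
    const output = await Promise.resolve(source.toUpperCase());
    port.postMessage({ output });
    Atomics.store(flag, 0, 1); // Publish "done" after the reply is queued.
    Atomics.notify(flag, 0);
  });
`;

function createSyncTransform() {
  const signal = new SharedArrayBuffer(4);
  const flag = new Int32Array(signal);
  const { port1, port2 } = new MessageChannel();
  const worker = new Worker(workerSource, {
    eval: true,
    workerData: { port: port2, signal },
    transferList: [port2],
  });
  worker.unref(); // Do not keep the process alive just for this worker.

  return {
    transform(source) {
      Atomics.store(flag, 0, 0);
      port1.postMessage({ source });
      // Block this thread until the worker flips the flag (or 5s pass).
      if (Atomics.wait(flag, 0, 0, 5000) === 'timed-out') {
        throw new Error('worker did not answer in time');
      }
      return receiveMessageOnPort(port1).message.output;
    },
    close() {
      port1.close();
      return worker.terminate();
    },
  };
}

const syncTransform = createSyncTransform();
console.log(syncTransform.transform('export default 1'));
syncTransform.close();
```

The calling thread is fully blocked while it waits, which is the trade-off debated in this thread: the hook stays synchronous, but nothing else runs on that thread in the meantime.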
This still means blocking the main thread even in cases where the main thread could continue to run, notably dynamic import would result in pausing the whole main thread even though other things could run. Using userland threads doesn't help as It wouldn't be that complicated to simply have mirrored hooks for sync and async which are chosen depending on whether addHooks({
// Called for require
resolve(specifier, context, nextResolve) { ... }
// Called for (dynamic) import, falls back to sync resolve if not provided
resolveAsync(specifier, context, nextResolveAsync) { ... }
// Ditto for load
}); |
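The fallback rule in the sketch above could be implemented along these lines. This is only an illustration: selectHook and its fromImport flag are hypothetical names, not part of any Node.js API.

```js
// Hypothetical dispatcher: pick the async variant for import(), if present,
// otherwise fall back to the sync hook that require() would use.
function selectHook(hooks, name, fromImport) {
  const asyncVariant = hooks[`${name}Async`];
  if (fromImport && typeof asyncVariant === 'function') {
    return asyncVariant;
  }
  return hooks[name];
}

const hooks = {
  resolve(specifier, context, nextResolve) {
    return nextResolve(specifier, context);
  },
  async resolveAsync(specifier, context, nextResolveAsync) {
    return nextResolveAsync(specifier, context);
  },
  // No loadAsync here, so import() falls back to the sync load hook.
  load(url, context, nextLoad) {
    return nextLoad(url, context);
  },
};

console.log(selectHook(hooks, 'resolve', true).name);
console.log(selectHook(hooks, 'load', true).name);
```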
I think what we want in the end is this:
It sounds like this proposal can achieve all of this, with the significant caveat that the new hooks would be sync instead of async. So if we want the simplicity of a single set of hooks that work everywhere, we would get rid of the current off-thread async hooks. I’m okay with jettisoning the current off-thread hooks. They’ve proven difficult to maintain, and if most users don’t need the ability to run async hooks to customize |
I think the most aggressive(?) notification before full breakage would be deprecation warnings even inside node_modules (and in the warning, asking users to reply to a tracking issue about their use case maybe?). That can be done in the duration of an LTS to make sure it gets enough attention. Technically we don't need to break them all at once, just emit warnings for individual patched methods as we go and leave some time to the feedback cycle before we make breaking changes. The CJS loader is no stranger to this "emit warning if foo is patched and keep the patched foo working in a hacky way, otherwise use a different implementation of foo" pattern. |
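A minimal sketch of the "warn if foo is patched, otherwise use a different implementation" pattern, assuming an accessor property is used to detect the patch. makePatchable, fakeLoader and the warning text are invented for illustration; the actual CJS loader code is organized differently, and real code would more likely call process.emitWarning than push to an array.

```js
// Hypothetical helper: expose `builtin` as target[name], but notice when
// someone overwrites it, warn once, and keep honoring the patched value.
function makePatchable(target, name, builtin, warn) {
  let current = builtin;
  let warned = false;
  Object.defineProperty(target, name, {
    configurable: true,
    enumerable: true,
    get() { return current; },
    set(value) {
      if (!warned) {
        warned = true;
        warn(`${name} was patched; consider migrating to loader hooks`);
      }
      current = value;
    },
  });
  return { isPatched: () => current !== builtin };
}

// Stand-in for a loader object with an underscored method.
const fakeLoader = {};
const warnings = [];
const tracker = makePatchable(
  fakeLoader,
  '_resolveFilename',
  (request) => `/builtin/${request}`,
  (message) => warnings.push(message),
);

// Fast path when untouched, compatibility path when patched.
function resolveFilename(request) {
  return tracker.isPatched()
    ? fakeLoader._resolveFilename(request) // slower, patch-compatible path
    : `/builtin/${request}`;               // internal fast path
}

console.log(resolveFilename('a'));
fakeLoader._resolveFilename = (request) => `/patched/${request}`;
console.log(resolveFilename('a'), warnings);
```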
Would the proposed solution negate the need for additional Node args like Additional Node args are a pain point for a variety of reasons:
My understanding was that ESM imports are all evaluated first before any other code is run and that is why we're required to use |
The current hooks can be used without a flag, that’s what import { register } from 'node:module'
await register(new URL('./hooks.js', import.meta.url)) // Note the await here
await import('./app.js') // Note the dynamic import here
It would be similar for the new hooks. |
While this workaround is viable in a lot of use cases, many of the most popular meta frameworks do not expose an entry script where you could add this async import. It is possible to code your own entry point for many of them but you can break compatibility with their tooling so it's often not documented or recommended. It's clear that in the long term we should be encouraging frameworks to include hooks that allow us to run code before the app code is async imported. However today, the only options for hooking ESM are using the |
I have a POC in https://github.com/joyeecheung/node/tree/sync-hooks - pretty sure it's full of bugs still and there are a bunch of TODOs, but at least it can work with a mini typescript transpiler in the test. Working on a more concrete API design and planning to PR the API doc to the loaders repo (?). I suppose the export hooks will be left to a later iteration. |
After an in-depth review of the loaders code, I support this proposal. |
Updated the WIP a bit and made a whacky /**
* @typedef {{
* parentURL: string,
* }} ModuleResolveContext
*/
/**
* @typedef {{
* url: string,
* format?: string
* }} ModuleResolveResult
*/
/**
* @param {string} specifier
* @param {ModuleResolveContext} context
* @param {(specifier: string, context: ModuleResolveContext) => ModuleResolveResult} nextResolve
* @returns {ModuleResolveResult}
*/
function resolve(specifier, context, nextResolve) {
const resolved = nextResolve(specifier, context);
if (resolved.url.endsWith('.esm')) {
return {
...resolved,
format: 'module'
};
}
return resolved;
}
/**
* @typedef {{
* format?: string,
* }} ModuleLoadContext
*/
/**
* @typedef {{
* format?: string,
* source: string
* }} ModuleLoadResult
*/
/**
* @param {string} url
* @param {ModuleLoadContext} context
* @param {(url: string, context: ModuleLoadContext) => ModuleLoadResult} nextLoad
* @returns {ModuleLoadResult}
*/
function load(url, context, nextLoad) {
const loaded = nextLoad(url, context);
const { source: rawSource, format } = loaded;
if (url.endsWith('.ts')) {
const transpiled = ts.transpileModule(rawSource, {
compilerOptions: { module: ts.ModuleKind.NodeNext }
});
return {
...loaded,
format: 'commonjs',
source: transpiled.outputText,
};
}
return loaded;
}
/**
* @typedef {{
* format: string,
* }} ModuleExportsContext
*/
/**
* @typedef {{
* exports: any,
* }} ModuleExportsResult
*/
/**
* @param {string} url
* @param {ModuleExportsContext} context
* @param {(url: string, context: ModuleExportsContext) => ModuleExportsResult} nextExports
* @returns {ModuleExportsResult}
*/
function exports(url, context, nextExports) {
const { exports: exported } = nextExports(url, context);
const replaced = { ...exported, version: 1 };
return { exports: replaced };
}
I will write a more serious doc and either PR that alongside the code here, or into https://github.com/nodejs/loaders/tree/main/doc/design (depending on how serious that doc turns out to be?) |
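To show how the nextResolve chaining in hooks like the ones above can run synchronously in-thread, here is a small self-contained sketch. composeHooks, markEsm, alias and defaultResolve are hypothetical names, and the ordering (hooks added later run first) is an assumption made for this illustration, not something the proposal has settled.

```js
// Hypothetical composer: hooks added later wrap hooks added earlier, and the
// innermost `next` is the loader's default behavior.
function composeHooks(hooks, defaultStep) {
  return hooks.reduce(
    (next, hook) => (specifier, context) => hook(specifier, context, next),
    defaultStep,
  );
}

// Stand-in for the default resolution.
const defaultResolve = (specifier, context) => ({
  url: new URL(specifier, context.parentURL).href,
});

// Mirrors the resolve() hook above: mark .esm files as ES modules.
function markEsm(specifier, context, nextResolve) {
  const resolved = nextResolve(specifier, context);
  return resolved.url.endsWith('.esm')
    ? { ...resolved, format: 'module' }
    : resolved;
}

// A second hook that rewrites a bare alias before delegating.
function alias(specifier, context, nextResolve) {
  const target = specifier === 'app' ? './app.esm' : specifier;
  return nextResolve(target, context);
}

const resolve = composeHooks([markEsm, alias], defaultResolve);
const result = resolve('app', { parentURL: 'file:///project/main.js' });
console.log(result);
```

Because every step is an ordinary function call, the same chain can serve require() and import without any cross-thread messaging.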
I think either works. You could make a new markdown file for the loaders repo alongside the other design docs, or write API docs in a Node core PR (with or without the implementation). The loaders repo design doc might be more useful as it can go into implementation details and discussion that we might not want to include in user-facing documentation; but then we’d have to write it all up again when writing the eventual user docs. |
Does that exports function handle ESM too? Or have you just tested against CJS for that so far? Ideally I would like to have a unified patching interface for both CJS and ESM, but the immutability aspect made that quite a challenge to do externally with IITM. I'm hoping we can find a way around that with a built-in sync API. Was your thinking that within that function the user would build the vm.SyntheticModule facade we talked about before themselves and only swap out the exported object there? Maybe doable to leave that work up to the user, though I had assumed an interface for that would be provided. 🤔 |
Something else we discussed was implementing |
I haven't got the WIP working for ESM yet, but actually I noticed that for ESM what you need is still
function load(url, context, nextLoad) {
if (!shouldBeWrapped(url)) return nextLoad(url, context);
if (context.format === 'module') {
const { module: originalModule } = nextLoad(url, context);
const keys = Object.keys(originalModule.namespace);
const m = new vm.SyntheticModule(keys, () => {
for (const key of keys) {
let value = originalModule.namespace[key];
if (key === 'foo') {
value = wrap(value); // Wrap exports.foo
}
m.setExport(key, value);
}
});
// Node.js will swap the original module in the loader cache with the returned one, and run
// the evaluation callback after the evaluation of the original module is completed.
return { module: m };
}
// For CommonJS modules, it's recommended to only replace
// properties instead of replacing the entire object to avoid
// getting out of sync if the original module accesses
// `exports.foo` internally directly
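if (context.format === 'commonjs') {
// Assumption: for CommonJS, nextLoad() returns the exports object as `exports`.
const { exports: exported } = nextLoad(url, context);
exported.foo = wrap(exported.foo);
return { exports: exported };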
}
// unreachable?
}
If live binding of unwrapped values is necessary (i.e. if the original module modifies the exported value, you want that modification to still reflect for user code importing that original module),
const { module: originalModule } = nextLoad(url, context);
let source = `import { wrap } from 'util';`;
for (const key of Object.keys(originalModule.namespace)) {
if (key === 'foo') {
source += `import { foo as originalFoo } from 'original';`;
source += `export const foo = wrap(originalFoo);`;
} else {
source += `export { ${key} } from 'original';`; // Export unwrapped values with live binding
}
}
const m = new vm.SourceTextModule(source);
m.linkSync((specifier) => {
if (specifier === 'original') return originalModule;
if (specifier === 'util') return util; // Contains a synthetic module with the wrap method
});
// Node.js will swap out the original module in the loader cache with the returned one.
return { module: m }; |
Or I think we can introduce a link hook for ESM, and import-in-the-middle would probably look like this:
function link(url, context, nextLink) {
if (!hasIitm(url)) return nextLink(url, context);
if (context.format === 'module') {
const { module: originalModule } = nextLink(url, context);
const util = new vm.SyntheticModule(['userCallback', 'name', 'basedir'], () => {
util.setExport('userCallback', userCallback); // Or a bigger callback folding all the added user callbacks
const stats = parse(fileURLToPath(url));
util.setExport('name', stats.name);
util.setExport('basedir', stats.basedir);
});
let source = `import * as original from 'original';`;
source += `import { userCallback, name, basedir } from 'util';`;
source += `const exported = {};`;
for (const key of Object.keys(originalModule.namespace)) {
source += `let $${key} = original.${key};`;
source += `export { $${key} as ${key} };`;
source += `Object.defineProperty(exported, '${key}', { get() { return $${key}; }, set (value) { $${key} = value; }});`;
}
source += `userCallback(exported, name, basedir);`;
const m = new vm.SourceTextModule(source);
m.linkSync((specifier) => {
if (specifier === 'original') return originalModule;
// Contains a synthetic module with userCallback, name & basedir computed from url
if (specifier === 'util') return util;
});
return { module: m };
}
} |
Opened nodejs/loaders#198 because I found some higher level design questions |
Spinning off from #51977
Background
There has been widespread monkey-patching of the CJS loader in the ecosystem to customize the loading process of Node.js (e.g. utility packages that abstract over the patching and get depended on by other packages, such as require-in-the-middle and pirates, or packages that do this on their own, like tsx or ts-node). This includes but is not limited to patching Module.prototype._compile, Module._resolveFilename, Module.prototype.require, require.extensions, etc. To avoid breaking them, Node.js has to maintain the patchability of the CJS loader (even for the underscored methods on the prototype), and this leads to very convoluted code in the CJS loader that also spreads to the ESM loader. It also makes refactoring of the loaders for any readability or performance improvements difficult.
While the ecosystem can migrate to produce and run real ESM gradually, existing tools still have to maintain support for CJS output (either written as CJS, or transpiled from ESM) while that happens, and the maintenance of the CJS loader is still important. If we just provide a universal API that works for both CJS and ESM loading customizations, tooling and their users can benefit from a more seamless migration path.
Why module.register() is not enough
The loader hooks (in the current form, module.register()) were created to address loading customization needs for the ESM loader, so they only work when the graph is handled by the ESM loader. For example, they don't work when the require() comes from a CJS root (which could be a transpiled result), or from createRequire(import.meta.url). Addressing existing use cases in the CJS loader is not in the scope of the loaders effort, either (it was brought up before and was dismissed in a previous PR too). Therefore tooling in the wild still has to maintain both monkey-patching-based hooks for CJS and loader hooks for ESM when it wants to provide universal support. Their users either register both hooks, or (if they know what format is actually being run by Node.js) choose one of them.
The module.register() API currently forces the loader hooks to be run on a different worker thread even if the user hooks can do everything synchronously or need to mutate the context on the main thread. While this simplifies de-async-ing of loader code to some extent when they want to run asynchronous code in loader hooks, the value is lost once the loader needs to provide universal support for CJS graphs and has to maintain synchronous hooks for that too (and in that case, they could just spawn their own workers to de-async, e.g. what @babel/register does on top of require() monkey-patching).
For tooling that only has CJS support via require() monkey-patching, if they want to add ESM support, this unconditional worker abstraction as the only way to customize ESM loading makes wiring existing customizations into ESM more complicated than it needs to be. The move to unconditional workers also led to many issues that are still unaddressed: Worker option execArgv in v20 #47747 and Loaders that use childProcess.fork lead to endless recursion of processes #47615, and this just looks like a rabbit hole.
For us maintainers, having to support this worker setup as the only way to customize module loading also adds maintenance burden to the already convoluted loaders. It is already difficult to get right in the ESM loader (e.g. having to dodge infinite worker creation or having to figure out how to share the loader worker among user workers), let alone in the monkey-patchable CJS loader.
Proposal of a synchronous, in-thread, universal loader hooks API
As such I think we need something simpler than module.register() that:
require() monkey-patching-based hooks.
require() monkey-patching sooner.
This becomes more important now that we are on a path to support require(esm) and want to help the ecosystem migrate to ESM by providing a path with backwards-compatibility and best-effort interop, instead of providing features that do not work in the existing CJS loader or go against existing CJS usage patterns, making it difficult for people to migrate.
I propose that we just add a synchronous hooks API that works in both the CJS and the ESM loader as a replacement for the monkey-patchability of require(). The API can be something like this - this is just a straw-person sketch combining existing module.register() APIs and APIs in npm packages like pirates and require-in-the-middle. The key is that we should keep it simple and just take synchronous methods directly, and apply them in-thread:
The main difference between this and module.register() is that hooks added via module.register() are run on a different worker unconditionally, while module.addHooks() just keeps things simple and runs synchronous hooks synchronously in-thread. If users want to run asynchronous code in the synchronous hooks, they can spawn their own workers - this means technically they could just implement what module.register() offers on top of module.addHooks() themselves. So module.register() just serves as a convenience method for those who want to run the code off-thread and prefer to delegate the worker-atomics-wait handling to Node.js core.
In a graph involving real ESM, module.register() can work in conjunction with module.addHooks(); the hooks are applied in the same order that they are added in the main thread. In a pure CJS graph, module.register() continues to be unsupported, as has already been the case. Maybe someday someone would be interested in figuring out how to make module.register() work safely in the CJS loader, but I think the burden of handling the unconditional workers is just not worth the effort, especially when users can and already do spawn their own workers more safely for this use case. IMO a simple alternative like module.addHooks() would be a more viable plan for universal module loading customization, and it gets us closer to deprecating require() monkey-patching sooner.
Migration plan
Tooling in the wild can maintain just one set of synchronous customizations, and handle the migration path by changing how these customizations are wired into Node.js:
require()-monkey patching, and into ESM via module.register(). This is unfortunately what they already do today if they want to provide universal module support.
module.addHooks(). There is no longer a need to maintain two wirings for universal support of CJS and ESM. And, if they didn't support real ESM before, they get to implement ESM support relatively simply by just migrating from require() monkey-patching to module.addHooks().
module.addHooks(), they can remove the dependency on require() monkey patching completely.
For us, the migration plan looks like this:
module.addHooks() as a replacement for require() monkey-patching, and make it wired into both require()/createRequire() from the CJS loader and import/require in imported CJS from the ESM loader.
require() monkey patching and actively encourage user-land packages that rely on patching to migrate. In the meantime require() monkey patching will still work to some extent in conjunction with the new loader hooks, so that packages have a graceful migration period.
require() monkey patching drop enough in the ecosystem, start emitting runtime warnings when the internal properties of Module are patched, and suggest using module.addHooks().
require() monkey patching to work. Packages that monkey-patch Module but don't manage to migrate might still work with newer versions of Node.js - until we make changes to the internal properties that they rely on. When we do that and break them, instead of further convoluting internals to make patching work, we'll suggest that they just use module.addHooks() on newer versions of Node.js.