-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support registering worker specific custom ESM loader hooks (Regression from #52706) #53195
Comments
@nodejs/loaders |
@alan-agius4 Thanks for this. I’m a little confused by your example, because it has Aside from the bug introduced by #52706, the intended design is that hooks registered on the main thread would apply to all worker threads automatically. From your example, I gather that this is the API you want, but somehow you get the differing context of each thread available to the hook? So like I see the differing export async function load(url, context, next) {
const { files } = context.workerData // Either Italian or English, depending on the thread of the module
// ... |
Yes, we intend to use the same hook logic but with different contexts/data. Before version 22.2, this was feasible as each thread independently registered its own hook with different register.js register('./loader-hooks.js', {
parentURL: pathToFileURL(__filename),
data: workerData,
}); loader-hooks.js let outputFiles;
export function initialize(data) {
outputFiles = data.files;
}
export function resolve(specifier, context, nextResolve) {
// resolve logic here
} app.js const workerA = new Worker("./lib/child.js", {
workerData: { files: filesInMemoryInItalian },
execArgv: ["--import", "./register.js"],
});
const workerB = new Worker("./lib/child.js", {
workerData: { files: filesInMemoryInEnglish },
execArgv: ["--import", "./register.js"],
});
I understand this and it's acceptable. However, I'm unsure if it's a bug or not, but with version 22.2, you can opt-out of this behavior by setting const workerB = new Worker("./lib/child.js", {
execArgv: [],
}); Questions
|
I don't know; we haven't decided yet.
If we decide that calling
This is essentially the problem introduced by #52706. It wasn’t on our radar that hooks could be applied to (or removed from) worker threads via Right now we just want to unbreak people so that we can take some time to think about how to handle hooks for worker threads. The off-thread hooks code is already monstrously complex, so I worry about the maintainability of something so specific as “if Something else to consider is, what are you using worker threads for here? It seems like their purpose is a way to group sets of modules to be customized in a particular way. You probably don’t have easy access to edit the source code of the files you’re loading, but if you did, you could use import attributes for that purpose: Another option is that we have an old PR for a Maybe none of these options are preferable to the isolation that worker threads provide, but my point is just to try to define the problem in a way that doesn’t prescribe a particular solution. As in, I assume that the threads are an implementation detail to whatever you’re ultimately trying to achieve. Maybe they are the best solution, better than child processes or import attributes or query strings or other ideas, but for the purposes of evaluating approaches it would help to define the goal as naively as possible when it comes to how it should be achieved. |
I think there are four cases we need to discuss:
I think that in order to support those cases, we need to have an option in the |
@alan-agius4 based on your diagram, it would be advantageous for you to have one loader thread for all |
@mcollina, correct. |
One thing that’s come up while working on these PRs is that it’s very hard to reason about and debug multiple hooks threads. There’s also the question of how to manage them: do they go away when all threads using them exit, or stick around for potential future worker threads to use? We’d end up needing to build APIs for orchestrating these hooks threads. What about the API I proposed in #53195 (comment), of passing in |
I'm not sure that would support having workers with different loader hooks. |
I'm currently working on https://github.com/ShogunPanda/node/tree/multiple-chains (based off #53200). Basically the idea is that each thread at any given time has a
To this:
Each application thread can OPTIONALLY have a different chain of hooks (by default it inherits its parent one or starts a new one if none is found) Every time a thread needs to change the hooks data, it can start a new chain. The chain will be present in initialize, resolve and load context arguments (initialize will gain the second parameter compared to today's situation) so they can store data locally in the global thread without conflicts. Do you think it will fit your use case? 17:54 |
It wouldn’t, but I don’t think we need that complexity. Within the hook itself it can handle the various cases: export function resolve(specifier, context, next) {
if (context.workerData?.onThreadA) {
doWhateverShouldBeDoneForThreadA() This is no less performant than us doing a similar switch in internal code to determine which hook chain to run. |
I have not yet analyzed the details of the solution but as a first thought on this fwiw I think that having one single chain and calling the hooks with all the needed information for the hook to decide if it does something or calls I pushed on #53200 one commit today that starts to implement a solution in this direction. In fact with that one the cases needed by the angular regression should already work. Tests are missing, it is WIP but the idea should already be visible. We need a way to automatically unregister hooks. Still working on that part. once again the disclaimer: not advocating, just proposing. |
@dygabo can you explain this a bit? How would the Angular case be solved by what’s currently on #53200? Presumably their existing code won’t just start working again, so what would their rewritten hooks look like so that they can get the desired behavior of having the Italian translations available as the context data for modules imported from one thread and the English translations available as the context data for modules imported from another thread? |
@GeoffreyBooth that is still WIP and it does not have all the data yet. The idea would be that the hook when it runs, it knows for what it was created (creation time Also once that works, tests need to be added and more cases tested. So I don't think it is realistically possible to have a finished #53200 by |
|
What is the problem this feature will solve?
Following the implementation of #52706 in Node.js version 22.2, workers can no longer specify custom ESM loader hooks. In the Angular CLI, we utilized this feature as part of our prerendering (SSG) pipeline.
Here’s a summary of our build pipeline and process: In the main process, the application is transpiled and bundled, generating all files in memory. The main process then spawns multiple thread workers (using Piscina) with a customized loader hook to intercept imports and redirect them to the in-memory files when present. As of version 22.2 this is no longer possible.
It's worth noting that in our scenario, multiple threads (e.g., Thread A and Thread B) may be initialized, resulting in varying module resolutions and ESM module hooks across these threads.
In the following example, the
workerData
contains the files and modules utilized by the ESM loader hooks:What is the feature you are proposing to solve the problem?
Thread workers should be permitted to specify custom ESM loader hooks.
What alternatives have you considered?
An alternative approach is to spawn a forked process from the main process, specifying the custom loader hook in the forked process. This forked process then spawns the worker threads. As demonstrated in angular/angular-cli#27721.
However, It is important to note that this approach is not backward compatible. In previous versions, the ESM loader specified on the main thread did not propagate to child workers. Consequently, from a maintenance perspective, tools now need two methods to use ESM custom loaders with thread workers and need to support Node.js LTS and Active versions.
The text was updated successfully, but these errors were encountered: