WIP: Move ESM loaders to worker thread #31229
Conversation
Squashed and rebased to pull in the new hooks (format, source). It's not broken right now (it builds, and URL resolution in the hook works), but the new hooks haven't been fully ported. Hopefully I'll make some progress on that tonight. |
My main early feedback on this would be that if we do this, we should allow environment attenuation through an attenuation hook, since currently loaders are able to do attenuation themselves because they are in the same context (for example, instrumenting globals). One API for this might be something like:

```js
export function environmentInit () {
  return `
    Object.customProperty = function () { console.log('custom') };
  `;
}
```

where the loader basically returns a string to evaluate in the target environment before any modules are loaded. If we don't do something like this, loaders that want this behavior would need e.g. an import that is always added for every single module to get the same feature, which would be an unfortunate workaround to force, I think. |
That feels like it overlaps with P.S.: I'm leaning towards "prepare the execution context" ( |
Allowing loaders to specify their own modules to preload is a viable alternative, yes. |
Would this extend to other aspects as well? E.g. should a loader be able to set a policy manifest or influence other restrictions traditionally set up using existing CLI flags? |
I do think loaders are a good place to direct further security mechanisms. Better policy integration would be nice to see too. |
Alright, basic As is I was moving to the remote object pattern, I also made some small adjustments to the protocol. It now matches the v8 inspector naming ( |
I think there are a lot of missing hooks right now we can discuss in other PRs that could even block this PR. This PR however seems complex enough already without adding more hooks to it. Does that seem fine to split up that work @guybedford ? |
```js
const { defaultResolve } = require('internal/modules/esm/resolve');
const { defaultGetFormat } = require('internal/modules/esm/get_format');

const defaultLoaderWorker = {
```
this probably shouldn't be called a "worker"
I have no strong preference apart from "anything but loader". `defaultLoaderHooks`? `defaultLoaderHandlers`? `defaultResourceSource`? `defaultResourceProviders`?
I have a slight preference for "resource provider" after writing up that list.
I see now that this was about the additional hooks for env setup. Sorry, shouldn't comment before waking up. :) |
There are two failing tests left, both regressions in error output: the error messages are reported properly, but the stack is much more verbose and at the same time doesn't contain the proper source quote. Example:

```diff
- file://*/test/fixtures/es-module-loaders/syntax-error.mjs:2
- await async () => 0;
- ^^^^^
-
+ events.js:298
+ throw er; // Unhandled 'error' event
+ ^
SyntaxError: Unexpected reserved word
```
|
Added a call to decorate the errors. It doesn't fix the overly verbose stacks, but at least the helpful source snippet is back. |
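For illustration, decorating an error with the source quote could look roughly like the sketch below. This is not Node's internal implementation (the internal `decorateErrorStack` works differently); the function name and parameters here are hypothetical.

```javascript
// Illustrative sketch: prepend the offending source line plus a caret
// marker to an error's stack, restoring the "source quote" seen in the
// expected test output above. Not the actual internal implementation.
function decorateError(err, source, line, column, length = 1) {
  const text = source.split('\n')[line - 1];       // 1-based line index
  const marker = ' '.repeat(column) + '^'.repeat(length);
  err.stack = `${text}\n${marker}\n${err.stack}`;
  return err;
}
```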
I also think `dynamicInstantiate` removal deserves its own entry in the changelogs, so I assume this means a separate commit? |
@coreyfarrell a separate PR would be doable, but it likely should get a docs deprecation as we don't have a full replacement for some behavior of attenuating globals in the same way. |
@bmeck I'm not sure what you mean by attenuating globals? I think docs deprecation of dynamicInstantiate would be better than nothing but if we know for a fact that it's going to be removed and not replaced then shouldn't we remove ASAP? |
I had looked into a docs deprecation and afaict that already happened. At least the master markdown has a prominent warning: https://github.com/nodejs/node/blob/master/doc/api/esm.md#dynamicinstantiate-hook. The only adjustment may be to change "may disappear" to "will disappear". @bmeck Another way of reading your comment: We should add a new API for attenuating globals asap to unblock removing |
FWIW my personal take on this is: There's already |
@jkrems I'm somewhat in agreement as long as we can agree that using both is a stopgap that needs to be filled eventually |
I was thinking of the docs deprecation as entirely removing this section from the docs, to make |
@guybedford they can work out their own threading models, but spinning up a worker after the loader starts up is more costly than spinning up all the threads at the same time. If they want to do everything in a single thread, they can always just import the other loaders, as things like https://github.com/targos/multiloader do. Doing the inverse isn't the same, since you have to wait on each step to spin up a new loader in a linear rather than parallel manner. |
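The single-thread composition described here (importing other loaders rather than giving each its own thread) can be sketched as chaining resolve hooks. The `(specifier, context, defaultResolve)` shape follows the experimental loader API of the era; `chainResolve` itself is a hypothetical helper, not a Node.js API.

```javascript
// Hypothetical helper: compose several loaders' resolve hooks in a single
// thread by importing them and chaining, as the multiloader pattern does.
function chainResolve(...resolvers) {
  return (specifier, context, defaultResolve) =>
    resolvers.reduceRight(
      // Build the chain right to left so the first resolver runs first
      // and each one can delegate to the next via its third argument.
      (next, resolver) => (spec, ctx) => resolver(spec, ctx, next),
      (spec, ctx) => defaultResolve(spec, ctx, defaultResolve)
    )(specifier, context);
}
```

Each hook decides whether to handle the specifier or delegate onward, so all composition stays in one thread.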
@bmeck these feel like big assumptions to me about the usage, when firstly we know nothing about the usage and secondly we know nothing about what would optimize any given usage. Regardless of these arguments, having a thread per loader in a multiple-loader scenario will definitely affect the performance of the simple scenario of low-compute hooks. Penalizing the base case based on heavy assumptions about the performance gains in other cases is quite a leap, and I would be against such an assumption. |
@guybedford the base case is unclear to some extent, as you stated in the first part of your comment. Eagerly allocating threads early, rather than requiring extra startup time before spinning up another worker, still allows your single-threaded compute case and allows for customized management: loaders that want to stay in the same thread can use import, instead of a CLI flag, to compose. |
@bmeck multiple loaders indicate a compositional model that is native to Node.js - I don't think expecting users to use a userland loader to achieve this same composition without incurring the cost of multiple threads is a solution here. I could understand if the arguments were based on security or isolation, but arguing that, say, 4 loaders (e.g. an http loader, mocking loader, coverage loader and resolver-enhancing loader) require 4 threads by default as a performance improvement seems rather odd. My main concern is that getSource in one loader will serialize and send the source through all of the loaders, incurring latency and serialization/deserialization costs entirely unnecessarily, when we should be able to handle this memory by reference. |
Note that this PR is not trying to allow for multiple loaders. Perhaps a different place would be better for such a specific trade-off discussion. |
Agreed, and yes the concerns do not affect this PR in its current form. |
It seems to me that if we had to choose between chained loaders and loaders in a worker thread, we would prefer the former. Does anyone disagree? So maybe we should merge #33812 first after all, and this will need to be rebased/refactored to account for the addition of loader chaining, assuming we can figure out a way to use workers in a chained loaders setup. |
@GeoffreyBooth @bmeck Fwiw, I would like to be able to have an informed opinion about that, but I don’t really understand why we would bring Workers into this in the first place (and believe me, I am a big fan of Workers). The PR description references “older design docs and discussions”, do you have a link to any of that? |
@addaleax Please see nodejs/modules#351 (comment), I think @bmeck does a good job summarizing the reasons why there. See also the top of that thread for links to older design docs, this is probably the primary thread with most of the loaders discussion so far. There are a few other issues in https://github.com/nodejs/modules that discuss specific aspects of loaders, and #30986 has a lot of more recent discussion. |
@GeoffreyBooth Yeah, I see 👍 In that case one thing that I’d really like to see from this PR is to not enforce the fact that all Worker threads in a hierarchy share a single loader thread. It should keep being possible, and probably even the default, to start Workers with their own independent loader mechanism. |
Can you clarify why the default wouldn't be to share the same loader thread? It is much more difficult to set up the shared state that mimics how things like ServiceWorkers share handling of requests from multiple sources. Also, one of the uses of loaders/audit hooks/etc. that we have talked about in various WG meetings is to allow instrumentation of core entirely. I think it should be possible to prevent user code from escaping the constraints applied by those hooks. Does that match the constraints you have about allowing an escape hatch if a Worker needs to be spawned with a different loader? |
Because Worker threads are, fundamentally, fully independent Node.js instances, except that they can communicate with each other. That invariant is broken here.
I feel like this sentence is missing an “if we decide to do X”? Is this referring to Worker threads not sharing the same loader? To be clear, the most important thing to me is that this is optional. Being able to spawn workers with an independent loader mechanism is a must for me. That being the default is the most sensible choice in my eyes, but also one that can be debated, for sure.
I don’t really know what you are referring to here, but I’m also not sure how Workers differ from child processes in this respect? |
Can you explain where this invariant is documented? To my knowledge this isn't true, given things like policies, which went out of their way to intertwine the two isolates. Workers are unlike child processes in that they are strictly unable to be independent: they have a shared address space, environment variables, and OS permissions, and cannot have differentiated sandboxing. I think a design invariant might need to be ironed out and documented, but I don't agree that they are completely separated, nor that they ever could be.
Yes, it would be difficult to share the same loader and/or part of the same loader chain if Workers are expected to create entirely new instances. It is also unclear to me why mandating a new instance is directly useful, unless we want loaders to be stateful per context rather than per group of contexts. Given the progress of the Realms and Compartments APIs, that seems a bit worrisome to me, at least, since I would expect some capability of instrumenting those to affect loaders, which would once again make them multi-context aware. Keeping default isolation also goes along with the difficulties you see in things like https://github.com/nodejs/tooling/issues, which often cover the need to share state with APIs so that supplemental instances such as child processes and workers do not go out of sync with modifications to the spawning instance. Overall it feels like there may be a need to write up a technical direction document on how to approach this kind of design space, but I am in no position or authority to do so and would not even know where to start, except maybe asking the TSC to do so.
Workers and child processes are inherently different in that workers cannot differ in address space, OS permissions, or sandboxing. Child processes are a much more discrete separation of instances that do not need to deal with the multi-tenancy concerns that a worker must. |
@bmeck Let me be a bit clearer: As Node.js instances, Worker threads are independent in the same way that processes are. Of course Worker threads are not fully independent as OS-level threads, but that’s a strawman argument – child processes are not fully independent either because they still share a file system, networking stack, etc. In both cases, it makes sense to inherit the defaults of whatever parent instance spawned them, but if there can be flexibility, there should be. |
I do not believe so; I think you are carrying my argument to the extreme that any sharing is to be treated the same as undeniable sharing. Things like the effective user id do affect things like the filesystem and how it can be accessed; you can create a child process with a different user id, for example. The limits of sharing depend on the boundary at which sharing is being modeled: to me, workers have a distinctly different boundary from processes, which are distinct from containerized/sandboxed process groups, and different again from remote processes. I do not think we should treat all these boundaries as indistinguishable, as they do affect the deniable capabilities within each type.
I don't believe that is entirely true, but I agree that flexibility is desirable. In particular, if you configure a specific setting with the intent of affecting the parent instance in a potentially constraining way, such as a loader/permissions mode/policy configuration, there can be some flexibility to escape this, but it should be controlled and deniable. This falls in the same category, from my viewpoint, as not wanting a context which is denied |
I would note that deniability doesn't necessarily need to be easy, from my standpoint; however, it should be realistically feasible without extreme effort. |
I’m not doing that; I’m just saying that the things you’re listing that Worker threads share are essentially arbitrary and aren’t inherently relevant to the discussion about this particular PR.
Yes, I think we’re on the same page here. At the same time, I’m not convinced that loaders are a good choice for approaching deniability. |
I do think this is relevant, since the discussion above was about not sharing by default, which I see as a strict downside as a default. I wanted to understand why that should be the default, and this topic of the potential intents/boundaries of workers came out of the statement about an invariant that I don't think is currently agreed upon; so I stated why I don't think that invariant is applicable as a whole.
My main point for why loaders should have deniability: if loaders need to be able to apply the same effects as policies, so that policies can do things like instrument the results of loaders with integrity checks, that leads towards a model where policies are just configuration of the default loader, in my view. Even if we are not talking about deniability via resolution, deniability by doing things like freezing parts of core is possible in loaders, which allows emulation of `--frozen-intrinsics` while also keeping polyfilling possible. Perhaps discussing whether loaders should be able to fully integrate with security features and/or have the same abilities as the policy feature of Node.js is pertinent here. I think if they are aiming towards having the same overall capabilities then they should take a stance of deniable escape hatches for sharing, rather than no sharing by default, because that is how I would expect security-minded (not necessarily enforcing) mechanisms to function by default. |
I think from a deniability POV, it's about how easy it is to deny. E.g. a loader could intercept calls to the worker_threads module and ensure that the program doesn't try to override the loader (or instrument a custom loader). But it does add another thing the loader would have to think of. I would consider the ability to generally create child contexts with different loader hooks important. It would be unfortunate if the only way to do it were spawning new node processes. |
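The interception idea above can be sketched as a resolve hook that simply refuses to resolve `worker_threads`, preventing user code from spawning a Worker with different hooks. The hook signature follows the experimental loader API of the time; the error message and the overall approach are illustrative, not an endorsed mechanism.

```javascript
// Hypothetical "deniability" loader hook: block resolution of the
// worker_threads built-in so user code cannot escape the installed hooks
// by spawning a Worker of its own. Everything else delegates to the
// default resolver.
function resolve(specifier, context, defaultResolve) {
  if (specifier === 'worker_threads') {
    throw new Error(`Access to '${specifier}' is denied by this loader`);
  }
  return defaultResolve(specifier, context, defaultResolve);
}
```

As noted, this only works if the loader remembers to guard every such escape hatch, which is the extra burden being discussed.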
@jkrems I think we agree that we need the ability to spawn different worker contexts, but the question is about what defaults make sense for the intents of loaders and whether there are other constraints. E.g. you cannot share a loader instance with a child process, so that is out of scope; but for workers, do they have the intent of acting as if they were child processes, and if so, what are the boundaries that differentiate them from processes? |
EDIT: Ah, misread. Where does the idea of sharing loader instances with child processes come from? That feels like another topic unless I'm missing something. Can you clarify this? It seems like the following should be possible, assuming the current context's loaders are in a thread:

```js
// inherit hooks from current thread, default
new Worker('./x.mjs', { moduleLoaderHooks: true })
// disable hooks from current thread
new Worker('./x.mjs', { moduleLoaderHooks: false })
// provide custom hooks for the new thread, creates a new loader hook thread
new Worker('./x.mjs', { moduleLoaderHooks: ['ts-loader'] })
```

I think that the name
That sounded like @addaleax agrees with inheriting hooks from the parent. Which seemed like your preference as well. So the only thing I added was that it's still possible to deny this API, even if it's configurable. So I'm not sure where there's still disagreement..? |
I think we just need to figure out the level of sharing that we achieve by default. We basically have 2 options, and the escape hatches would vary depending on which is the default:
In both cases we also need to figure out an escape hatch, given these comments, to allow creating a new (unrelated?) loader in a Worker. These comments are not seeking to prevent a loader from manipulating/swapping the default of that escape hatch. With #33812 this becomes more complex if each step of the chain is given its own thread. I do think both these PRs are touching on a lack of consensus on a model for how we should be creating boundaries, which would be helpful to iron out. |
this seems like it will be redone as a new PR after a meeting last week on the future of loaders |
Checklist
- `make -j4 test` (UNIX) or `vcbuild test` (Windows) passes

This has some widespread implications:
This roughly follows some of the older design docs and discussions from @nodejs/modules.
This does not seek to allow having multiple user-specified loaders, nor is it seeking to change the loader API signatures; it is purely about moving them off thread and the implications of doing so.
This does introduce a new type of Worker for loading an internal entry point, and also expands the worker_threads API for convenience by allowing a `transferList` in the `workerData` to avoid extraneous `postMessage`s.

This will need quite a large writeup on how it works and how data is transferred, but this seems a good point to start discussions.