-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
module: ensure successful dynamic import returns the same result #46662
module: ensure successful dynamic import returns the same result #46662
Conversation
Review requested:
|
8ec5944
to
9bf820c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two issues with this:
- It's unclear what problem this solves. Is there a bug report? A spec violation that can be demonstrated in a repro?
- We already have a module cache:
ModuleMap
. It feels wrong to create a second one. If the first one behaves incorrectly, it should be fixed or replaced. Having two seems likely to cause problems.
Node's module map should be a stricter version of the one specified in ecma262, specifically because we resolve the canonical path for a module, which means it isn't just stable within each file, but across the entire graph of files. |
Yes, see the test I've added
Having two seems necessary, there are simply two different things, the first one is behaving correctly AFAICT. |
Isn't the same thing true for |
The same thing is true yes, this PR ensures that both static and dynamic imports return the same thing. |
Migrating #46826 (comment) over here:
A simplified version of the above is this: import('data:text/javascript,export default 333').then(async first => {
const second = await import('data:text/javascript,export default 333')
console.log(first === second)
}) In Node 20 for me, this logs I looked at the test in this PR, and it involves filesystem mutations. Please correct me if I’m misreading this, but it seems like what you’re testing is an initial |
That's correct. |
Why isn't the |
Per the spec we’re not supposed to caching failures, only successes. So if you tried to import a nonexistent file, then created the file and tried again to import it, the second attempt should succeed. https://tc39.es/ecma262/#sec-HostLoadImportedModule (emphasis added):
But I also feel like this should be achievable with our current |
Yes that's the issue, |
Where is Perhaps it was doing so before #47824? |
Of course it's doing fs operations, it needs to in order to resolve symlinks and node_modules deps. Here's a few examples, node/lib/internal/modules/esm/resolve.js Lines 1035 to 1040 in ccada8b
Which itself is calling node/lib/internal/modules/esm/resolve.js Line 830 in ccada8b
node/lib/internal/modules/esm/resolve.js Line 837 in ccada8b
And here where they make the FS call: node/lib/internal/modules/esm/resolve.js Lines 211 to 212 in ccada8b
node/lib/internal/modules/esm/resolve.js Lines 749 to 750 in ccada8b
|
Within the error path of one of these failed filesystem calls, could we look in I’m also not convinced that this is a real problem. What is the use case where |
Beware. Read up on why the web does cache errors. Spec does not mandate it
across import sites but within a single importing source text it must keep
the error. Even still cross url errors are cached and multiple requests to
change that behavior haven't changed their behavior. Remember, once it
isn't an error the behavior will be cemented and very hard to change so
minor edge cases need consideration
…On Fri, Jun 2, 2023, 3:51 PM Geoffrey Booth ***@***.***> wrote:
Within the error path of one of these failed filesystem calls, could we
look in ModuleMap to see if the requested module was cached? That way we
would avoid both creating a new cache and slowing down resolution in the
default case where the filesystem hasn’t changed under our feet.
I’m also not convinced that this is a real problem. What is the use case
where import calls need to continue working correctly after files were
deleted from disk? Even if that’s technically the correct behavior per
spec, it’s such an extreme edge case that it doesn’t feel like something we
should add a performance regression to achieve.
—
Reply to this email directly, view it on GitHub
<#46662 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABZJIYX3IJLSRSKF47FALDXJJG3XANCNFSM6AAAAAAU4F7M7E>
.
You are receiving this because you are on a team that was mentioned.Message
ID: ***@***.***>
|
I think what we’re discussing is the opposite though, caching the success so that subsequent imports continue to succeed even if the referenced files have been deleted. |
I think that's off topic: if you don't like the spec, you should take it to TC39, but until then it seems pointless to discuss this in this PR.
I think it's still a problem if the resolution ends up on a different symlink (so we never end up on the error path yet we are breaking the spec).
Caching the errors would simplify the implementation, it's certainly something we can think about. |
I haven't read the code of this PR yet but I'm generally in favor of the behavior the spec requires here. We spent a great deal of time coming to the exact semantics we wanted to preserve and anyone who feels differently may want to review our prior discussions on this (you can check the dynamic import proposal repo and tc39 notes from that time) before pushing too hard. |
cc1b1b8
to
6fbabe9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aduh95 and I discussed this. We came to the conclusion that as far as we were aware, our ESM loader implementation was caching the results of the entire module resolution and loading process but not the individual results of the resolve and load steps. This is a problem for the spec compliance issue that Antoine pointed out, where a resolution success needs to be cached even if subsequent attempts at resolving the same input fail; without a resolution cache, we have no way of skipping repeated attempts at resolving the same input and therefore adhering to the spec.
I suggested therefore creating two caches: a “resolve cache” and a “load cache,” to correspond with the two phases/hooks of module resolution and loading. This should both fix the spec compliance issue as well as improve performance whenever an application imports the same thing twice (which is probably rare, as all of the input would need to be identical including parent URL, and it’s uncommon to have two import
statements in the same file for the same thing). This PR creates the new “resolve cache” and renames the former “module map” to be the “load cache.”
One thing that bothered me about the former module map, and is still problematic in this version, is that the values of the module map/load cache are sometimes promises, sometimes “module jobs,” etc rather than a one-to-one mapping of the output of the load
hook. And in this PR, the value of the resolve cache is the same promise or module job as the value of the module map/load cache. I would think that the value of the resolve cache should be whatever gets returned by resolve
: the absolute URL of the module and its assertions/attributes. Then that value would be the key for the load cache.
I see the sense in making the value of the resolve cache the same as the value of the load cache, to skip the second lookup and therefore save a tiny bit of performance. It just breaks the mental model of the purposes of these two caches, if the resolve cache isn’t really caching the results of the resolve
step. And the load cache isn’t really caching the result of the load
step either, as a module job is a much larger object than simply the module source that gets returned from load
.
Is what we really want here a single big cache where module jobs (or the promises that will resolve into module jobs) can be looked up by two keys, either the input for resolve
or the input for load
? Is that the most performant data structure for what we’re trying to achieve here?
For me what makes the most sense mentally in terms of my ease in following the logic would be something like this:
- A “resolve cache” where the keys are the serialized inputs to
resolve
and the values are the successful results of thoseresolve
calls (and failures aren’t cached) - A “load cache” where the keys are the serialized inputs to
load
and the values are the successful results of thoseload
calls (and failures aren’t cached)
And do we even need a data structure to hold the pending promises of the “module jobs,” the process of loading the module sources from disk or network? I suppose we probably do, to avoid parallel load
calls trying to read the same file from disk unnecessarily, which would mean yet a third cache where the keys are the same as those of the load cache and the values are the promises, or a boolean to tell you that it’s still pending, or something.
Just writing this all out it makes me feel like creating all these extra data structures would be bad for performance, so I’m not trying to say that I’m advocating for this architecture; just that it feels like what would make the most sense in terms of me being able to follow the logic as a reader. Is there some version of refactoring the module map that we have, with creating the resolution cache, that comes a bit closer to this mental model?
const promises = this.module.link(async (specifier, attributes) => { | ||
const job = this.loader.getModuleJob(specifier, url, attributes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we should convert assertions to attributes in its own PR, and do all the references at once, rather than piecemeal as part of changing the modified lines of this PR. The conversion PR can also address whatever tests need to change as a result of the new proposal.
@@ -8,22 +8,22 @@ import { describe, it } from 'node:test'; | |||
describe('ESM: ensure initialization happens only once', { concurrency: true }, () => { | |||
it(async () => { | |||
const { code, stderr, stdout } = await spawnPromisified(execPath, [ | |||
'--experimental-import-meta-resolve', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this being added? I thought we removed this flag, or were going to do so immediately after landing off-thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was forgotten when the test was added, runmain.js
use import.meta.resolve
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay so this is just an unrelated change? Should this be in its own PR?
*/ | ||
assert.strictEqual(resolveHookRunCount, 2); | ||
assert.strictEqual(stdout.match(/resolve passthru/g)?.length, 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a problem? Resolutions are happening twice as many times now as before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comments above, it's related to import.meta.resolve
being called twice in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you didn’t change this test, just the assertions for it to pass. Doesn’t that mean that the changes in this PR break the previous test? Why should this test need to change based on cache refactoring?
Or was the test just buggy before because it was missing the required --experimental-import-meta-resolve
flag, and now that it has the flag it should’ve had all along, the assertions need to change accordingly? If so, this feels like an unrelated fix that should be its own PR.
Landed in 053511f |
PR-URL: nodejs#46662 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
PR-URL: nodejs#46662 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
PR-URL: nodejs#46662 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
PR-URL: nodejs#46662 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
PR-URL: nodejs#46662 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
PR-URL: nodejs#46662 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
PR-URL: nodejs#46662 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
PR-URL: #46662 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
PR-URL: #46662 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
PR-URL: nodejs#46662 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
PR-URL: nodejs/node#46662 Backport-PR-URL: nodejs/node#50669 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
PR-URL: nodejs/node#46662 Backport-PR-URL: nodejs/node#50669 Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Geoffrey Booth <webadmin@geoffreybooth.com>
According to https://tc39.es/ecma262/#sec-HostLoadImportedModule, if
import()
is called multiple times with the same (referrer
,specifier
) pair, then it must fulfil return the same object (or reject).https://tc39.es/proposal-import-attributes/#sec-HostLoadImportedModule takes it further by adding
assertionsattributes to the mix.This PR adds a cache layer for dynamic imports to ensure Node.js meets the spec.
EDIT: I should clarify here that the cache layer is also populated by static imports