Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: preImport #89

Closed
wants to merge 2 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
97 changes: 97 additions & 0 deletions doc/design/proposal-pre-import.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
# preImport Proposal

## Problem Statement

With the core resolver now synchronous only, any asynchronous resolution work
is no longer possible as the resolve function must return an immediate
resolution.

For loaders that asynchronously obtain resolution information, it would be
useful to have a new hook to enable these use cases.

## preImport Hook

The hook has the signature:

```ts
export async function preImport(specifier: string, context: {
conditions: string[],
topLevel: boolean,
parentURL: string | undefined
});
```

The `preImport` hook allows for tracking and asynchronous setup work for every
top-level import operation. It has no return value, although it can return
Comment on lines +24 to +25
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This “top-level” qualifier is worth calling out. What’s the precise definition? Any static or dynamic import in the application entry point file? Only dynamic imports in that entry point?

Is that whatever data gets created by preImport then available for all resolve operations, not just top-level specifiers?

Since this is something of an initialization-of-the-module-graph thing, not run for every import, would it make sense for this hook to just run once and be passed in an array of specifiers? Like preImport(specifiers: string[], context: object) where specifiers would be the list of all top-level import specifiers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition is any import which calls ECMA function 16.2.1.5.2 Evaluate.

It is called for all dynamic imports and the main entry point, and any module worker thread instantiation, and will be called with the exact specifier passed to those.

It's singular and not a list because the timing is unique to each specifier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ECMA function 16.2.1.5.2 Evaluate is incomprehensible to me.

Say an application consists of these two files. Which imports get preImport called?

// entry.js
import 'a';
await import('b');
import './c.js';

// c.js
import 'd';
await import('e');

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the example, it would get called for entry.js, then 'b' when that import is hit, and then 'e' when that dynamic import is hit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so perhaps the documentation should say something along the lines of this:

The preImport hook is called for the root of every tree of modules imported by a dynamic import: the initial entry point itself (the file passed to node on the command line) and any dynamic import() calls within. The way ES modules work is that an entry point is read and statically analyzed, and all the static import statements are themselves analyzed and followed and their files read and analyzed, and so on until all the leaves of the tree have been parsed. Then all this code is evaluated; and during this evaluation phase the dynamic import() statements are followed, which uncover new trees of modules and sub-modules. The same analysis and evaluation process then begins for each of them. preImport happens after the analysis but before the evaluation of each set of modules.

I would think that it could be called with all the specifiers within the tree of modules for which it’s being called? So in this example, the first call could be for entry.js, a, c.js and the second call would be for b, d and the third call would be for e.

Copy link

@davidje13 davidje13 Jun 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the example, it would get called for entry.js, then 'b' when that import is hit, and then 'e' when that dynamic import is hit.

why wouldn't it get called for 'a', './c.js', and 'd'? Standard imports like that can still be async anyway, as I understand it? From @GeoffreyBooth 's comment, I understood that the only place where it would not be called is when a user invokes import.meta.resolve (which I think is a problem, but if this isn't even called when doing a regular import then I'm struggling to see the point)

a promise to to delay the module pipeline operations. No further hooks will
be called on that module graph until this hook resolves successfully if present.

The `preImport` hook is called for each top-level import operation by the module
loader, both for the host-called imports (ie for the main entry) and for dynamic
`import()` imports. These are distinguished by the `dynamic` context.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there was some conversation about the back-and-forth of dynamic vs topLevel. I don't have an opinion either way, but note that this is currently inconsistent with line 19.


All `preImport` hooks for all loaders are run asynchronously in parallel, and
block any further load operations (ie resolve and load) for the module graph
being imported until they all complete successfully.

Multiple import calls to the same import specifier will re-call the hook
multiple times. The first error thrown by the `preImport` hooks will be directly
returned to the specific import operation as the load failure.

## Example

<details>
<summary>Import Map Generation</summary>

Consider an import map loader which obtains the import map for a module
asynchronously:

```ts
import { Generator } from '@jspm/generator';

// stateful host import map for current session
let importMap = { imports: {}, scopes: {} };

function isUrl (specifier) {
try {
new URL(specifier);
return true;
}
catch {
return false;
}
}

const isBareSpecifier = id => !id.startsWith('./') && !id.startsWith('../') &&
!id.startsWith('/') && !isUrl(id);

export async function preImport(specifier, { conditions, parentURL }) {
if (!isBareSpecifier(specifier)) return;
const generator = new Generator({
// passing the original map extends it
inputMap: importMap,
baseUrl: parentURL,
defaultProvider: 'nodemodules',
env: conditions,
});
await generator.install(specifier);
// the new map will then have the new mappings and the old mappings
importMap = generator.getMap();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is a race condition; old importMap is provided to potentially multiple Generators (as called out in the description: all run in parallel), then importMap is overridden with the results of each as they finish, so some results will get clobbered.

I'm not so concerned about this specific example, but more that this demonstrates how easy it is to make this sort of mistake with the API as proposed (parallel execution combined with forcing the output to be via global state is likely to cause many issues like this)

}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this function doesn’t return anything, how are its results preserved for use in resolve later? As part of the internal state of generator or something it references?

If that’s the case, it might be good for the example to illustrate this. Maybe a variable could be declared at the top level of the loader that gets referenced by both hooks, and the contents of this variable are how the information determined by preImport gets shared with resolve.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The importMap mutable variable is the shared state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I missed this. I don’t know if it’s just me or if it’s likely to be nonobvious to others; if the latter, maybe add a comment inside resolve to the effect of // Reference same `importMap` variable modified within `preImport` . That would make the expected usage clear, so people know how this hook is expected to work.


export function resolve(specifier, { parentURL }) {
return {
url: importMapResolve(importMap, specifier, parentURL),
shortCircuit: true
};
}
```

Internally, JSPM Generator performs fetch and module dependency analysis over
the graph using `fetch` and `es-module-lexer`.

For FS operations the load hook should be able to share the natural OS cache
anyway. For network operations if the `fetch` function is shared it should be
possible to maintain a fetch cache, alternatively the `load` hook could be
added to extend from a shared cache in these operations.
</details>