Description
This issue is meant to be a tracking issue for where we as a team think we want ES module loaders to go. I’ll start it off by writing what I think the next steps are, and based on feedback in comments I’ll revise this top post accordingly.
I think the first priority is to finish the WIP PR that @jkrems started to slim down the main four loader hooks (resolve, getFormat, getSource, transformSource) into two (resolveToURL and loadFromURL, or should they be called resolve and load?). This would solve the issue discussed in #34144 / #34753.
Next I’d like to add support for chained loaders. There was already a PR opened to achieve this, but as far as I can tell it doesn’t implement chaining as I understand it: it allows the transformSource hook to be chained but not the other hooks, and therefore doesn’t really solve the user request.
A while back I had a conversation with @jkrems to hash out a design for what we thought a chained loaders API should look like. Starting from a base where we assume #35524 has been merged in and therefore the only hooks are resolve, load and getGlobalPreloadCode (which probably should be renamed to just globalPreloadCode, as there are no longer any other hooks named get*), we were thinking of changing the last argument of each hook from default<hookName> to next, where next is the next registered function for that hook. Then we hashed out some examples for how each of the two primary hooks, resolve and load, would chain.
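To make the next wiring concrete, here is a minimal sketch of how a list of registered hooks could be composed into one chain. The composeHooks helper, the stub default resolve and the two example hooks are hypothetical names for illustration only, not part of Node's API; the ordering (the last registered hook ends up outermost) is just one of the two possible choices.

```javascript
// Hypothetical helper: wraps a default hook with each registered hook in turn.
// Every hook receives `next` as its last argument.
function composeHooks(defaultHook, hooks) {
  return hooks.reduce(
    // The chain built so far becomes `next` for the hook being added.
    (next, hook) => (arg, context) => hook(arg, context, next),
    (arg, context) => defaultHook(arg, context),
  );
}

// Stub standing in for Node's default resolve (assumption for this sketch).
async function defaultResolve(specifier, context) {
  return { url: `file:///app/${specifier}` };
}

// Short-circuits for one specifier; otherwise delegates to `next`.
async function virtualHook(specifier, context, next) {
  if (specifier === 'virtual') return { url: 'virtual:module' };
  return next(specifier, context);
}

// Always delegates first, then modifies the result on the way back out.
async function tagHook(specifier, context, next) {
  const result = await next(specifier, context);
  return { ...result, url: `${result.url}?tagged` };
}

// Equivalent to tagHook(virtualHook(defaultResolve(...))).
const resolve = composeHooks(defaultResolve, [virtualHook, tagHook]);
```

Calling resolve('virtual', {}) never reaches defaultResolve, because virtualHook returns without calling next; every other specifier passes through all three functions.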
Chaining resolve hooks
So for example, say you had a chain of three loaders, unpkg, http-to-https and cache-buster:
- The unpkg loader resolves a specifier foo to a URL http://unpkg.com/foo.
- The http-to-https loader rewrites that URL to https://unpkg.com/foo.
- The cache-buster loader takes the URL and adds a timestamp to the end, like https://unpkg.com/foo?ts=1234567890.
These could be implemented as follows:
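unpkg loader

    // isBareSpecifier is an assumed helper that is not defined in this example.
    export async function resolve(specifier, context, next) { // next is Node’s resolve
      if (isBareSpecifier(specifier)) {
        // Return the same { url } shape that the other hooks in the chain read.
        return { url: `http://unpkg.com/${specifier}` };
      }
      return next(specifier, context);
    }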
http-to-https loader

    export async function resolve(specifier, context, next) { // next is the unpkg loader’s resolve
      const result = await next(specifier, context);
      if (result.url.startsWith('http://')) {
        result.url = `https${result.url.slice('http'.length)}`;
      }
      return result;
    }
cache-buster loader

    export async function resolve(specifier, context, next) { // next is the http-to-https loader’s resolve
      const result = await next(specifier, context);
      if (supportsQueryString(result.url)) { // exclude data: & friends
        // TODO: do this properly in case the URL already has a query string
        result.url += `?ts=${Date.now()}`;
      }
      return result;
    }
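The TODO in the cache-buster loader could be handled with the WHATWG URL API, which copes with an existing query string. This is only a sketch: addTimestamp is a hypothetical helper name, and the ts parameter name is taken from the example above.

```javascript
// Hypothetical helper: adds (or overwrites) a `ts` query parameter on a URL.
function addTimestamp(urlString, now = Date.now()) {
  const url = new URL(urlString);
  // set() joins with `&` when a query string already exists and replaces
  // a previous `ts` value instead of appending a duplicate.
  url.searchParams.set('ts', String(now));
  return url.href;
}
```

One trade-off is that modifying searchParams reserializes the whole query string, so a valueless flag such as ?module comes back as ?module= followed by the new parameter.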
These chain “backwards” in the same way that function calls do, along the lines of cacheBusterResolve(httpToHttpsResolve(unpkgResolve(nodeResolve(...)))) (though in this particular example, the positions of cache-buster and http-to-https can be swapped without affecting the result). The point, though, is that the hook functions nest: each one always returns the same shape as Node’s resolve (an object with a url string), and the chaining happens as a result of calling next; if a hook doesn’t call next, the chain short-circuits. I’m not sure if it’s preferable for the API to be node --loader unpkg --loader http-to-https --loader cache-buster or the reverse, but it would be easy to flip that if we get feedback that one way is more intuitive than the other.
Chaining load hooks
Chaining load hooks would be similar to resolve hooks, though slightly more complicated in that instead of a single URL, each load hook returns an object { format, source } where source is the loaded module’s source code/contents and format is the name of one of Node’s ESM loader’s “translators”: commonjs, module, builtin (a Node internal module like fs), json (with --experimental-json-modules) or wasm (with --experimental-wasm-modules).
Currently, Node’s internal ESM loader throws an error on unknown file types: import('file.javascript') throws, even if the contents of that file are perfectly acceptable JavaScript. This error happens during Node’s internal resolve when it encounters a file extension it doesn’t recognize; hence the current CoffeeScript loader example has lots of code to tell Node to allow CoffeeScript file extensions. We should move this validation check to be after the format is determined, which is one of the return values of load; so basically, it’s on load to return a format that Node recognizes. Node’s internal load doesn’t know to map a URL ending in .coffee to module, so Node would continue to error like it does now; but the CoffeeScript loader under this new design no longer needs to hook into resolve at all, since it can determine the format of CoffeeScript files within load. In code:
coffeescript loader

    import CoffeeScript from 'coffeescript';

    // CoffeeScript files end in .coffee, .litcoffee or .coffee.md
    const extensionsRegex = /\.coffee$|\.litcoffee$|\.coffee\.md$/;

    export async function load(url, context, next) {
      const result = await next(url, context);

      // The first check is technically not needed but ensures that
      // we don’t try to compile things that already _are_ compiled.
      if (result.format === undefined && extensionsRegex.test(url)) {
        // For simplicity, all CoffeeScript URLs are ES modules.
        const format = 'module';
        const source = CoffeeScript.compile(result.source, { bare: true });
        return { format, source };
      }
      return result;
    }
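The relocated validation check, run once the whole load chain has returned, might look roughly like the sketch below. The validateFormat name, the error type and the error message are assumptions for illustration; only the list of formats comes from the description of Node's translators earlier in this section.

```javascript
// Formats that have a translator, per the list earlier in this section.
const knownFormats = new Set(['builtin', 'commonjs', 'json', 'module', 'wasm']);

// Hypothetical check applied to the final result of the load chain.
function validateFormat(url, result) {
  if (!knownFormats.has(result.format)) {
    // Covers both an unrecognized format name and format: undefined.
    throw new TypeError(`Unknown module format "${result.format}" for ${url}`);
  }
  return result;
}
```

With the check placed here, a loader that sets format inside load is enough to make a new file type loadable, and nothing needs to be allowed in resolve beforehand.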
And the other example loader in the docs, to allow import of https:// URLs, would similarly only need a load hook:
https loader

    import { get } from 'https';

    export async function load(url, context, next) {
      if (url.startsWith('https://')) {
        let format; // default: format is undefined
        const source = await new Promise((resolve, reject) => {
          get(url, (res) => {
            // Determine the format from the MIME type of the response
            switch (res.headers['content-type']) {
              case 'application/javascript':
              case 'text/javascript': // etc.
                format = 'module';
                break;
              case 'application/node':
              case 'application/vnd.node.node':
                format = 'commonjs';
                break;
              case 'application/json':
                format = 'json';
                break;
              // etc.
            }

            let data = '';
            res.on('data', (chunk) => data += chunk);
            // Resolve with the source text itself so that `source` is a string
            res.on('end', () => resolve(data));
          }).on('error', (err) => reject(err));
        });
        return { format, source };
      }
      return next(url, context);
    }
If these two loaders are used together, where the coffeescript loader’s next is the https loader’s hook and the https loader’s next is Node’s native hook, so like coffeeScriptLoad(httpsLoad(nodeLoad(...))), then for a URL like https://example.com/module.coffee:
- The https loader would load the source over the network, but return format: undefined, assuming the server supplied a correct Content-Type header like application/vnd.coffeescript which our https loader doesn’t recognize.
- The coffeescript loader would get that { source, format: undefined } early on from its call to next, and set format: 'module' based on the .coffee at the end of the URL. It would also transpile the source into JavaScript. It then returns { format: 'module', source } where source is runnable JavaScript rather than the original CoffeeScript.
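The two steps above can be traced without a network connection or the coffeescript package by substituting stubs. In this sketch httpsLoad returns a canned response, fakeCompile stands in for CoffeeScript.compile, and nodeLoad throws to show that it is never reached; all three are assumptions made for illustration.

```javascript
// Stub https loader: pretends the server sent a Content-Type it does not
// recognize, so format stays undefined.
async function httpsLoad(url, context, next) {
  if (url.startsWith('https://')) {
    return { format: undefined, source: 'console.log "hi"' };
  }
  return next(url, context);
}

// Stand-in for CoffeeScript.compile (hypothetical, does no real compilation).
const fakeCompile = (source) => `/* compiled */ ${source}`;

async function coffeeScriptLoad(url, context, next) {
  const result = await next(url, context);
  if (result.format === undefined && /\.coffee$/.test(url)) {
    return { format: 'module', source: fakeCompile(result.source) };
  }
  return result;
}

// Stub for Node's native load; the https loader short-circuits before it.
async function nodeLoad(url, context) {
  throw new Error(`nodeLoad should not be reached for ${url}`);
}

// coffeeScriptLoad(httpsLoad(nodeLoad(...)))
const load = (url, context) =>
  coffeeScriptLoad(url, context, (u, c) => httpsLoad(u, c, nodeLoad));
```

Loading https://example.com/module.coffee through this chain yields format 'module' and the stub-compiled source, matching the walkthrough above.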
Chaining globalPreloadCode hooks
For now, I think that this wouldn’t be chained the way resolve and load would be. This hook would just be called sequentially for each registered loader, in the same order as the loaders themselves are registered. If this is insufficient, for example for instrumentation use cases, we can discuss and potentially change this to follow the chaining style of load.
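A minimal sketch of that sequential behavior, assuming each loader is represented as an object of hooks and that globalPreloadCode returns a string of code as getGlobalPreloadCode does today. The collectPreloadCode name and the loader objects are hypothetical.

```javascript
// Hypothetical helper: calls each loader's globalPreloadCode in registration
// order and collects the returned code strings. No `next` is involved.
function collectPreloadCode(loaders) {
  return loaders
    .filter((loader) => typeof loader.globalPreloadCode === 'function')
    .map((loader) => loader.globalPreloadCode());
}

// Loaders that do not define the hook are simply skipped.
const snippets = collectPreloadCode([
  { globalPreloadCode: () => 'globalThis.first = true;' },
  { resolve: async (specifier, context, next) => next(specifier, context) },
  { globalPreloadCode: () => 'globalThis.second = true;' },
]);
```

Because the hooks are independent here, one loader cannot see or alter another loader's preload code, which is the limitation that the instrumentation use case might run into.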
Next Steps
Based on the above, here are the next few PRs as I see them:
- Finish esm: merge and simplify loader hooks #35524, simplifying the hooks to resolve, load and globalPreloadCode.
- Refactor Node’s internal ESM loader’s hooks into resolve and load. Node’s internal loader already has no-ops for transformSource and getGlobalPreloadCode, so all this really entails is merging the internal getFormat and getSource into one function load.
- Refactor Node’s internal ESM loader to move its exception on unknown file types from within resolve (on detection of unknown extensions) to within load (if the resolved extension has no defined translator).
- Implement chaining as described here, where the default<hookName> becomes next and references the next registered hook in the chain.
- Get a load return value of format: 'commonjs' to work, or at least error informatively. See esm: Modify ESM Experimental Loader Hooks #34753 (comment).
- Investigate and potentially add an additional transform hook (see below).
This work should complete many of the major outstanding ES module feature requests, such as supporting transpilers, mocks and instrumentation. If there are other significant user stories that still wouldn’t be possible with the loaders design as described here, please let me know. cc @nodejs/modules