module: ESM loaders next steps #36396

Closed
@GeoffreyBooth

This issue is meant to be a tracking issue for where we as a team think we want ES module loaders to go. I’ll start it off by writing what I think the next steps are, and based on feedback in comments I’ll revise this top post accordingly.

I think the first priority is to finish the WIP PR that @jkrems started to slim down the main four loader hooks (resolve, getFormat, getSource, transformSource) into two (resolveToURL and loadFromURL, or should they be called resolve and load?). This would solve the issue discussed in #34144 / #34753.

Next I’d like to add support for chained loaders. A PR was already opened to achieve this, but as far as I can tell it doesn’t implement chaining as I understand it: it allows the transformSource hook to be chained but not the other hooks, and therefore doesn’t really solve the user request.

A while back I had a conversation with @jkrems to hash out a design for what we thought a chained loaders API should look like. Starting from a base where we assume #35524 has been merged in and therefore the only hooks are resolve, load and getGlobalPreloadCode (which should probably be renamed to just globalPreloadCode, as there would no longer be any other hooks named get*), we were thinking of changing the last argument of each hook from default<hookName> to next, where next is the next registered function for that hook. Then we hashed out some examples for how each of the two primary hooks, resolve and load, would chain.

Chaining resolve hooks

For example, say you had a chain of three loaders (unpkg, http-to-https and cache-buster):

  1. The unpkg loader resolves a specifier foo to a URL http://unpkg.com/foo.
  2. The http-to-https loader rewrites that URL to https://unpkg.com/foo.
  3. The cache-buster loader takes that URL and appends a timestamp, giving something like https://unpkg.com/foo?ts=1234567890.

These could be implemented as follows:

unpkg loader

export async function resolve(specifier, context, next) { // next is Node’s resolve
  if (isBareSpecifier(specifier)) {
    // Return the same { url } shape that the other loaders read from next()
    return { url: `http://unpkg.com/${specifier}` };
  }
  return next(specifier, context);
}
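
The isBareSpecifier helper used above is not defined in this post. A minimal sketch of what it might look like, assuming “bare” means anything that is neither a relative or absolute path nor a parseable absolute URL (the name and the rule are illustrative, not part of Node’s API):

```javascript
// Hypothetical helper; not part of Node's API.
function isBareSpecifier(specifier) {
  // Relative ('./x', '../x') and absolute ('/x') paths are not bare.
  if (/^\.{0,2}\//.test(specifier)) return false;
  // Anything that parses as an absolute URL (file:, https:, data:, node:)
  // is not bare either.
  try {
    new URL(specifier);
    return false;
  } catch {
    return true;
  }
}
```

Under this rule a builtin name such as fs also counts as bare, so a real unpkg loader would probably want to let builtins fall through to next.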

http-to-https loader

export async function resolve(specifier, context, next) { // next is the unpkg loader’s resolve
  const result = await next(specifier, context);
  if (result.url.startsWith('http://')) {
    result.url = `https${result.url.slice('http'.length)}`;
  }
  return result;
}

cache-buster loader

export async function resolve(specifier, context, next) { // next is the http-to-https loader’s resolve
  const result = await next(specifier, context);
  if (supportsQueryString(result.url)) { // exclude data: & friends
    // Go through URL so that an existing query string is preserved
    const url = new URL(result.url);
    url.searchParams.set('ts', String(Date.now()));
    result.url = url.href;
  }
  return result;
}
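
supportsQueryString is likewise left undefined here. One possible sketch, assuming that only file:, http: and https: URLs should receive a cache-busting parameter (the allow-list is an assumption for illustration, not something this proposal specifies):

```javascript
// Hypothetical helper: only these schemes get a cache-busting query string.
const QUERYABLE_PROTOCOLS = new Set(['file:', 'http:', 'https:']);

function supportsQueryString(url) {
  try {
    return QUERYABLE_PROTOCOLS.has(new URL(url).protocol);
  } catch {
    return false; // not an absolute URL at all
  }
}
```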

These chain “backwards” in the same way that function calls do, along the lines of cacheBusterResolve(httpToHttpsResolve(unpkgResolve(nodeResolve(...)))) (though in this particular example, the positions of cache-buster and http-to-https can be swapped without affecting the result). The point is that the hook functions nest: each one always returns the same shape as Node’s resolve, an object with a url property, and the chaining happens as a result of calling next; if a hook doesn’t call next, the chain short-circuits. I’m not sure whether it’s preferable for the API to be node --loader unpkg --loader http-to-https --loader cache-buster or the reverse, but it would be easy to flip if we get feedback that one way is more intuitive than the other.
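
To make the nesting concrete, here is a rough sketch of how a chain could be assembled from a list of hooks. chainHooks and the nodeResolve stand-in are hypothetical names, the bare-specifier check is simplified, and the sketch assumes the first registered loader ends up innermost; none of this is how Node necessarily implements it.

```javascript
// Hypothetical composition helper: hooks are listed in registration order,
// and each hook's `next` is the hook registered before it.
function chainHooks(hooks, defaultHook) {
  return hooks.reduce(
    (next, hook) => (arg, context) => hook(arg, context, next),
    defaultHook,
  );
}

// Stand-in for Node's default resolve; the real one does far more.
async function nodeResolve(specifier, context) {
  return { url: new URL(specifier, context.parentURL).href };
}

async function unpkgResolve(specifier, context, next) {
  // Simplified bare-specifier check, for illustration only.
  if (!/^(\.{0,2}\/|[a-z][a-z0-9+.-]*:)/i.test(specifier)) {
    return { url: `http://unpkg.com/${specifier}` };
  }
  return next(specifier, context);
}

async function httpToHttpsResolve(specifier, context, next) {
  const result = await next(specifier, context);
  if (result.url.startsWith('http://')) {
    result.url = `https${result.url.slice('http'.length)}`;
  }
  return result;
}

const resolve = chainHooks([unpkgResolve, httpToHttpsResolve], nodeResolve);

resolve('foo', { parentURL: 'file:///app/main.mjs' })
  .then((result) => console.log(result.url)); // https://unpkg.com/foo
```

Reversing the array passed to chainHooks is all it would take to flip the registration order, which is why the command-line ordering question is cheap to revisit later.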

Chaining load hooks

Chaining load hooks would be similar to resolve hooks, though slightly more complicated in that instead of returning just a resolved URL, each load hook returns an object { format, source } where source is the loaded module’s source code/contents and format is the name of one of Node’s ESM loader’s “translators”: commonjs, module, builtin (a Node internal module like fs), json (with --experimental-json-modules) or wasm (with --experimental-wasm-modules).

Currently, Node’s internal ESM loader throws an error on unknown file types: import('file.javascript') throws, even if the contents of that file are perfectly acceptable JavaScript. This error happens during Node’s internal resolve when it encounters a file extension it doesn’t recognize; hence the current CoffeeScript loader example has lots of code to tell Node to allow CoffeeScript file extensions. We should move this validation check to be after the format is determined, which is one of the return values of load; so basically, it’s on load to return a format that Node recognizes. Node’s internal load doesn’t know to resolve a URL ending in .coffee to module, so Node would continue to error like it does now; but the CoffeeScript loader under this new design no longer needs to hook into resolve at all, since it can determine the format of CoffeeScript files within load. In code:

coffeescript loader

import CoffeeScript from 'coffeescript';

// CoffeeScript files end in .coffee, .litcoffee or .coffee.md
const extensionsRegex = /\.coffee$|\.litcoffee$|\.coffee\.md$/;

export async function load(url, context, next) {
  const result = await next(url, context);

  // The first check is technically not needed but ensures that
  // we don’t try to compile things that already _are_ compiled.
  if (result.format === undefined && extensionsRegex.test(url)) {
    // For simplicity, all CoffeeScript URLs are ES modules.
    const format = 'module';
    const source = CoffeeScript.compile(result.source, { bare: true });
    return {format, source};
  }
  return result;
}
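
Moving the unknown-type check out of resolve could look roughly like the following sketch, which validates the format once, after the whole load chain has run. loadAndValidate and KNOWN_FORMATS are hypothetical names, and the error type and message are placeholders; this is one reading of the proposal, not Node’s actual internals.

```javascript
// Formats named in this proposal; json and wasm sit behind experimental flags.
const KNOWN_FORMATS = new Set(['builtin', 'commonjs', 'json', 'module', 'wasm']);

// Hypothetical wrapper: run the whole load chain, then validate once.
async function loadAndValidate(url, context, chainedLoad) {
  const result = await chainedLoad(url, context);
  if (!KNOWN_FORMATS.has(result.format)) {
    throw new TypeError(`Unknown module format "${result.format}" for ${url}`);
  }
  return result;
}
```

Validating at the end means an inner hook may return format: undefined without failing, as long as some outer hook fills it in.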

And the other example loader in the docs, to allow import of https:// URLs, would similarly only need a load hook:

https loader

import { get } from 'https';

export async function load(url, context, next) {
  if (url.startsWith('https://')) {
    let format; // default: format is undefined
    const source = await new Promise((resolve, reject) => {
      get(url, (res) => {
        // Determine the format from the MIME type of the response
        switch (res.headers['content-type']) {
          case 'application/javascript':
          case 'text/javascript': // etc.
            format = 'module';
            break;
          case 'application/node':
          case 'application/vnd.node.node':
            format = 'commonjs';
            break;
          case 'application/json':
            format = 'json';
            break;
          // etc.
        }

        let data = '';
        res.on('data', (chunk) => data += chunk);
        res.on('end', () => resolve(data)); // resolve with the source text itself
      }).on('error', (err) => reject(err));
    });
    return {format, source};
  }

  return next(url, context);
}
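
Servers often send Content-Type with parameters (for example text/javascript; charset=utf-8), which an exact-match switch would miss. A small sketch of a more forgiving lookup; formatFromContentType and FORMAT_BY_MIME are hypothetical names, and the MIME-to-format mapping simply mirrors the switch in the https loader:

```javascript
// Hypothetical lookup table; the entries mirror the https loader's switch.
const FORMAT_BY_MIME = new Map([
  ['application/javascript', 'module'],
  ['text/javascript', 'module'],
  ['application/node', 'commonjs'],
  ['application/vnd.node.node', 'commonjs'],
  ['application/json', 'json'],
]);

function formatFromContentType(header) {
  if (typeof header !== 'string') return undefined;
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const mime = header.split(';')[0].trim().toLowerCase();
  return FORMAT_BY_MIME.get(mime);
}
```

Unrecognized types still yield undefined, which leaves room for another loader in the chain to decide the format.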

If these two loaders are used together, where the coffeescript loader’s next is the https loader’s hook and https loader’s next is Node’s native hook, so like coffeeScriptLoad(httpsLoad(nodeLoad(...))), then for a URL like https://example.com/module.coffee:

  1. The https loader would load the source over the network, but return format: undefined, assuming the server supplied a correct Content-Type header like application/vnd.coffeescript which our https loader doesn’t recognize.
  2. The coffeescript loader would get that { source, format: undefined } early on from its call to next, and set format: 'module' based on the .coffee at the end of the URL. It would also transpile the source into JavaScript. It then returns { format: 'module', source } where source is runnable JavaScript rather than the original CoffeeScript.
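
The two steps above can be simulated without a network or a compiler. In this sketch all three hooks are stand-ins: the https stub returns a canned response, and the CoffeeScript stub only tags the source instead of compiling it.

```javascript
// Stand-in for Node's default load; never reached for https: URLs here.
async function nodeLoad(url) {
  throw new Error(`nodeLoad stub cannot load ${url}`);
}

// Stand-in for the https loader: canned response instead of a network request.
async function httpsLoad(url, context, next) {
  if (url.startsWith('https://')) {
    // Pretend the server replied with Content-Type: application/vnd.coffeescript
    return { format: undefined, source: 'export answer = 42' };
  }
  return next(url, context);
}

// Stand-in for the coffeescript loader: a fake "compiler" tags its output.
async function coffeeScriptLoad(url, context, next) {
  const result = await next(url, context);
  if (result.format === undefined && /\.coffee$/.test(url)) {
    return { format: 'module', source: `/* compiled */ ${result.source}` };
  }
  return result;
}

// Equivalent to coffeeScriptLoad(httpsLoad(nodeLoad(...)))
const load = (url, context) =>
  coffeeScriptLoad(url, context, (u, c) => httpsLoad(u, c, nodeLoad));

load('https://example.com/module.coffee', {})
  .then((result) => console.log(result.format)); // module
```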

Chaining globalPreloadCode hooks

For now, I think that this wouldn’t be chained the way resolve and load would be. This hook would just be called sequentially for each registered loader, in the same order as the loaders themselves are registered. If this is insufficient, for example for instrumentation use cases, we can discuss and potentially change this to follow the chaining style of load.
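
A sketch of that sequential behavior, assuming each registered loader is a module namespace object that may or may not export the hook (collectGlobalPreloadCode is a hypothetical name, and the hook is called with no arguments here for simplicity):

```javascript
// Hypothetical: `loaders` holds the registered loader modules, in order.
function collectGlobalPreloadCode(loaders) {
  const snippets = [];
  for (const loader of loaders) {
    // Loaders that don't export the hook are skipped.
    if (typeof loader.globalPreloadCode === 'function') {
      snippets.push(loader.globalPreloadCode());
    }
  }
  return snippets;
}

const snippets = collectGlobalPreloadCode([
  { globalPreloadCode: () => 'globalThis.first = true;' },
  { resolve() {} }, // no preload hook
  { globalPreloadCode: () => 'globalThis.second = true;' },
]);
console.log(snippets.length); // 2
```

No hook sees another hook’s output here, which is the main difference from the next-based chaining used for resolve and load.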

Next Steps

Based on the above, here are the next few PRs as I see them:

  1. Finish esm: merge and simplify loader hooks #35524, simplifying the hooks to resolve, load and globalPreloadCode.
  2. Refactor Node’s internal ESM loader’s hooks into resolve and load. Node’s internal loader already has no-ops for transformSource and getGlobalPreloadCode, so all this really entails is merging the internal getFormat and getSource into one function load.
  3. Refactor Node’s internal ESM loader to move its exception on unknown file types from within resolve (on detection of unknown extensions) to within load (if the resolved extension has no defined translator).
  4. Implement chaining as described here, where the default<hookName> becomes next and references the next registered hook in the chain.
  5. Get a load return value of format: 'commonjs' to work, or at least error informatively. See esm: Modify ESM Experimental Loader Hooks #34753 (comment).
  6. Investigate and potentially add an additional transform hook (see below).

This work should complete many of the major outstanding ES module feature requests, such as supporting transpilers, mocks and instrumentation. If there are other significant user stories that still wouldn’t be possible with the loaders design as described here, please let me know. cc @nodejs/modules

Metadata

    Labels

    discuss: Issues opened for discussions and feedbacks.
    esm: Issues and PRs related to the ECMAScript Modules implementation.
    loaders: Issues and PRs related to ES module loaders.
