
module: ESM loader approach #36954

Closed
@JakobJingleheimer

There are two leading proposals for how ESM loader hooks should be designed and chained.

Similarities

Both approaches:

  • consolidate hooks into
    • resolve(): finding the source (equivalent to the current experimental resolve()); returns {Object}
    • load(): supplying the source (a combination of the current experimental getFormat(), getSource(), and transformSource()); returns {Object}
  • Hooks of the same kind (resolve and load) are chained:
    1. all resolve()s are executed (resolve1, resolve2, …resolveN)
    2. all load()s are executed (load1, load2, …loadN)

Differences

Next()

This approach is originally detailed in #36396.

Hooks are called in reverse order (last first): a hook's third argument would be a next() function, which is a reference to the previous loader's hook. For example, suppose there are three loaders: unpkg, http-to-https, and cache-buster (cache-buster is the final loader in the chain).

cache-buster invokes http-to-https, which in turn invokes unpkg (which itself invokes Node's default):
cache-buster ← http-to-https ← unpkg ← Node's default

The user must actively connect the chain or it (likely) fails: If a hook does not call next, "the loader short-circuits the chain and no further loaders are called".
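
A minimal sketch of how such a chain could be wired, assuming each hook takes (specifier, context, next). The composeNext helper and the toy hooks below are hypothetical illustrations, not Node.js APIs:

```javascript
// Hypothetical helper: wires hooks so that the LAST listed hook runs first and
// each hook's `next` points at the hook listed before it.
function composeNext(hooks, defaultHook) {
  return hooks.reduce(
    (next, hook) => (specifier, context) => hook(specifier, context, next),
    defaultHook,
  );
}

// Records the order in which hooks are entered.
const calls = [];

// Toy stand-ins for Node's default and the three loaders named above.
const nodeDefault = async (specifier) => {
  calls.push('default');
  return { url: specifier };
};
const unpkg = async (specifier, context, next) => {
  calls.push('unpkg');
  return next(specifier, context);
};
const httpToHttps = async (specifier, context, next) => {
  calls.push('http-to-https');
  const result = await next(specifier, context);
  return { url: result.url.replace(/^http:/, 'https:') };
};
const cacheBuster = async (specifier, context, next) => {
  calls.push('cache-buster');
  const result = await next(specifier, context);
  return { url: `${result.url}?v=1` };
};

// Listed in declaration order; invoked in reverse.
const resolveChain = composeNext([unpkg, httpToHttps, cacheBuster], nodeDefault);
```

In this sketch, a hook that returns without calling next ends the chain at that point: none of the hooks listed before it (nor the default) ever run.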

Done()

This approach was also proposed in #36396 (in this comment).

The guiding principle of this approach is the principle of least knowledge.

Hooks are called in the order they are declared/listed. The return value of each hook is fed as input to the next, and each hook is called automatically (unless the chain is short-circuited):

unpkg → http-to-https → cache-buster

If none of the supplied loaders outputs a valid value, Node's default loader/hook is invoked. This lets a hook handle only part of the work and avoid re-implementing functionality that Node already provides via the default hook.

Hooks have a done argument, used in rare circumstances to short-circuit the chain.
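
A rough sketch of this flow, assuming hooks take (interimResult, context, defaultHook, done). The runChain helper and the toy hooks are hypothetical and only illustrate the ordering, the short-circuit, and the fallback to the default hook:

```javascript
// Hypothetical chain runner: hooks run in the order listed, each receiving the
// previous hook's output. A nullish return means the hook stepped aside.
async function runChain(hooks, initial, context, defaultHook) {
  let interim = initial;
  let produced = false; // did any supplied hook return a value?
  let finished = false; // did a hook call done()?
  const done = (value) => {
    finished = true;
    return value;
  };

  for (const hook of hooks) {
    const returned = await hook(interim, context, defaultHook, done);
    if (returned != null) {
      interim = returned;
      produced = true;
    }
    if (finished) break; // short-circuit: skip the remaining hooks
  }

  // No supplied hook produced a value: fall back to the default hook.
  return produced ? interim : defaultHook(initial, context);
}

// Toy stand-ins for three loaders and Node's default.
const unpkg = async ({ url }) => {
  if (url.includes(':')) return; // already a URL: step aside
  return { url: `https://unpkg.com/${url}` };
};
const httpToHttps = async ({ url }) => ({ url: url.replace(/^http:/, 'https:') });
const cacheBuster = async ({ url }) => ({ url: `${url}?v=1` });
const nodeDefault = async ({ url }) => ({ url });
```

Here the hooks never need to know about one another: each one only inspects the interim result it was handed and either returns a new value or steps aside.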

Additionally, this proposal includes a polymorphic return:

| hook returns | continue? | result | scenario |
| --- | --- | --- | --- |
| done(validValue) | no | use validValue as final value (skipping any remaining loaders) | module mocking (automated tests) |
| false | no | skip file (use empty string for value?) | file is not needed in current circumstances |
| invalid value | no | throw and abort instead (current behaviour) | user error |
| nullish | yes | loader did nothing: continue to next loader | isn't for the file type |
| valid value | yes | pass value to next loader | expected use |
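
One way to read the table is as a small decision function. The sketch below is illustrative only: it assumes done() marks a value as final by wrapping it in a sentinel, and the isValid check is a hypothetical stand-in for whatever validation the loader would really perform:

```javascript
// Hypothetical sentinel wrapper: in this sketch, `done(value)` marks a value as
// final so the chain runner can recognise the short-circuit.
const DONE = Symbol('done');
const done = (value) => ({ [DONE]: true, value });

// Hypothetical validity check for a load-style result.
const isValid = (value) =>
  typeof value === 'object' && value !== null &&
  ('source' in value || 'format' in value);

// Maps a hook's return onto the table's "continue?" and "result" columns.
function interpretReturn(returned) {
  if (returned?.[DONE]) {
    if (!isValid(returned.value)) {
      throw new TypeError('done() was called with an invalid value');
    }
    return { continueChain: false, value: returned.value }; // final value
  }
  if (returned === false) {
    return { continueChain: false, value: { source: '' } }; // skip file
  }
  if (returned == null) {
    return { continueChain: true, value: undefined }; // loader did nothing
  }
  if (!isValid(returned)) {
    throw new TypeError('hook returned an invalid value'); // user error
  }
  return { continueChain: true, value: returned }; // pass to next loader
}
```

The empty-string source for the false row mirrors the open question in the table; it is a placeholder choice rather than a settled behaviour.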

Examples

--loader https-loader \
--loader mock-loader \
--loader coffee-loader \
--loader other-loader

This results in https-loader being invoked first, mock-loader second, and so on, with Node's internal defaultLoader last.

For illustrative purposes, I've separated resolve and load hooks into different code blocks, but they would actually appear in the same module IRL.

Resolve hook chain

HTTPS Loader
const httpProtocols = new Set([
  'http:',
  'https:',
]);

/**
 * @param {Object} interimResult The result from the previous loader
 * (if any previous loader returned anything).
 * @param {string} [interimResult.format='']
 * @param {string} [interimResult.url='']
 * @param {Object} context
 * @param {string} context.originalSpecifier The original value of the import
 * specifier
 * @param {string?} context.parentUrl
 * @param {function} defaultResolver The built-in Node.js resolver (handles
 * built-in modules like `fs`, npm packages, etc)
 * @param {function(finalResult?)} done A short-circuit function to break the
 * resolve hook chain
 * @returns {false|{format?: string, url: string}?} If participating, the hook
 * resolves with a `url` and optionally a `format`
 */
export async function resolve(
  interimResult,
  // context,
  // defaultResolver,
  // done,
) {
  let url;

  try {
    url = new URL(interimResult.url);

    // there is a protocol and it's not one this loader supports: step aside.
    if (!httpProtocols.has(url.protocol)) return;
  }
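  catch (err) {
    // interimResult.url is not an absolute URL. determineWhetherShouldHandle()
    // is a placeholder for this loader's own check (not defined here); if the
    // specifier does not meet its conditions, step aside.
    if (!determineWhetherShouldHandle(interimResult)) return;
  }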

  return {
    url: '…',
  };
}
Mock Loader
export async function resolve(
  interimResult,
  context,
  defaultResolver,
  // done,
) {
  let url;

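  try { url = new URL(interimResult.url) }
  catch (err) {
    // No usable URL yet: fall back to Node's resolver. Awaited in case
    // defaultResolver is async.
    url = new URL((await defaultResolver(interimResult /* , … */)).url);
  }

  // `generation` is not defined in this snippet; presumably it is supplied by
  // the mocking tool at runtime.
  url.searchParams.set('__quibble', generation);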

  return {
    url: url.toString(),
  };
}

Load hook chain

HTTPS Loader
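// Assumed source of get(); note that https.get() only accepts https: URLs.
import { get } from 'node:https';

const contentTypeToFormat = new Map([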
  ['text/coffeescript',		'coffeescript'],
  ['application/node',		'commonjs'],
  ['application/vnd.node.node',	'commonjs'],
  ['application/javascript',	'javascript'],
  ['text/javascript',		'javascript'],
  ['application/json',		'json'],
  // …
]);

/**
 * @param {Object} interimResult The result from the previous loader (if any
 * previous loader returned anything)
 * @param {string} [interimResult.format=''] Potentially a transient value. If
 * the resolve chain settled with a `format`, that is the initial value here.
 * @param {string|ArrayBufferView|TypedArray} [interimResult.source='']
 * @param {Object} context
 * @param {Array} context.conditions
 * @param {string?} context.parentUrl
 * @param {string} context.resolvedUrl The module's resolved url (as
 * determined by the resolve hook chain).
 * @param {Function} defaultLoader The built-in Node.js loader (handles file
 * and data URLs).
 * @param {Function} done A terminating function to break the load hook chain;
 * done accepts a single argument, which is used for the final result of the
 * load hook chain.
 */
export async function load(
  interimResult,
  { resolvedUrl },
  // defaultLoader,
  // done,
) {
  if (interimResult.source) return; // step aside (content already retrieved)

  const url = new URL(resolvedUrl);

  if (!httpProtocols.has(url.protocol)) return; // step aside

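  const result = await new Promise((res, rej) => {
    get(resolvedUrl, (rsp) => {
      const format = contentTypeToFormat.get(rsp.headers['content-type']);
      let source = '';

      rsp.setEncoding('utf8'); // avoid splitting multi-byte characters across chunks
      rsp.on('data', (chunk) => source += chunk);
      rsp.on('end', () => res({
        format,
        source,
      }));
      rsp.on('error', (err) => rej(err));
    }).on('error', rej); // request-level failures (DNS, connection refused, …)
  });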

  return result;
}
Mock Loader
export async function load(
  interimResult,
  { resolvedUrl },
  defaultLoader,
  // done,
) {
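  const isQuibbly = (new URL(resolvedUrl)).searchParams.get('__quibble');

  if (!isQuibbly) return; // step aside

  // `urlToMock` is not defined in this snippet: it stands for the location of
  // the mock (alternatively, use some runtime-supplied mock).
  const mock = await defaultLoader(interimResult, { resolvedUrl: urlToMock });

  return { source: mock.source };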
}
CoffeeScript Loader
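import { extname } from 'node:path';
import coffee from 'coffeescript'; // assumed: the `coffeescript` npm package

const exts = new Set([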
  '.coffee',
  '.coffee.md',
  '.litcoffee',
]);

export async function load(
  interimResult,
  context,
  defaultLoader,
  // done,
) {
  if (
    !!interimResult.format
    && interimResult.format !== 'coffeescript'
  ) return; // step aside

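  // Note: extname() returns only the last extension, so '.coffee.md' files
  // yield '.md' and would need extra handling to match.
  const ext = extname(context.resolvedUrl);

  if (!exts.has(ext)) return; // step aside

  // Awaited in case defaultLoader is async.
  const rawSource = interimResult.source || (await defaultLoader(
    {
      format: 'coffeescript', // defaultLoader currently doesn't actually care
    },
    context
  )).source;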
  const transformedSource = coffee.compile(rawSource.toString(), {
    whateverOptionSpecifies: 'module'
  });

  return {
    format: 'module',
    source: transformedSource,
  };
}
Updates to ESMLoader.load()
class ESMLoader {
  async load(resolvedUrl, moduleContext, resolvedFormat = '') {
    const context = {
      ...moduleContext,
      resolvedUrl,
    };
    let shortCircuited = false; // should we support calling done with no arg?
    let finalResult;
    let format = resolvedFormat;
    let source = '';

    function done(result) {
      finalResult = result;
      shortCircuited = true;
    }

    for (let i = 0, count = this.loaders.length; i < count; i++) {
      const loader = this.loaders[i]; // assumes this.loaders holds the load hooks
      const tmpResult = await loader(
        { format, source },
        context,
        defaultLoader,
        done,
      );

      if (shortCircuited) break;

      if (tmpResult == null) continue; // loader opted out

      if (tmpResult === false) {
        finalResult = { source: '' };
        break;
      }

      if (tmpResult?.format != null) format = tmpResult.format;
      if (tmpResult?.source != null) source = tmpResult.source;
    }

    // Nothing short-circuited: use the accumulated format and source.
    finalResult ??= { format, source };

    // various existing result checks and error throwing
  }
}

Concerns Raised

Next()

  1. This creates an Inception-like pattern that could confuse users: loaders are specified in a different sequence from the one in which they are called, because the calls are nested. The final loader calls the previous one, which calls its own previous one, and so on all the way back to the beginning.
  2. The next function does not behave like many well-known existing implementations (e.g. the next of JavaScript's native generators runs in the inverse order to this one, and not calling next in an ExpressJS route handler does not break the chain).
  3. It requires the user to have specific knowledge: calling next is effectively required (not calling it will likely lead to undesirable behaviour and, in many cases, break in very confusing ways).
  4. Unit testing is more difficult: next requires spying in almost all cases, whereas done needs spying very rarely, and then likely only by more advanced users.
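
To make the testing concern concrete, here is a hedged comparison using two hypothetical cache-busting resolve hooks, one per style. The hook bodies and the hand-rolled createNextSpy helper are illustrations only; a real test suite would likely use its framework's own spies:

```javascript
// Done-style hook (hypothetical): a plain input-to-output function.
async function doneStyleResolve(interimResult) {
  if (!interimResult.url) return; // step aside
  return { url: `${interimResult.url}?v=1` };
}

// Next-style equivalent (hypothetical): must call `next` to obtain the URL.
async function nextStyleResolve(specifier, context, next) {
  const { url } = await next(specifier, context);
  return { url: `${url}?v=1` };
}

// Hand-rolled spy standing in for the previous loader in the chain.
function createNextSpy(result) {
  const spy = async (...args) => {
    spy.calls.push(args);
    return result;
  };
  spy.calls = [];
  return spy;
}
```

Testing doneStyleResolve only needs an input object and a check on the return value. Testing nextStyleResolve additionally needs a stand-in for next (such as createNextSpy) and, usually, a check that it was called with the expected arguments.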

Done()

  1. This could potentially cause issues for APMs (does the next approach as well?). After chatting with @bengl, it seems this is not an issue, as V8 exposes what they need.
  2. A hook that unintentionally does not return (or returns nullish) might be difficult to track down. I believe this was resolved in the previous issue discussion?

Metadata


    Labels

      • discuss: Issues opened for discussions and feedbacks.
      • esm: Issues and PRs related to the ECMAScript Modules implementation.
      • loaders: Issues and PRs related to ES module loaders
      • module: Issues and PRs related to the module subsystem.
