Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: __cjsModule on ESM CJS wrapper and in require ESM #53848

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

guybedford
Copy link
Contributor

@guybedford guybedford commented Jul 14, 2024

As a follow-up to #52166 which explicitly adds support for the __esModule wrapper marker for ESM modules imported into CJS, this implements a complementary __cjsModule wrapper marker for the ESM wrappers created when importing CJS modules into ESM. In addition, when require(esm) is applied to any module exporting __cjsModule: true (including these wrappers), it will handle unwrapping the default supporting custom CJS values and transitive reexports of wrappers exposing their underlying CommonJS modules.

This came out of lengthy discussion with @joyeecheung both online and offline around that PR working through various ecosystem implications, and seems to me to be a good compromise to any potential interop problems #52166 may cause in future.

The use case here is that given an arbitrary namespace object imported into a JavaScript file, in dynamic linking workflows where no further information is available, it is useful to be able to infer many interop patterns against such arbitrary modules.

For example, consider an ESM module that is transpiled into a CommonJS module and then transpiled back to ESM running on the web, where a dependency could be in an unknown module format:

Faux ESM:

import dep from 'dep';
console.log(dep);

Transpiled from ESM to CJS (as tools do today, and then published to npm say):

const depMod = require('dep');
const depNs = depMod.__esModule ? depMod : { default: depMod };
console.log(depNs.default);

Now if I run this module in Node.js, dep may now get an __esModule marker added by Node.js if it doesn't have an __esModule marker already and has a default export.

So if I want to transpile this module back into ESM and run it on the web, with that same dynamic dependency linkage, I end up with something like:

import * as depMod from 'dep';

// New interop layer to mimic Node.js require ESM semantics (in reality you'd use a proxy for live bindings)
const depRequire = !('__esModule' in depMod) && ('default' in depMod) ? { ...depMod, __esModule: true } : depMod;

// Original published CJS, with require value replaced:
const depMod = depRequire;
const depNs = depMod.__esModule ? depMod : { default: depMod };
console.log(depNs.default);

So we now have a scheme to transpile arbitrary CJS to ESM for browsers against full dynamic externals by adding this new interop layer where an arbitrary require('dep') in a build to ESM gets turned into:

import * as depMod from 'dep';
const depRequire = !('__esModule' in depMod) && ('default' in depMod) ? { ...depMod, __esModule: true } : depMod;

That is, even if we don't know the value of dep, we have a way we can coerce it into the correct value and interop with it correctly, regardless of whether it came from CommonJS, ESM or faux ESM.

But there is now a new interop problem with such a scheme - in the case where depMod is a CJS module transpiled into ESM, we may want the CJS transpiled into ESM to be represented by the Node.js CJS ESM wrapper which looks like { default: module.exports, ...namedExportsFromCjsModuleLexer }.

But if we pass a module of the above shape in the above interop pattern as depMod, since it does not have an __esModule and it does have a default, it will give a value of depRequire being { default: module.exports, ...namedExportsFromCjsModuleLexer, __esModule: true }, when we would have expected depRequire to have been module.exports!

But, if there is a __cjsModule marker in the CJS ESM representation, then the interop can be updated to be:

import * as depMod from 'dep';
const depRequire = depMod.__cjsModule ? depMod.default : !('__esModule' in depMod) && ('default' in depMod) ? { ...depMod, __esModule: true } : depMod;

Therefore fully solving the dynamic externalization against all of the cases where dep might respectively be CJS transpiled to ESM and real ESM with the conditional faux ESM interop layer for the consuming faux ESM as CJS transpiled into ESM per #52166.

I understand at this point that this all may seem contrived, but in real-world interop patterns, there is no concept of reaching some compositional complexity limit - users will apply tools to code, and expect it to work. Fractal interop is the name of the game. And where the past interop concerns were about ensuring ESM semantics transpiled to CJS, our future interop concerns are now the semantics of CJS transpiled to ESM. __cjsModule helps to bridge that much like __esModule did when I originally implemented it in Traceur and Babel later adopted.

By landing and porting this alongside #52166, it should give another tool to interop authors wherever they are in the toolchain to be able to have a way to deal with these kinds of onion interops. And while the exact interop issue is not something that Node.js itself has to deal with, I strongly believe that being friendly to tooling dealing with hard interop cases would be a net win for the ecosystem. Well that, and also I very much need this feature in my own tooling workflows.

Note that this would be a major change, and should go out at the same time as the unflagging for require(esm).

//cc @nodejs/loaders

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders

@nodejs-github-bot nodejs-github-bot added esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run. labels Jul 14, 2024
doc/api/esm.md Show resolved Hide resolved
@RedYetiDev RedYetiDev added the loaders Issues and PRs related to ES module loaders label Jul 15, 2024
@guybedford guybedford changed the title feat: __cjsModule on ESM CJS wrapper module: __cjsModule on ESM CJS wrapper and in require ESM Jul 15, 2024
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

doc/api/esm.md Outdated Show resolved Hide resolved
doc/api/esm.md Outdated Show resolved Hide resolved
@legendecas
Copy link
Member

I may not follow with the motivation. If this feature is going to be used in "transpile a CJS module (which was transpiled from ESM) back into ESM and run it on the web", can this be added to transpilers instead?

@guybedford
Copy link
Contributor Author

guybedford commented Jul 16, 2024

I may not follow with the motivation. If this feature is going to be used in "transpile a CJS module (which was transpiled from ESM) back into ESM and run it on the web", can this be added to transpilers instead?

It is also helpful to support CJS transpiled into ESM (ie modern bundled code) running in Node.js against an external that is unknown as to whether it is CJS or ESM, where being able to know if the external is a CJS wrapper can allow lowering the required require(esm) interop pattern.

@joyeecheung
Copy link
Member

joyeecheung commented Jul 22, 2024

The reason why we introduced the __esModule was:

  1. It has been a well-established technique used by most transpilers, and is known to work
  2. There are already many popular packages on npm transpiled from ESM to CJS and therefore already needs the __esModule construct if they start to ship ESM, and many popular code base using existing transpilers that rely on __esModule.

I think to implement a counterpart, we at least need 1 - there should at least be some integration in CJS -> ESM transpilers adopting this pattern to run things in browsers, which is what this primarily is useful for - it's very rare to see CJS code transpiled into ESM to be run in Node.js, as far as I know. Suppose transpilers disagree and some adopt a different flavor, or if this trick doesn't turn out to be actually working because no integration test has been written yet (this is all on paper for now), we'd be left with something that doesn't work and nobody uses. #52166 included an integration test using output from TypeScript, I think we need something similar in this PR too.

@guybedford
Copy link
Contributor Author

guybedford commented Jul 22, 2024

@joyeecheung I think the requirement that this new pattern already be in use before we have created the pattern is an unrealistic requirement to make.

I have given two clear use cases for this as a feature, with the direct use case in #53848 (comment) immediately being solved by landing this PR.

it's very rare to see CJS code transpiled into ESM to be run in Node.js, as far as I know

RollupJS is both a Node.js and browser bundler, so this is not quite true. When using RollupJS in Node.js applications to, say, speed up startup times, this is very much a real use case.

Suppose transpilers disagree and some adopt a different flavor, or if this trick doesn't turn out to be actually working because no integration test has been written yet (this is all on paper for now), we'd be left with something that doesn't work and nobody uses.

There's no trick or flavour here, simply the question - is the ES module namespace that I have imported an ESM wrapper around a CJS module? And I think that is a useful question to answer by supporting this flag to answer it. What more would you like to see tested with regards to this?

@guybedford
Copy link
Contributor Author

guybedford commented Jul 22, 2024

Specifically the problem is needing to know when in Node.js if an arbitrary ESM namespace is an ESM CJS wrapper or not. With that answer, yes, new CJS to ESM interop patterns can emerge as I have discussed, since they very much need this information. But the check is useful in its own right, and does not depend on their future emergence either.

@joyeecheung
Copy link
Member

joyeecheung commented Jul 22, 2024

I think the requirement that this new pattern already be in use before we have created the pattern is an unrealistic requirement to make.

Isn't this a pattern primarily meant for code run in the browser? Then it should be possible to create a bundler integration that demonstrate how this can be used in the browser first? __esModule is also a pattern that existed before we add it to Node.js, wasn't that in the same position?

When using RollupJS in Node.js applications to, say, speed up startup times, this is very much a real use case.

RollupJS supports CommonJS output too. As far as I know typically people bundle code into CommonJS format to speed up startup times (ESM loading is slower than CommonJS loading in the first place), or to use the bundle with SEA or snapshots since they do not support ESM. But choosing the ESM output for a Node.js environment is rare. If it's common enough it seems strange that this need for __cjsModule only appears after we decided to add __esModule, instead of emerging on its own long before, or that transpilers didn't implement it themselves already, like what happened with __esModule.

What more would you like to see tested with regards to this?

I was referring to this use case:

It is also helpful to support CJS transpiled into ESM (ie modern bundled code) running in Node.js against an external that is unknown as to whether it is CJS or ESM, where being able to know if the external is a CJS wrapper can allow lowering the required require(esm) interop pattern.

In #52166 we added a test fixture with the output of the actual TypeScript compiler. Before the PR, loading default exports of real ESM in transpiled ESM -> CJS code leads to an error. After it works intuitively. I think for this PR we'd need CJS -> ESM code emitted by an existing transpiler that doesn't load an external CJS without __cjsModule, but loads with the PR, and we commit the transpiler output as a test fixture.

@joyeecheung
Copy link
Member

joyeecheung commented Jul 22, 2024

But the check is useful in its own right, and does not depend on their future emergence either.

I agree that it's useful in its own right, though currently for code only targeting Node.js this can also be worked around by checking whether the path exists in module.Module._cache, or try-catch require(cjs) to see if it loads and the default is equal to the required result. The reason why we decided to add __esModule directly in core was that asking all transpilers to use the alternative approach (checking required[Symbol.toStringTag] or util.isModuleNamespace(required) and then asking all their users to update their transpilers may be a great hassle. But since __cjsModule is not yet a convention picked up by any transpilers, it doesn't seem we are in a rush? It would be better to see transpilers actually pick this up and prove that it works before we add yet another underscore underscore property. Having one was unfortunate enough but was something we did for the sake of existing code that already relies on __esModule. Having another even though no code in the real world rely on __cjsModule convention and we only get some vague promise that it would be used seems even more unfortunate.

@joyeecheung
Copy link
Member

joyeecheung commented Jul 22, 2024

For the specific case in OP, it seems the problem all stems from the proposed interop layer, which AFAIK no transpiler has implemented yet. The layer seems redundant for ESM -> CJS code, however. If you don't do that layer, and simply transpile this:

const depMod = require('dep');
const depNs = depMod.__esModule ? depMod : { default: depMod };
console.log(depNs.default);

to just this (IIUC, this is already what Rollup currently produce with interop: 'auto')

import depMod from 'dep';
const depNs = depMod.__esModule ? depMod : { default: depMod };
console.log(depNs.default);

Things already just work automatically. If dep is faux ESM it takes the first branch. If it's CJS, it takes the second branch and depNs.default will be the module.exports. If it's ESM, it also takes the second branch and depNS.default works just fine, since depMod would be the default export anyway. It only starts breaking if you add that layer, which doesn't seem necessary in the first place - the hack is only useful for transpiling import to require, not the other way around.

@guybedford
Copy link
Contributor Author

The entire ecosystem does not need to move to support __cjsModule. While it might, I've only laid out reasons why this is a useful check to be able to make. The interop cases are real interop problems that __cjsModule can solve whatever tools or users or code builders hit them.

Your scheme in #53848 (comment) does not work for a real ES module (not faux ESM) without a default export, since it will bail that 'dep' does not export a 'default' export. This is a new case created by supporting require(esm) for real ES modules that we need the __cjsModule check for.

@joyeecheung
Copy link
Member

joyeecheung commented Jul 22, 2024

If what you need is the ability fully comparable with require(esm), can't you just use globalThis.process.getBuiltinModule('node:module').createRequire(import.meta.url)? That would be Node.js specific and only works on newer versions of Node.js, but the same can be said about this __cjsModule thing unless you can convince browsers to support it too.

@guybedford
Copy link
Contributor Author

guybedford commented Jul 23, 2024

The globalThis.process.getBuiltinModule('node:module').createRequire(import.meta.url) pattern does not support rebundling the module as part of another build process (say you later decide to inline dep). That is, it breaks transitivity in the workflow of applying transform and analysis to code, where tools generally want to operate on ESM modules. On the other hand, the __cjsModule pattern as I've described maintains this transitivity in workflows.

I think we're focusing a little too much on some hypothetical major adoption of this feature (I'm not really not sure what you mean by browser adoption), and a little too less on the problems it solves, and what reasons we might not want to land it. @joyeecheung if you are against this flag, perhaps you can more clearly state your underlying concerns?

@GeoffreyBooth
Copy link
Member

RollupJS supports CommonJS output too. As far as I know typically people bundle code into CommonJS format to speed up startup times (ESM loading is slower than CommonJS loading in the first place), or to use the bundle with SEA or snapshots since they do not support ESM. But choosing the ESM output for a Node.js environment is rare.

I’ve been building SvelteKit apps for a few years now, and SvelteKit uses Vite with Rollup to generate ESM production builds. Startup time isn’t a concern for long-lived servers, and these builds aren’t bundling the external dependencies; I still need my node_modules folder. The production build needs to be ESM so that I can import ESM dependencies.

So no, I wouldn’t say that building for ESM output in a Node.js environment is rare.

@joyeecheung
Copy link
Member

joyeecheung commented Jul 23, 2024

perhaps you can more clearly state your underlying concerns?

I've already stated this in #53848 (comment)

It would be better to see transpilers actually pick this up and prove that it works before we add yet another underscore underscore property. Having one was unfortunate enough but was something we did for the sake of existing code that already relies on __esModule. Having another even though no code in the real world rely on __cjsModule convention and we only get some vague promise that it would be used seems even more unfortunate.

In general, adding more underscore underscore property should be frowned upon. __esModule should just be an exception that we had to make to ease the migration of a huge amount of existing configurations, given that it's already a de-facto standard, but not a precedent that should be casually followed, especially when there aren't even any integration tests and we don't know for sure whether this underscore underscore property would actually be used and work as intended.

@joyeecheung
Copy link
Member

I’ve been building SvelteKit apps for a few years now, and SvelteKit uses Vite with Rollup to generate ESM production builds. Startup time isn’t a concern for long-lived servers, and these builds aren’t bundling the external dependencies; I still need my node_modules folder. The production build needs to be ESM so that I can import ESM dependencies.

So it isn't transpiling CJS to ESM either, which is what this PR is about?

@guybedford
Copy link
Contributor Author

This PR is not claiming to set a convention that will work for all bundlers, rather it is providing a solution to the problem that when building CJS requires as externals into ESM imports, we have an interop problem in distinguishing CJS ESM wrappers from real ESM and that currently has no viable solution.

And I would quite like to be able to solve this problem for our Node.js users.

I've already stated this in #53848 (comment)

This argument is circular - you are asking for adoption, but then also claiming you are concerned about adoption. The concern and request cannot be for the same thing?

Would you be less concerned if we renamed the flag to nodeCommonJsModule or something like that without a __ prefix? We have full control over how the CJS ESM shell is defined in Node.js.

There is also another side to this PR and that is that using this flag, it is possible to define the representation of an ESM module in require(esm). I would have thought you'd be interested in that given it can be used as another tool to solve require(esm) interop issues.

@joyeecheung
Copy link
Member

joyeecheung commented Jul 23, 2024

And I would quite like to be able to solve this problem for our Node.js users.

Not saying that it isn't a problem, but it seems very well like an issue we can follow up later. This path needs code originally in CJS being transpiled to ESM and loading an external ESM to be hit. That doesn't seem to be an urgent issue - for one not that many people actually configure to transpile CJS to ESM for it to be run in Node.js, if they need to ship ESM they likely would just write in ESM in the first place and never hit this path (probably what @GeoffreyBooth does too), let alone attempting to loading real external ESM from said CJS code (considering how hacky CJS -> ESM can be in terms of handling dynamic exports and imports and all the wonky require extensions, this practice already has bigger fishes to fry anyway). This is unlike the problem that __esModule intends to solve because transpiling ESM to CJS is a much more common and stable practice, attempting to load an external real ESM from it is also fairly common, which can be shown by the mass ERR_REQUIRE_ESM confusion out there on the Internet.

This argument is circular - you are asking for adoption, but then also claiming you are concerned about adoption.

I don't think I claimed to be concerned about adoption, rather I am concerned about the lack of adoption of a premature solution that's currently purely on paper, and we have no integration tests to verify that this solution actually works as intended.

I would have thought you'd be interested in that given it can be used as another tool to solve require(esm) interop issues.

I'd be more interested to see working prototypes, the current description of the solution is too vague and abstract, and we don't know for sure whether it actually works as intended.

@joyeecheung
Copy link
Member

joyeecheung commented Jul 23, 2024

Actually I realized that this is a breaking change, because unlike require(esm) import(cjs) is not experimental, and people might have been expecting the namespace object to be pristine e.g. people may have been effectively writing tests the way we did in the tests being updated in the PR here - and the tests had not changed for 5 years, but now this change would start breaking them and users would have to update their code:

import * as empty from './empty.cjs';
assert.deepStrictEqual({...empty}, { default: {} });    // it will throw after this change

Unlike __esModule, which we had to add to the required result directly to ease migration for existing configurations (and it's only effective in a new path that previously would just error), nobody is using __cjsModule anyway, so I think we could introduce something like this instead to avoid the breakage:

import { isCJSModule } from 'node:module';  // Or use process.getBuiltinModule();
import * as empty from './empty.cjs';
isCJSModule(empty);  // true

isCJSModule() (or other names) can just tag the module namespace using a weakmap to have fake private fields (weakmap semantics). If we go with this I am fine without integration tests since the compat risk of a side API is significantly lower.

@joyeecheung joyeecheung added the semver-major PRs that contain breaking changes and should be released in the next major version. label Jul 23, 2024
@@ -0,0 +1,3 @@
import dep from './dep.js';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should test three cases like what's done in the __esModule PR:

  1. When there is only a default export
  2. When there are named exports
  3. When there are both named and default exports

And deps should be put in node_modules to mimic the targeted structure.

@@ -233,6 +232,23 @@ This property is experimental and can change in the future. It should only be us
by tools converting ES modules into CommonJS modules, following existing ecosystem
conventions. Code authored directly in CommonJS should avoid depending on it.

To create an ESM module that should provide an arbitrary value to CommonJS, the
`__cjsModule: true` marker can be used instead:
Copy link
Member

@joyeecheung joyeecheung Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we need a more informative name (I don't see any need for underscored names, this would be declared explicitly by library authors, if export default is a thing, export const unwrapDefault doesn't seem too off either), and this should also be applied to the setExports('default') call for import cjs as explained in #54085 (comment) - IIUC that addresses your "what if depMod is CJS -> ESM transpiled" use case, if depMod defines this marker, the transpiled consumer no longer needs to unwrap the default again:

import * as depMod from 'deps';
// depMod is { default: transpiledNS.default, ...namedExportsFromCjsModuleLexer, __esModule: true, unwrapDefault: true }
const depNs = depMod.__esModule ? depMod : { default: depMod };  // Takes first branch
console.log(depNs.default);  // It's the original CJS module.exports, which was exported as `export default` in the transpiled ESM i.e. transpiledNs.default.

Compared to adding the marker to the wrapper, this has the advantage of not having to break existing import cjs users which may have been relying on a pristine namespace in the past 5 years and can be backported to v18-22.

Also, in the document we should warn users that this would make the named exports invisible to require(esm), so if they have any named exports, they should make sure to make it a property of the object that is being exported as default.

export default class Point {};
export function distance() { }
export const unwrapDefault = true;
// There is no way to get function distance from the module via `require()`
// unless the module defines `Point.distance = distance`.
// This is different from the behavior that ESM-transpiled-to-CJS libraries currently
// provide to their  CJS consumers should they wish to ship real ESM.
const { distance } = require('point');
distance();  // throws

@guybedford
Copy link
Contributor Author

@joyeecheung I've expanded the test cases to include various shapes include the string, object and also the faux esm cases of mixed, named and default exports, and separated the dependency into node_modules being consumed as CJS and ESM under this convention.

For the name, I'm open to changing to a name that is not prefixed with __ and that describes the pattern as being a CJS unwrapping. How do you feel about something like export const requireAsDefault = true ? The main thing if we drop __ is to ensure that it is suitably unique.

I'd also like to revisit this being a major change. The benefit of supporting this at the same time as putting it on CJS ESM wrapper namespaces is that those namespaces are then properly supported under CJS require, so that closes the loop on the lifting and lowering sides of the pattern. Whereas if we only land one side individually we potentially break transitive usage.

The argument here is that many objects support obtaining new fields as a non-breaking change, so surely this could be okay.

If we don't treat this as a minor change, we would need to split it up into two PRs further - one for the CJS unwrapping in require(esm) that is non breaking, and another breaking change for adding this to the import wrapper. I'd really prefer not to have to do that if possible though.

Copy link

codecov bot commented Aug 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.09%. Comparing base (67f7137) to head (9fcf217).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #53848      +/-   ##
==========================================
- Coverage   87.09%   87.09%   -0.01%     
==========================================
  Files         647      647              
  Lines      181845   181856      +11     
  Branches    34913    34921       +8     
==========================================
+ Hits       158373   158382       +9     
+ Misses      16751    16747       -4     
- Partials     6721     6727       +6     
Files Coverage Δ
lib/internal/modules/cjs/loader.js 92.94% <100.00%> (+0.01%) ⬆️
lib/internal/modules/esm/translators.js 92.16% <100.00%> (+0.11%) ⬆️

... and 24 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders needs-ci PRs that need a full CI run. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants