Skip to content
This repository has been archived by the owner on Sep 2, 2023. It is now read-only.

Unify exports implementation across CJS and ESM #358

Closed
jkrems opened this issue Jul 23, 2019 · 26 comments
Closed

Unify exports implementation across CJS and ESM #358

jkrems opened this issue Jul 23, 2019 · 26 comments
Assignees

Comments

@jkrems
Copy link
Contributor

jkrems commented Jul 23, 2019

Right now we have two copies of the exports logic. One in CJS and one in ESM. We should try to reuse at least some of the implementation so we can spend less energy trying to sync the two.

@jkrems jkrems self-assigned this Jul 23, 2019
@jkrems
Copy link
Contributor Author

jkrems commented Jul 24, 2019

Step 1: Unify the test suite so that the entire behavior (and the differences) are covered.

@jkrems
Copy link
Contributor Author

jkrems commented Jul 24, 2019

@guybedford @bmeck While working on the test PR (nodejs/node#28831) I ran into the ERR_MODULE_NOT_FOUND vs. MODULE_NOT_FOUND inconsistency between ESM and CJS. Was that done on purpose or is that just an artifact of the error macros and could (should?) be fixed?

@bmeck
Copy link
Member

bmeck commented Jul 24, 2019

It was not intentional on my behalf

@jkrems
Copy link
Contributor Author

jkrems commented Jul 24, 2019

Created a dedicated issue in case somebody else beats me to it: #360

Trott pushed a commit to Trott/io.js that referenced this issue Jul 31, 2019
Refs: nodejs/modules#358

PR-URL: nodejs#28831
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
targos pushed a commit to nodejs/node that referenced this issue Aug 2, 2019
Refs: nodejs/modules#358

PR-URL: #28831
Reviewed-By: Guy Bedford <guybedford@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
@jkrems
Copy link
Contributor Author

jkrems commented Aug 13, 2019

I'm starting to lean towards closing this issue as wont-fix. I haven't found a good way to do this and as long as the tests are shared, it feels like we can live with two implementations of exports, just like we live with two implementations of resolution in general.

@GeoffreyBooth
Copy link
Member

Wasn’t there an issue for unifying resolution, at least as much as is reasonable? Can unifying exports be part of that?

@jkrems
Copy link
Contributor Author

jkrems commented Aug 14, 2019

Maybe..? Don't recall such an issue right now. But I'm not sure if it would realistically happen before unflagging (read: ever?) since there seem to be more urgent gaps that still need closing. I'm not sure if unifying the resolution code would actually get us enough value to be worth it.

@guybedford
Copy link
Contributor

One interesting argument for merging is that this question of virtualizing the file system keeps coming up for tools like Yarn.

Currently they can achieve this by hacking fs, but the C++ implementation of modules doesn't permit that.

If we were to move the module resolver back into JS in a shared way, we'd have the ability to easily share both the cache and fs virtualization.

@bmeck
Copy link
Member

bmeck commented Aug 15, 2019

I don't think allowing "hacking the fs" is a good reason to move something. That sounds like opening cans of worms.

@guybedford
Copy link
Contributor

The can of worms is not allowing virtualization, because the result would be resource hooks within the C++ resolver and JS codebase.

Resolution is a hot path... the more hooks the slower it gets, the slower Node gets.

@guybedford
Copy link
Contributor

That is, unless there's a clever way to virtualize the lib_uv fs implementation itself.

@bmeck
Copy link
Member

bmeck commented Aug 15, 2019

@guybedford fs is tied to OS related things like child processes and shared libraries, that cannot be virtualized. It isn't a simple question of letting people replace fs. You remove the ability to assert things about files if fs is virtualized.

@addaleax
Copy link
Member

One interesting argument for merging is that this question of virtualizing the file system keeps coming up for tools like Yarn.

Currently they can achieve this by hacking fs, but the C++ implementation of modules doesn't permit that.

If we were to move the module resolver back into JS in a shared way, we'd have the ability to easily share both the cache and fs virtualization.

Fwiw, that’s an issue I’ve had with the modules implementation the entire time it existed – it should not be concerned with file system implementation details at all, that’s just a sign of a design that lacks orthogonality.

This also affects embedders like Electron, who do support loading modules from files that do not exist on-disk, and would affect us if we start supporting e.g. loading code from tarballs.

@bmeck
Copy link
Member

bmeck commented Aug 15, 2019

@addaleax some things like dlopen don't work on virtual files. I don't see how they can be orthogonal.

@addaleax
Copy link
Member

@addaleax some things like dlopen don't work on virtual files. I don't see how they can be orthogonal.

One way to work around that is to write shared libraries to a temporary path on disk and open that file instead.

@bmeck
Copy link
Member

bmeck commented Aug 15, 2019

@addaleax yes, this was the design I used back in 2014 for Noda but cleanup is unreliable and a lot of extra copies of libraries were written to disk.

@guybedford
Copy link
Contributor

guybedford commented Aug 15, 2019

It is common for a resolver implementation to be based on a virtualized file interface, which would be easier to do in JS.

The standard model is to expose a readFile, stat and realpath (/ readlink) implementation. Most JS resolver implementations converge on this model, including the ones I've written personally, the ones I've written for Zeit and the TypeScript hooks.

dlopen is not needed within the resolver implementation.

@bmeck
Copy link
Member

bmeck commented Aug 15, 2019

@guybedford dlopen is required during other loading operations and the conversation turned towards allowing alternative fs for loading operations and did not seem limited to resolvers given:

both the cache and fs virtualization.

What cache and general fs virtualization was being talked about?

@guybedford
Copy link
Contributor

What cache and general fs virtualization was being talked about?

The cache is that we don't share the package.json cache between CJS and ESM currently to avoid crossing boundaries. The fs virtualization is the standard Yarn pnp / Tink model, which Yarn and others will ask for. And with no ESM solution it's not good for us to stifle innovation.

Loaders cannot solve virtualization because import.meta.url must be expected to work through fs.readFile, just like __filename needs to apply into fs.readFile for CommonJS being the reason for fs-level interception as opposed to resolver-level interception via readdressing.

@guybedford
Copy link
Contributor

guybedford commented Aug 15, 2019

I don't think we should be working out the perfect hooks model for them here, that is their job. The dlopen exception etc is theirs to sort out.

But I'm just noting that a JS resolver unification would avoid these problems, which should not be so easily dismissed. Perfect is the enemy of good and all.

@bmeck
Copy link
Member

bmeck commented Aug 15, 2019

Loaders cannot solve virtualization because import.meta.url must be expected to work through fs.readFile

I don't think this is true? What requires this? I know on the web import.meta.url != the location of the response and I can imagine things like https/data/etc. URLs not working with fs.

I don't think we should be working out the perfect hooks model for them here, that is their job. The dlopen exception etc is theirs to sort out.

I think thats fine, but I don't think this should be hooking fs I would rather these specific APIs and the expected constraints for them be separate from fs.

@guybedford
Copy link
Contributor

guybedford commented Aug 15, 2019

I don't think this is true? What requires this? I know on the web import.meta.url != the location of the response and I can imagine things like https/data/etc. URLs not working with fs.

We cannot guarantee that packages on npm will not use fs calls against their own package relatively.

Consider for example a library that loads custom templates from a local folder. It's very common. And users will write code that uses fs, and thus assumes import.meta.url is a relative file:/// scheme.

Tink and Yarn pnp have to be backwards compatible, and if they try to use a different scheme, then fs.readFile(fileURLToPath(import.meta.url)) breaks. They have to avoid these common breaks however they can, using techniques such as hooking fs.

Schemes might work if they support those new schemes in fs itself as well. But ensuring path.resolve etc all play out there is tough. Again these are problems and decisions for these tools to solve, and not ones that I think should dictate our designs.

I am not arguing for anything strong here, just really noting two things:

  1. By not allowing easy fs hooks in the ESM resolver, we are inadvertently restricting their solution space. They have to either implement their own resolver now, or implement their custom scheme for fs calls as well.
  2. A resource hooks model that intercepts fs at a C++ and JS layer before lib_uv is something I am strongly against for performance and complexity reasons.

I hope we can bear the above in mind as these virtualization arguments play out.

@bmeck
Copy link
Member

bmeck commented Aug 15, 2019

Again these are problems and decisions for these tools to solve, and not ones that I think should dictate our designs.

I feel that way about most of the last comment, lots of these problems seem to be scoped to getting the resolver working. If you look from a different angle, things like spawning child processes, dlopen, passing/saving paths between processes, things like using path.resolve, etc. also have problems. Neither side is going to be a universal win. I would prefer we pick a more general side that does allow the possibility of working with things like alternative URL schemes which are talked about with loader hooks and even supported in other runtimes.

By not allowing easy fs hooks in the ESM resolver, we are inadvertently restricting their solution space. They have to either implement their own resolver now, or implement their custom scheme for fs calls as well.

I agree we need to find a solution to hooking CJS/ESM both. I don't think we have to be as restrictive as we are, but using fs seems problematic especially since it has issues with virtualization even if there is some backwards compat with existing hacks. Using existing hacks to model use cases seems good, but using them to enforce API design and limit forwards compatibility seems bad.

A resource hooks model that intercepts fs at a C++ and JS layer before lib_uv is something I am strongly against for performance and complexity reasons.

I don't see how an alternative would be able to avoid the same kind of interception. How would userland solve performance or reduce complexity for this that the runtime cannot? Hooking the entire fs would mean implementing far more than the hooks you describe above.

@guybedford
Copy link
Contributor

I agree our role shouldn't be one to solve, and for that reason we should avoid prematurely trying to push APIs and abstractions that solve these problems before we know what the correct solution is.

Rather, we should be aware of our role in the muddy problem space.

And a unified CJS / ESM resolver makes the mud slightly easier to work with.

@bmeck
Copy link
Member

bmeck commented Aug 15, 2019

And a unified CJS / ESM resolver makes the mud slightly easier to work with.

I am unclear what this means, CJS/ESM currently have differing resolution algorithms and potentially timings (CJS enforces sync/blocking, ESM does not). With those continuing to differ I doubt there is actual utility in trying to share the same resolver functions.

I think a shared way of handling resources and cache-ing results would be enough. I think porting ESM's resolver to JS would help and is more realistic now than when it was originally wrote thanks to primordials. I would still be -1 to intentionally allowing patching of fs to leak into ESM as it does not currently and I don't think that API is the correct way to do things. If we want to hold of on solving the API for fs style resource hooks, we shouldn't be doubling down on hacks using non-robust or guaranteed contracts.

@jkrems
Copy link
Contributor Author

jkrems commented Sep 18, 2019

I think this is effectively a larger discussion around sharing logic between the two resolvers and goes beyond exports. Going to close this issue since I'm not seeing this happen and I think sharing the test suite gets us most of what we would've wanted from a shared implementation (reassurance that they both do the same thing).

@jkrems jkrems closed this as completed Sep 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants