-
Notifications
You must be signed in to change notification settings - Fork 29.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESM loaders need a way to list exports in load() #47888
Comments
To respond to each option:
|
cc @nodejs/diagnostics @nodejs/loaders |
This sounds the most promising to me. Node doesn't have the info you're looking for until later in the process, so trying to do everything in a Is there a particular reason for restricting replacements to only exports that already existed? I think that is where your hang-up is; testdouble do something similar but, I believe, don't have that restriction—precluding this issue for them. |
Isn't 4 pretty simple to do? (Just publish to the diagnostic channel when an import is resolved if a hook/flag is passed) |
How would that work? We need to rewrite the module text to register a function to modify the value of the reference being exported so we need the names of those exports during load to know what code to generate. You can see how it currently works in the code linked by Bryan: https://github.com/DataDog/import-in-the-middle/blob/b49545c48994d5268b04a96dc5c83f25a8be55b7/hook.js#L113-L131
I have #44340 to add diagnostics_channel events to the module lifecycle, which I'll try to get to updating soon. That's not really a solution though unless we pass the module record in as mutable, which got a bunch of pushback when that was suggested before. It's also a bit counter to the intent of diagnostics_channel as it's meant to be a passive observer and not a way to mutate state. The ultimate plan, though it will take a long time, is to get modules themselves to start producing diagnostics_channel data themselves and then use module load events to know when we need to attach subscribers to channels from a particular module. That is a far off future so we need a solution for the immediate term or Node.js 20 users will not be able to use application tracers with ESM. |
Side note: This breaks OpenTelemetry too as they also rely on import-in-the-middle for ESM support. |
I think the immediate solution is for IITM to add |
Using |
Rewriting to use acorn would unfortunately probably be more work than is worthwhile for how few people are actually using ESM. It's also a rather large dependency to add just for that. I worry if that's the only fix we have it might be left broken for awhile... 😐 @bengl Do we have any bandwidth to do something about this given our other priorities this quarter? |
May I suggest option 6? Have the In this option Granted, communication between API and loader just got more difficult because they're in separate threads, but it is possible. |
I think this would work but is what I suggested would be a non-trivial performance hit for something that almost no-one needs. But also, if I understand correctly, it is not dependable until the chain has finished because a subsequent loader could completely change the source, making the info retrieved from that |
@JakobJingleheimer my suggestion was a change in the IITM package, not the Node.js loader code. |
@Qard It's not particularly hard to use I think just to unblock our users here, we're going to have to implement option 1. That said, I don't think it's the ideal solution. @giltayar Your option 6 seems like it might plausibly work. It seems like there may be some chicken-and-egg problems with it though, but I think those can be worked around. I'll play around with this, after implementing option 1 to at least unblock our users for now. |
However you were able to do it before the off-thread change, you should still be able to do it after the off-thread change as long as you use a communications channel between the threads, which exists. It’s not very clear to me (or @targos, apparently) how the previous method worked exactly, so maybe you don’t want to patch over the off-thread stuff to recreate the previous approach, but it should be doable. Something else to consider as you’re attempting option 1 is that you could use a parser other than I don’t really follow how the old process worked; does it dynamically import a module to read its exported names, and then that import is discarded? So then when the code does the “real” import, that parsing and evaluating is all done again or partially done again (like maybe the parsing is still in V8’s cache but the evaluation happens twice)? If there’s any inefficiency in the old method, it could very well be that a good parser might be faster overall than option 6. |
I implemented option 1 to unblock our users in That said, another alternative (that I doubt would be popular, and is probably very non-trivial) is to provide a mechanism within Node.js for bypassing the immutability of exports, eliminating the need for IITM altogether. |
I vaguely recall something about the “linking” step, where the various exports of each module are attached to each other, as happening via calls to V8; and that V8 doesn’t allow these (or the modules referenced) to be changed afterward. That’s why hot module reload in ESM has been such a challenge, because we lack APIs in V8 to modify modules once they’re loaded in. I think module parsing is a fine solution and should work well for many users, especially if you’re using native parsers like cjs-module-lexer. I would imagine that most users of apps with instrumentation libraries aren’t all that concerned about initial startup time, as I’d guess that people are mostly instrumenting long-running servers. I think the long-term solution is to eventually migrate away from mocking/proxying to using the diagnostics channel? I would think that parsing should be fine until then. I’ll close because it seems like this is resolved, but happy to reopen if there’s anything else. |
@bengl I had a thought regarding this: have you considered doing the import-in-the-middle (and/or require-in-the-middle) patching at install time rather than at runtime? Something like As I see it, besides the obvious benefit of making instrumentation of ESM just as good as instrumentation of CommonJS, this has a few other advantages:
Yes you’re modifying the code on disk, but I don’t see how it’s any better to modify code at runtime than ahead of time. It introduces some slight complexity if you want to turn it off, like to enable patching only in production say, but this feels manageable. Another consideration that comes to mind is the Node version; at runtime you know exactly what version you’re using, but at install/build time a different version might be in use and you’d want to ensure that the patching done is for the intended Node version. And of course you need to do the work to implement this, of using something like Acorn to be able to patch these JavaScript files safely, but it sounds like you might have already written that code for the ESM version. |
@GeoffreyBooth for some users, that might make sense. Others may prefer the For most instrumentations, Node.js version isn't particularly relevant, since the instrumented code doesn't change much, but if it does come up, we can add another switch for that. 🤔 We'll certainly consider this. One other thing to note here is that as native ESM has become more popular for Node.js apps, APM vendors have had to catch up here, and today at least three other vendors (plus OpenTelemetry and distros of it), all use |
You also can patch those without needing the hooks API and without monkey-patching the CommonJS loader. Something like const { patchedFs, patchedHttp } = require('datadog/patchedBuiltins');
const fs = require('node:fs');
const http = require('node:http'); // And so on for all the builtins you want to patch
const { syncBuiltinESMExports } = require('node:module');
for (const api of patchedFs) {
fs[api] = patchedFs[api];
}
for (const api of patchedHttp) {
http[api] = patchedHttp[api];
}
syncBuiltinESMExports(); See https://nodejs.org/api/module.html#modulesyncbuiltinesmexports, which is the API to sync whatever patching you do to builtins in CommonJS to also apply to ESM. It dates all the way back to Node 12, when we originally unflagged the ESM implementation. And going forward, I would hope that gradually you shouldn’t need to patch any builtins as the ones you care about all get support for Diagnostics Channel. Obviously that won’t help for old Node versions, but hopefully the real version of the above script can exclude certain builtins based on the Node version. |
Besides patching builtins, what else do you patch? Is it essentially a predefined list of modules and exports, like “for Because if so, could you perhaps just publish to NPM the patched versions of those libraries? Like I’m the maintainer of CoffeeScript, which has always allowed things like Yet another option could be to add the instrumentation as part of some existing build process, so that you’re not modifying the |
Transitive dependencies are often a problem there. We patch many things like body-parser and find-my-way which are generally way down the dependency tree somewhere which makes it difficult to tell users to use that when they're using it indirectly through something else. Also, the build step approach would be complicated. There are cases where patches are specific to Node.js version or OS and there are many cases like Lambda where the build doesn't necessarily happen in an environment that matches the target machine. There are also cases where we need to patch native binding code, such as with postgres, which doesn't exist in any useful form until runtime. We will always require a bunch of runtime things, we can't really escape that. |
Yep. Currently we do this in our library code (i.e. on the main thread) rather than in the loader, and the loader simply enables that to happen, but that's an implementation detail.
That could work, and is even the norm for some other languages we support. For Node.js though, one of our design goals is to be minimally invasive to developer workflows, and this approach would be contrary to that. For example, many of our users aren't developers and are instead part of devops teams or observability teams who don't have access to the app in question apart from, say, a docker image. (Of course, the same problem can apply to modifying
Exactly. For folks who bundle with esbuild, we provide this as the preferred approach. (Adding support for other bundlers is on our roadmap.)
Nothing would stop us from publishing versions of
💯 |
How long is this list? Perhaps the patched versions could live in your library, and within the |
The problem with that is we currently get support for a lot of things for free because X thing we don't know/care about depends on Y thing we do have instrumentation for. This model would change that though. We would need to make custom versions with patched dependencies of tons of things we don't have any explicit support for which, to me, makes that idea seem likely untenable. The install-time patching approach is probably more realistic, but also not without its issues. I'm not sure there's a good solution other than just having runtime mutable modules, which is what I had intended to do with diagnostics_channel events of the module loading lifecycle as loaders are really designed around source text modification, not runtime object modification. |
Regarding this idea of shipping patched source files within the APM packages's payload and returning them from the resolve step:
|
Hello Node.js! Today I'm wearing my Datadog APM hat!
import-in-the-middle
(IITM) exists to roughly replicate the behaviour ofrequire-in-the-middle
, but for ESM. Exported bindings needed to be made editable externally to a given module. ESM doesn't allow for this like CommonJS does, so IITM stores exports for a given module in a wrapper's scope, and exports those identifiers. ESM lets modules modify their own exports dynamically, so this approach gives IITM a backdoor to modify exports in arbitrary modules. (AFAIK this is because what's actually exported is the variable binding, and not the value it's bound to.)In IITM, the code inside the
load()
hook looks like this (adapted from code put together by @devsnek):https://github.com/DataDog/import-in-the-middle/blob/b49545c48994d5268b04a96dc5c83f25a8be55b7/hook.js#L113-L131
On line 114, the module to be wrapped is directly imported in order to get the list of export names. This allows us to define a module-scoped variable for each export from the wrapped module, and re-export it in the wrapper. We also take this opportunity to register a setter that's used later on by users of IITM to modify the module.
In Node.js v20.x, this doesn't seem to work. IITM tests fail, showing that things intended to be wrapped aren't being wrapped at all. My guess is that because we're dynamically importing the intended module in the
load()
hook on the loader thread, previous assumptions about how the modules are loaded recursively don't apply. Even if we can safely load all of the tree in the loader thread (which might not be the case), we'd then be loading the entire dependency tree twice. Without a workaround, solutions for shimming (e.g. for APM purposes) are severely limited.The only reason we're doing this import on line 114 is to get the list of exports that we need to make module-scoped variables for. If we can't do that, then we have a number of options:
Option 1: Use a parser in the hook
If we simply parse the source of the file, we can can find its export list statically, and not bother with the import tree at all in the loader thread. This would generally work, but including a parser bloats IITM and increases the overhead it introduces at startup.
Option 2: Provide access to Node.js' built-in parser
acorn
is used by some existing Node.js features, and could be provided as a new built-in module in Node.js. If desired, it could be restricted to the loader environment. This would solve the package bloat problem for IITM, but wouldn't help with startup overhead.Option 3: Provide a list of export names as a parameter to the
load()
hookI'm not sure how this would be done. Probably with parsing? Perhaps with parsing that's already being done for other reasons? Either way, it would completely trivialize this problem.
Option 4: Discard the IITM concept
Overall, it's not great to be just subverting ESM's mutability rules in order to modify stuff post-load, but it's certainly the easiest way to map our existing CommonJS instrumentations for APM to ESM. We could re-work the concept entirely and do all our shimming in one-off loaders for each module we need to shim, but that would mean rewriting pretty much all of our instrumentation code. That's not even counting the weirder interactions between ESM apps and CommonJS dependencies that we'd have to account for here. This is an effort we'd like to avoid, since we'd like to see a future where
TracingChannel
enables us to avoid shimming entirely.Option 5: Revert to (or allow for reverting to) single-threaded behaviour
If we could revert back to the single-threaded behaviour, even via a flag, this would likely be resolved for us.
The text was updated successfully, but these errors were encountered: