Node.js Version
v22.8.0
NPM Version
10.8.2
Operating System
Linux xxx 6.5.0-41-generic #41~22.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 3 11:32:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
fs, v8, vm
Description
I'm working on a large application that loads around 15k files at startup and takes 20 seconds. I suspect a lot of this time is spent just compiling the modules, so I've been playing around with the new `NODE_COMPILE_CACHE` option, and I'm a bit confused. This is primarily a request to understand the caching, or for some hints on how to debug it myself.
- I registered a `load` plugin to get an exact count of the files loaded. It came out to 15312, with the exact same count across multiple runs, exactly as I'd expect.
- A single run of the application generates between 2700 and 3800 cache files.
  - Why isn't this constant each time?
  - Why isn't this closer to the 15k loaded files?
- An admittedly naive approach to figuring out which files have been included in the cache shows that only a very small percentage of application files match any of the hashes. I recursively iterate over every single file in the application, then compute the hash of each using CRC32 (with the esm seed, etc).
  - On the most recent first run, only 6 files matched. Sibling files in the exact same folder did not match.
- A second run of the exact same application code, without clearing the cache directory, generates 5155 total files (i.e., an additional 1300-2400 files).
  - Why are there new files at all?
  - If the cache relies on some transient field for the hash key, why isn't the total double?
  - On the most recent second run, 54 files matched the cache.
  - On a third run, this increased to 61.
Each run of the application should have been exactly the same: there are no dynamic imports, and I literally ran `npm start` multiple times.
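For reference, the run-to-run comparison can be done roughly like the sketch below. It only looks at file names inside the cache directory and assumes nothing about how Node.js derives them. The helper names `snapshotCacheDir` and `diffSnapshots` are made up for illustration and are not part of any Node.js API.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Collect the relative paths of all regular files below `dir`.
function snapshotCacheDir(dir) {
  const names = new Set();
  const stack = [dir];
  while (stack.length > 0) {
    const current = stack.pop();
    for (const entry of fs.readdirSync(current, { withFileTypes: true })) {
      const full = path.join(current, entry.name);
      if (entry.isDirectory()) {
        stack.push(full);
      } else if (entry.isFile()) {
        names.add(path.relative(dir, full));
      }
    }
  }
  return names;
}

// Compare two snapshots (Sets of relative paths) taken before and after a run.
function diffSnapshots(before, after) {
  return {
    added: [...after].filter((name) => !before.has(name)).sort(),
    removed: [...before].filter((name) => !after.has(name)).sort(),
    kept: [...after].filter((name) => before.has(name)).sort(),
  };
}
```

Taking a snapshot of the directory passed via `NODE_COMPILE_CACHE` after each `npm start` and diffing consecutive snapshots shows whether later runs only add entries or also replace existing ones.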
Is there a good way of debugging this? Are there any hooks or logs that can be triggered during a cache insert, hit or miss? There is no obvious pattern to the files that were matched: there was a smattering of `node_modules` files, plus individual files or pairs of files from folders within the application that have very similar siblings loaded in the exact same way.
Have I materially misunderstood the implementation of this feature? E.g., is the hash key perhaps dependent on load order?
Minimal Reproduction
It would be extremely difficult to provide a minimal repro here - I expect the complexity of the application is part of the problem. I was not able to repro on a trivial application (e.g., a single huge source file) - in this situation the cache contains a single entry and I'm able to match the hash from the absolute path to the cache file.
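A minimal sketch of the naive matching approach described above, for anyone who wants to try it on their own cache directory. The key derivation here (CRC32 of the absolute path, continued from a seed value, rendered as hex) is my assumption about what to try, not a statement of how Node.js actually computes cache keys. The seed values passed in are placeholders, and `candidateKey` and `matchCacheEntries` are hypothetical helper names. Recent Node.js releases also expose `zlib.crc32`; a small standalone CRC32 is used here so the sketch has no version requirement.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Standard CRC-32 (reflected polynomial 0xEDB88320), continuing from `seed`.
function crc32(buf, seed = 0) {
  let crc = ~seed;
  for (const byte of buf) {
    crc ^= byte;
    for (let i = 0; i < 8; i++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return ~crc >>> 0;
}

// Yield every regular file below `dir`, recursively.
function* walkFiles(dir) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      yield* walkFiles(full);
    } else if (entry.isFile()) {
      yield full;
    }
  }
}

// ASSUMPTION: the cache entry name is the hex CRC32 of the absolute path.
function candidateKey(absPath, seed) {
  return crc32(Buffer.from(absPath, 'utf8'), seed).toString(16);
}

// Map each cache entry name that matches a candidate key to its source file.
// `cacheEntryNames` is a Set of file names found in the cache directory;
// `seeds` is a list of seed values to try (placeholders, e.g. one per module type).
function matchCacheEntries(appRoot, cacheEntryNames, seeds) {
  const matches = new Map();
  for (const file of walkFiles(appRoot)) {
    const absPath = path.resolve(file);
    for (const seed of seeds) {
      const key = candidateKey(absPath, seed);
      if (cacheEntryNames.has(key)) {
        matches.set(key, absPath);
      }
    }
  }
  return matches;
}
```

The size of the returned map, compared with the number of entries in the cache directory, gives the "files matched" figure.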
Output
No response
Before You Submit
- I have looked for issues that already exist before submitting this
- My issue follows the guidelines in the README file, and follows the 'How to ask a good question' guide at https://stackoverflow.com/help/how-to-ask