-
-
Notifications
You must be signed in to change notification settings - Fork 33.8k
src: cache missing package.json files in the C++ package config cache #60425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
src: cache missing package.json files in the C++ package config cache #60425
Conversation
|
Review requested:
|
cd265c8 to
9c5c99f
Compare
9c5c99f to
003f911
Compare
003f911 to
ede35c9
Compare
src/node_modules.cc
Outdated
| return nullptr; | ||
| } | ||
|
|
||
| void BindingData::GetNearestParentPackageJSON( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can we use existing methods rather then adding a new method?
- I believe
THROW_IF_INSUFFICIENT_PERMISSIONScall is missing in this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was re-added as a straight revert of the code I deleted in #59888, but yeah it is 99% the same as GetNearestParentPackageJSONType. I can refactor this to remove that redundancy if you'd prefer!
I think TraverseParent handles the permissions checking you're referring to here, but it doesn't throw if there are insufficient permissions, it just returns nullptr. I'm not sure I have enough overall context to judge if that's right or wrong, but it's what this has always done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pulled the common path-handling stuff into a method to reduce the redundancy here a bit. There were also changes in this area rebasing onto main pulled in, so it looks a tiny bit different now
|
After these 2 review comments I've left, I think we should merge this. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #60425 +/- ##
==========================================
- Coverage 88.56% 88.54% -0.02%
==========================================
Files 704 704
Lines 208101 208109 +8
Branches 40083 40078 -5
==========================================
- Hits 184299 184279 -20
- Misses 15827 15870 +43
+ Partials 7975 7960 -15
🚀 New features to boost your workflow:
|
ede35c9 to
86c48ea
Compare
|
Lint issue is unrelated: #60636 |
Fixes #60397
In #59888 the nearest parent package JSON cache
package_json_reader.jswas adjusted from a map from any given module path to a representation of its parentpackage.jsonfile to a map frompackage.jsonpaths to a deserialized representation of their content. This addressed excessive memory usage caused by repeatedly caching identical deserializedpackage.jsonobjects for modules that shared a parentpackage.json, but also reintroduced a filesystem traversal inpackage_json_reader.jswhich finds the nearest parentpackage.jsonfile for a given module. Thestatcalls in this traversal are not cached, so we end up potentially issuing them for a bunch of duplicate paths. In the reported issue, this leads to poor performance for users using potentially high-latency network filesystems. Similar poor performance is also observed in Node versions that lack #59086, which (re)introduced the JS-side cache initially.This PR addresses this by unwinding the changes in #59888 and instead making the C++-side
package.jsoncache a bit more expressive, caching both a deserialized representation of apackage.jsonfile at a given path, as well as an indicator if no such file exists (modeled as anstd::optional). This addresses the poor performance reported in #60397 by:statcalls inpackage_json_reader.jspackage.jsonpaths on the C++ side, which also perform poorly on high-latency filesystemsWhile analyzing the performance of these changes, I noticed a confounding factor which is that the lazy-parsing and caching of
importsandexportson deserialized package configuration objects indeserializePackageJSONwasn't working as expected and was also contributing to the varying performance we've been seeing across these changes:importsandexportsis cached on deserializedpackage.jsonobjects, it's important that a givenpackage.jsonfile map to the same deserialized object. If we don't do this, we repeatedly re-parse these fields redundantly across calls. This motivates the sort of strange two-level caching scheme ingetNearestParentPackageJSONthat these changes introduce. The downside here is that we potentially redundantly call intomodulesBinding.getNearestParentPackageJSONfor a given path just to resolve the path to apackage.jsonfile that we may already have cached, but I don't see any way to avoid this.Benchmarks
I benchmarked this change with the same scripts I used in #59888. The first is the reproduction script from #58126:
The second is this:
Each benchmark compares v22.19.0 (which does not include #59888), v25.1.0 (the latest current release, which does include #59888), and this change (which is just the
nodedirectory in the output).Fast disk
ddtrace+ CDKdate-fnsSlow disk
I emulated this by mounting an NFS volume from
localhostwithnoac(to disable most caching).ddtrace+ CDKdate-fns