-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
refactor(es/preset-env): Use phf for corejs3 entry #10712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
CodSpeed Performance ReportMerging #10712 will not alter performanceComparing Summary
|
df5209f to
9543119
Compare
| }) | ||
| .collect() | ||
| }); | ||
| include!("../generated/corejs3_entries.rs"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about committing binary files into the repository
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some ways this is not worse than include the original json, since these binary files are more compact. but I agree there must be a better way.
One solution is that we can use .gitignore to hide generated files, but we have to generate them manually when cargo publish. This is similar to the behavior of browserslist-rs.
Another solution is that we can use git-lfs to manage binary files. This is a common practice when you have to track binaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A binary file is hard to inspect so I consider it as a security risk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect many people to have similar opinion as me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The binary file is generated deterministically, and we can verify its consistency on ci.
- Use
.gitignoreto prevent generated files from be stored. generate them only in publish-crates ci. provide tools for ci and anyone to verify the consistency of files in git tags and crates. - Store generated files in repository. Use ci to verify that the binary file modifications in the PR are consistent with the original json file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you split this file as a separate PR (and I'll merge it first) so we can profile the difference using codspeed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not looking forward to the performance of this implementation, I know there is a lot of space for optimization. The main purpose here is to see if we are happy with the integration in this form.
**Description:** See #10712 (comment)
|
I think this PR regresses the performance, but can you rebase? #10722 (comment) is 28.1 us |
9543119 to
1e42f37
Compare
|
It regresses the performance |
|
Sure. That matches my local testing. I think it's within the range that micro-optimizations can make up for. The main purpose of this PR is to see if we're happy with this integration. |
1e42f37 to
36d7e97
Compare
|
Is the goal reducing the binary size? Or is the goal reducing the compile time? If it's the former, I'm fine with generating binary file with a build script at the build time. |
|
Our goal is binary size, but I don't want to significantly affect the compilation speed. Current our scale is acceptable to put it to build script, but should be note that build phf is a loop solve process, which cannot be guaranteed to be completed. If there is a larger scale of data in the future, then the phf solve in build script will significantly affect compilation time. I made some optimizations and I believe we are now faster than the main branch.
but the biggest speedup came from the hasher replacement. I used |
|
Then I think we should commit the binary file to the repository, but I think we need a way to verify it, like adding many snapshot tests for the embeded binary so if it's modified the test fails. (Without cryptic numbers in code, ideally. Is it possible?) |
Checking the README for foldhash, it doesn't seem to be suitable for this either. I think in one of the issues they mentioned ensure the hash output is consistent with the semvar version. I realized another benefit of the build script, which is that it will not be affected by the stable of hasher. |
This is ok because our codegen is completely deterministic. we can generate data to a temporary directory when test, and then compare it with the data in commit to see if it is consistent.
The only cryptic numbers I suggest keep are the seed used in last successful phf build, to ensure fast builds. |
|
Ah sounds good. So the seed numbers should be stored in the build scripts? |
|
Yes, it is stored in build script. I'm trying to change to build script. At least for now it doesn't affect compilation performance too much. |
4e241d9 to
9bf1c8c
Compare
9bf1c8c to
9f94427
Compare
|
All alerts resolved. Learn more about Socket for GitHub. This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored. |
9f94427 to
74309de
Compare
|
I'll release version precomputed-map 0.2 tonight so it can be merged. if we're happy with it, I'll start replace other static maps. |
74309de to
418e297
Compare
|
Can we use phf crate instead? What’s the difference? |
|
It has a crate to generate code AFAIK |
|
@kdy1 The phf crate uses a different algorithm and requires the use of siphasher, and it does not do any string pack. This makes it have no advantages in performance, compilation speed, and size. |
|
I have an post about this (English coming soon) |
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
**Description:** This is a follow-up to #10712 .
Description:
This is a follow-up to #10684. I simply chose an example to experiment with.
I did some simple testing and I believe there is no large regression in single query performance.
Since we do dedup when packing strings, it is more compact than the original json and has some size optimization.
Since
../../data/core-js-compat/entries.jsonis 500k large, I believe it also saves close to 500k of memory.BREAKING CHANGE:
Related issue (if exists):