Skip to content

Conversation

@quininer
Copy link
Contributor

@quininer quininer commented Jun 26, 2025

Description:

This is a follow-up to #10684. I simply chose an example to experiment with.

I did some simple testing and I believe there is no large regression in single query performance.

// before
es/preset-env/entry/import
                        time:   [531.47 ns 532.42 ns 533.55 ns]

// after                        
es/preset-env/entry/import
                        time:   [549.62 ns 551.02 ns 552.55 ns]

Since we do dedup when packing strings, it is more compact than the original json and has some size optimization.

// before
16397965	../../target/release/deps/polyfills-d11a2ed3340dd897

// after
16070261	../../target/release/deps/polyfills-4b49b166ca0322eb

Since ../../data/core-js-compat/entries.json is 500k large, I believe it also saves close to 500k of memory.

BREAKING CHANGE:

Related issue (if exists):

@changeset-bot
Copy link

changeset-bot bot commented Jun 26, 2025

⚠️ No Changeset found

Latest commit: 08c3a56

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@CLAassistant
Copy link

CLAassistant commented Jun 26, 2025

CLA assistant check
All committers have signed the CLA.

@codspeed-hq
Copy link

codspeed-hq bot commented Jun 26, 2025

CodSpeed Performance Report

Merging #10712 will not alter performance

Comparing quininer:s/use-precomputed-map2 (08c3a56) with main (43a4f11)

Summary

✅ 141 untouched benchmarks

@quininer quininer force-pushed the s/use-precomputed-map2 branch from df5209f to 9543119 Compare June 26, 2025 06:59
})
.collect()
});
include!("../generated/corejs3_entries.rs");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about committing binary files into the repository

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some ways this is not worse than include the original json, since these binary files are more compact. but I agree there must be a better way.

One solution is that we can use .gitignore to hide generated files, but we have to generate them manually when cargo publish. This is similar to the behavior of browserslist-rs.
Another solution is that we can use git-lfs to manage binary files. This is a common practice when you have to track binaries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A binary file is hard to inspect so I consider it as a security risk

Copy link
Member

@kdy1 kdy1 Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect many people to have similar opinion as me

Copy link
Contributor Author

@quininer quininer Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The binary file is generated deterministically, and we can verify its consistency on ci.

  1. Use .gitignore to prevent generated files from be stored. generate them only in publish-crates ci. provide tools for ci and anyone to verify the consistency of files in git tags and crates.
  2. Store generated files in repository. Use ci to verify that the binary file modifications in the PR are consistent with the original json file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you split this file as a separate PR (and I'll merge it first) so we can profile the difference using codspeed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not looking forward to the performance of this implementation, I know there is a lot of space for optimization. The main purpose here is to see if we are happy with the integration in this form.

kdy1 pushed a commit that referenced this pull request Jun 27, 2025
@kdy1
Copy link
Member

kdy1 commented Jun 27, 2025

I think this PR regresses the performance, but can you rebase?

#10722 (comment) is 28.1 us

@quininer quininer force-pushed the s/use-precomputed-map2 branch from 9543119 to 1e42f37 Compare June 27, 2025 03:42
@kdy1
Copy link
Member

kdy1 commented Jun 27, 2025

It regresses the performance

@quininer
Copy link
Contributor Author

Sure. That matches my local testing. I think it's within the range that micro-optimizations can make up for. The main purpose of this PR is to see if we're happy with this integration.

@quininer quininer force-pushed the s/use-precomputed-map2 branch from 1e42f37 to 36d7e97 Compare July 4, 2025 05:50
@kdy1
Copy link
Member

kdy1 commented Jul 4, 2025

Is the goal reducing the binary size? Or is the goal reducing the compile time? If it's the former, I'm fine with generating binary file with a build script at the build time.

@quininer
Copy link
Contributor Author

quininer commented Jul 4, 2025

Our goal is binary size, but I don't want to significantly affect the compilation speed.

Current our scale is acceptable to put it to build script, but should be note that build phf is a loop solve process, which cannot be guaranteed to be completed. If there is a larger scale of data in the future, then the phf solve in build script will significantly affect compilation time.

I made some optimizations and I believe we are now faster than the main branch.

  1. Adjusted hot path
  2. Eliminated indirect address access
  3. Eliminated one bounds check (but still kept most of them to ensure handling of untrusted binary data)
  4. Eliminated utf8 check

but the biggest speedup came from the hasher replacement. I used foldhash (It is the current default hasher for hashbrown), which is still slower than fxhash, but fxhash currently has no suitable crate. The rustc-hash crate does not guarantee the stable of output across architectures, and the other fxhash crate is unmaintained.

main
$ taskset -c 0 env RUST_LOG="off" cargo bench -- import
es/preset-env/entry/import
                        time:   [510.44 ns 511.30 ns 512.85 ns]
                        
phf
$ taskset -c 0 env RUST_LOG="off" cargo bench -- import
es/preset-env/entry/import
                        time:   [505.31 ns 506.69 ns 509.38 ns]

@kdy1
Copy link
Member

kdy1 commented Jul 4, 2025

Then I think we should commit the binary file to the repository, but I think we need a way to verify it, like adding many snapshot tests for the embeded binary so if it's modified the test fails. (Without cryptic numbers in code, ideally. Is it possible?)

@kdy1 kdy1 added this to the Planned milestone Jul 4, 2025
@quininer
Copy link
Contributor Author

quininer commented Jul 4, 2025

You expect foldhash to have a consistent output across versions or platforms, such as for persistent file formats or communication protocols.

Checking the README for foldhash, it doesn't seem to be suitable for this either. I think in one of the issues they mentioned ensure the hash output is consistent with the semvar version.

I realized another benefit of the build script, which is that it will not be affected by the stable of hasher.

@quininer
Copy link
Contributor Author

quininer commented Jul 4, 2025

Then I think we should commit the binary file to the repository, but I think we need a way to verify it, like adding many snapshot tests for the embeded binary so if it's modified the test fails.

This is ok because our codegen is completely deterministic. we can generate data to a temporary directory when test, and then compare it with the data in commit to see if it is consistent.

(Without cryptic numbers in code, ideally. Is it possible?)

The only cryptic numbers I suggest keep are the seed used in last successful phf build, to ensure fast builds.

@kdy1
Copy link
Member

kdy1 commented Jul 4, 2025

Ah sounds good. So the seed numbers should be stored in the build scripts?

@quininer
Copy link
Contributor Author

quininer commented Jul 4, 2025

Yes, it is stored in build script. I'm trying to change to build script. At least for now it doesn't affect compilation performance too much.

@quininer quininer force-pushed the s/use-precomputed-map2 branch 2 times, most recently from 4e241d9 to 9bf1c8c Compare July 4, 2025 08:35
@quininer quininer force-pushed the s/use-precomputed-map2 branch from 9bf1c8c to 9f94427 Compare July 7, 2025 05:10
@quininer quininer marked this pull request as ready for review July 7, 2025 05:11
@quininer quininer requested a review from a team as a code owner July 7, 2025 05:11
@quininer quininer requested a review from a team as a code owner July 7, 2025 05:11
@socket-security
Copy link

socket-security bot commented Jul 7, 2025

All alerts resolved. Learn more about Socket for GitHub.

This PR previously contained dependency changes with security issues that have been resolved, removed, or ignored.

View full report

@quininer quininer force-pushed the s/use-precomputed-map2 branch from 9f94427 to 74309de Compare July 7, 2025 05:17
@quininer
Copy link
Contributor Author

quininer commented Jul 7, 2025

I'll release version precomputed-map 0.2 tonight so it can be merged. if we're happy with it, I'll start replace other static maps.

@quininer quininer force-pushed the s/use-precomputed-map2 branch from 74309de to 418e297 Compare July 7, 2025 05:22
@kdy1
Copy link
Member

kdy1 commented Jul 7, 2025

Can we use phf crate instead? What’s the difference?

@kdy1
Copy link
Member

kdy1 commented Jul 7, 2025

It has a crate to generate code AFAIK

@quininer
Copy link
Contributor Author

quininer commented Jul 7, 2025

@kdy1 The phf crate uses a different algorithm and requires the use of siphasher, and it does not do any string pack. This makes it have no advantages in performance, compilation speed, and size.

@quininer
Copy link
Contributor Author

quininer commented Jul 7, 2025

I have an post about this (English coming soon)

@socket-security
Copy link

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedcopyfiles@​2.4.110010010076100
Addedwebpack-cli@​3.3.129710010082100
Addedwebpack-dev-server@​3.11.3969210086100
Addedwebpack@​4.47.09710010090100

View full report

@kdy1 kdy1 changed the title perf(es/preset-env): Use phf for corejs3 entry refactor(es/preset-env): Use phf for corejs3 entry Jul 7, 2025
@kdy1 kdy1 merged commit 658b26d into swc-project:main Jul 7, 2025
169 checks passed
@quininer quininer deleted the s/use-precomputed-map2 branch July 7, 2025 15:42
@kdy1 kdy1 modified the milestones: Planned, v1.12.11 Jul 8, 2025
kdy1 pushed a commit that referenced this pull request Jul 11, 2025
@swc-project swc-project locked as resolved and limited conversation to collaborators Aug 7, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants