Module splitting and type section duplication #1530

Open
tlively opened this issue Sep 18, 2024 · 0 comments

Hello! I've been looking into more robust module splitting solutions, particularly for WasmGC modules. Here are the results of an experiment I did where I split calcworker_wasm.wasm from Google Sheets into 216 modules. Functions were automatically assigned to modules based on their original build targets rather than what would make sense to actually serve, but this should suffice to get an idea of the overheads involved.

First, here's how overall code size was affected, along with a breakdown by section of where the change came from.

| module | file size | type | code | global | import | export |
| --- | --- | --- | --- | --- | --- | --- |
| original calcworker_wasm.wasm | 3702274 | 84324 | 2507784 | 508017 | 275383 | 210 |
| cumulative split modules | 5430287 | 1063595 | 2597670 | 506189 | 587786 | 277331 |
| percent increase | 46.67% | 1161.32% | 3.58% | -0.36% | 113.44% | 131962.38% |
| share of blame | | 56.67% | 5.20% | -0.11% | 18.08% | 16.04% |
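For anyone who wants to check the arithmetic, here is a minimal sketch of how the last two rows follow from the first two. It assumes the sizes are byte counts and that "share of blame" means a section's growth divided by the total file size growth; the function and variable names are only illustrative.

```python
# Section sizes copied from the table above (presumably bytes).
original = {"file size": 3702274, "type": 84324, "code": 2507784,
            "global": 508017, "import": 275383, "export": 210}
split = {"file size": 5430287, "type": 1063595, "code": 2597670,
         "global": 506189, "import": 587786, "export": 277331}

# Total growth of the split modules over the original module.
total_growth = split["file size"] - original["file size"]

def percent_increase(column):
    """Growth of one column relative to its original size, in percent."""
    return 100 * (split[column] - original[column]) / original[column]

def share_of_blame(column):
    """Growth of one column as a percentage of the total file size growth."""
    return 100 * (split[column] - original[column]) / total_growth

for column in original:
    print(f"{column:>9}: {percent_increase(column):+.2f}% increase, "
          f"{share_of_blame(column):.2f}% of total growth")
```

The five listed shares sum to a little under 96%, so the rest of the growth presumably comes from sections that are not broken out in the table.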

We can probably reduce the code size due to imports and exports a bit more by moving more than just functions into the secondary modules. For example, if a secondary module is the only one that uses a particular imported global, it could just import the global directly. Today, the primary module imports and re-exports the global, then the secondary module imports it from the primary module.
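As a rough sketch of that idea, the following picks out imported globals whose only user is a single secondary module, so that module could import them directly instead of going through the primary module. This is not how any existing tool works; the `plan_global_imports` helper, the `"primary"` module name, and the example global names are all hypothetical.

```python
from collections import defaultdict

def plan_global_imports(imported_globals, uses, primary="primary"):
    """Decide where each imported global should be imported.

    imported_globals: set of names of globals imported by the original module.
    uses: dict mapping module name -> set of global names that module uses.
    Returns (direct, stays_in_primary): `direct` maps a global to the single
    secondary module that can import it itself; `stays_in_primary` lists the
    globals that still need the import-and-re-export path.
    """
    users = defaultdict(set)
    for module, names in uses.items():
        for name in names:
            if name in imported_globals:
                users[name].add(module)

    direct = {}
    stays_in_primary = []
    for name in sorted(imported_globals):
        modules = users.get(name, set())
        if len(modules) == 1 and primary not in modules:
            # Exactly one secondary module uses it: import it there directly.
            direct[name] = next(iter(modules))
        else:
            # Shared, used by the primary module, or unused: leave it alone.
            stays_in_primary.append(name)
    return direct, stays_in_primary

# Hypothetical example data.
example_uses = {
    "primary": {"g_config"},
    "secondary_a": {"g_config", "g_locale"},
    "secondary_b": {"g_clock"},
}
direct, stays = plan_global_imports({"g_config", "g_locale", "g_clock"}, example_uses)
print(direct, stays)
```

In this example `g_locale` and `g_clock` would each save one import and one export in the primary module, while `g_config` keeps the current scheme because more than one module uses it.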

But there's nothing we can do today to improve the code size of the type sections! The types are already arranged into minimal recursion groups and included only in modules where they are necessary for validation. Here's a breakdown of how many types each module uses either directly or indirectly. A directly used type is one that is in a rec group with a type that is directly allocated, accessed, cast, or otherwise referenced from the code. All other types are indirectly used, and are necessary to include only because they appear somewhere in the expanded definition of a directly used type.
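To make the direct/indirect distinction concrete, here is a small sketch of the classification under a simplified model: each type belongs to one rec group and lists the other types its definition mentions (fields, parameters, supertypes, and so on). The `classify_types` helper and the example type names are hypothetical, not taken from any actual tool.

```python
def classify_types(rec_group_of, references, referenced_by_code):
    """Split the types a module must include into direct and indirect uses.

    rec_group_of: dict mapping type name -> rec group id.
    references: dict mapping type name -> iterable of types its definition mentions.
    referenced_by_code: types allocated, accessed, cast, or otherwise named in code.
    """
    groups = {}
    for type_name, group in rec_group_of.items():
        groups.setdefault(group, set()).add(type_name)

    # Directly used: everything in a rec group with a type the code references.
    direct = set()
    for type_name in referenced_by_code:
        direct |= groups[rec_group_of[type_name]]

    # Included: close over definitions, always pulling in whole rec groups.
    included = set(direct)
    worklist = list(direct)
    while worklist:
        type_name = worklist.pop()
        for ref in references.get(type_name, ()):
            for member in groups[rec_group_of[ref]]:
                if member not in included:
                    included.add(member)
                    worklist.append(member)

    return direct, included - direct

# Hypothetical module: the code only touches A, but A's definition drags in
# B, and B drags in the rec group {C, D}. E is never needed.
rec_group_of = {"A": 0, "B": 1, "C": 2, "D": 2, "E": 3}
references = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["C"], "E": []}
direct_types, indirect_types = classify_types(rec_group_of, references, {"A"})
print(sorted(direct_types), sorted(indirect_types))
```

Here the module has to carry four types to validate code that names only one of them, which is the kind of overhead the numbers below measure.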

| module | included types | directly used types | percent used |
| --- | --- | --- | --- |
| original calcworker_wasm.wasm | 5692 | 5686 | 99.89% |
| cumulative split modules | 56825 | 31181 | 54.87% |
| multiplicative factor | 9.98 | 5.48 | 0.55 |

On average, each type appears in about 10 modules, but is only directly used in 5.5 of them.
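Those averages are just the ratios of the split totals to the original totals. A quick sketch of the calculation, using the counts from the table (the variable names are only for illustration):

```python
# Type counts copied from the table above.
original_included, original_direct = 5692, 5686
split_included, split_direct = 56825, 31181

# Average number of modules each type is included in after splitting.
copies_per_type = split_included / original_included

# Average number of modules in which each type is directly used.
direct_uses_per_type = split_direct / original_direct

# Fraction of included types that are directly used, before and after.
percent_used_original = 100 * original_direct / original_included
percent_used_split = 100 * split_direct / split_included

print(f"copies per type:      {copies_per_type:.2f}")
print(f"direct uses per type: {direct_uses_per_type:.2f}")
print(f"percent used:         {percent_used_original:.2f}% -> {percent_used_split:.2f}%")
```

Put another way, roughly 45% of the type entries shipped across the split modules are there only to complete the definitions of other types.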

I'm interested in hearing what ideas folks have about how we could reduce the overhead of duplicated type sections. The best case would be that we could directly use the full type section from the primary module in each of the secondary modules without having to download it again. Another solution might look more like compile-time type imports that are able to abstract away the unused types, but that would still require repeating the used types. Either way, I don't have all the details worked out. Are there other or more complete ideas out there?
