Reduce main module size #1574
Hi! Thanks a lot for bringing this up, and for the exploration you did. I think this is a bit too brittle, but if needed it can be considered as a bridge (potentially we might need a different list for each bundle, since functions might end up having different signatures, plus there are ad hoc functions like the personality function that are missing if exceptions are not there, see https://github.com/duckdb/duckdb-wasm/actions/runs/7453872285/job/20280188337?pr=1574#step:7:5872). The problem is partly that the functions are exported, but mainly that for each exported function its name (as a string) is repeated multiple times:
that in my version generates 80000+ lines. Here I think there are a few possible solutions:
The main problem with this remapping is that the original names still need to be stored somewhere.
As you look at file sizes, please also measure gzip and brotli compressed sizes. We should track that as you make progress.
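A minimal sketch of how the raw and compressed sizes could be tracked from one script. It uses Python's standard `gzip` module, and the third-party `brotli` package only if it happens to be installed; the function name `compressed_sizes` and the output format are assumptions for illustration, not part of the duckdb-wasm tooling.

```python
import gzip


def compressed_sizes(data: bytes) -> dict:
    """Return the raw, gzip and (if available) brotli sizes in bytes."""
    sizes = {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
    }
    try:
        import brotli  # optional third-party package
        sizes["brotli"] = len(brotli.compress(data))
    except ImportError:
        pass  # brotli not installed: report raw and gzip only
    return sizes


if __name__ == "__main__":
    import sys

    # Usage: python sizes.py path/to/bundle.js path/to/bundle.wasm
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, compressed_sizes(f.read()))
```

Running it on each bundle before and after a change gives comparable numbers to paste into the PR description.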
Actually I think solution 1 (= iterating on available exports) should be doable. For now I have implemented only a prototype, but it seems that the whole overhead on the JS files can be avoided (by adding a post-processing step). On the Wasm side the exports are still somewhat pricey, so maybe both solutions can be combined. I have not explored changing the optimization level; that can also be done independently.
Thanks for the feedback! Agree that solution 1 sounds the most promising (and the cleanest one). I actually tried to hack something quickly before the holiday break but couldn't come up with something obvious. I agree that this seems very doable indeed, happy to help on that front.
Solution 2 is basically what
And finally, I completely agree with @domoritz that we should also track the gzipped and brotli-compressed sizes. I actually almost provided the brotli size in the PR description but felt that, while important, it is not the main point: these symbols are still an overhead for the browser even if the bundle is compressed. I'll edit the description with them though :-)
@carlopi, let me know how I can best help you with solution 1.
I found a solution to the size problem, I will open a PR soon (EDIT: it's here #1589). The basic idea is that on the JavaScript side we don't need to expose any C++ side functions (apart from the few duckdb_web ones and some utilities listed as EXPORTED_FUNCTIONS). I have not found a polished way to do this, but stripping them via
This solution reduces the size of the uncompressed duckdb-eh.js file to 2.0M, while the wasm files will not be modified (since the overhead of listing the exports will still be there).
The general problem is that Emscripten seems not to have (or at least, I have not found) a way to specify that there are two levels of interfaces: symbols that need to be externally available on the C++ side so that other modules can be dlopened, and symbols that need to be exposed to the JavaScript interface (for functions that need to be there on the JS side). I hope a solution can be implemented properly in Emscripten.
This PR is still interesting as a blueprint for explicitly selecting which functions are to be exported (C++ side), which reduces the weight of the wasm side, so it might make sense to have both since they are independent. I will come back to this after merging the JS-side PR.
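To make the idea of a JS-side post-processing step concrete, here is a hypothetical sketch; it is not the approach taken in #1589. It assumes that each export appears in the generated glue file on its own line of the form `var _name = Module["_name"] = ...;`, which should be checked against the actual build output. The function name `strip_js_exports` and the `keep` set are illustrative.

```python
import re

# Assumed (hypothetical) shape of a generated export line:
#   var _name = Module["_name"] = ...;
EXPORT_LINE = re.compile(
    r"""^\s*var\s+(_\w+)\s*=\s*Module\[["'](_\w+)["']\]\s*="""
)


def strip_js_exports(js_source: str, keep: set) -> str:
    """Drop export lines whose symbol is not listed in `keep`."""
    out = []
    for line in js_source.splitlines(keepends=True):
        m = EXPORT_LINE.match(line)
        if m and m.group(2) not in keep:
            continue  # C++-side symbol that JS never needs to call
        out.append(line)
    return "".join(out)
```

The `keep` set would have to contain everything the JS side really calls, including helpers the glue code uses internally (for example allocation functions); otherwise the stripped file breaks at runtime.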
Closing since #1589 is a much better approach.
Note: This PR is more of a PoC to start a conversation: it is not in a mergeable state.
Today, when building DuckDB WASM with the "loadable extension" flag, the bundles are quite big because when building with MAIN_MODULE=1, Emscripten exports absolutely all the symbols:
This PR is an attempt to reduce these. Here are the sizes when built with this PR:
I was able to load all the in-tree DuckDB extensions, as well as ours with this build.
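One way to see how much the exports weigh on the wasm side is to read the export section of the binary directly. The sketch below relies only on the WebAssembly binary format (section id 7 holds the exports, with sizes and indices encoded as unsigned LEB128); the helper names `read_uleb128` and `list_exports` are made up for illustration and are not part of any toolchain.

```python
def read_uleb128(buf: bytes, pos: int):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def list_exports(wasm: bytes):
    """Return (export_names, export_section_size_in_bytes)."""
    if wasm[:4] != b"\0asm":
        raise ValueError("not a WebAssembly binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    while pos < len(wasm):
        section_id = wasm[pos]
        size, pos = read_uleb128(wasm, pos + 1)
        if section_id == 7:  # export section
            count, pos = read_uleb128(wasm, pos)
            names = []
            for _ in range(count):
                n, pos = read_uleb128(wasm, pos)
                names.append(wasm[pos:pos + n].decode("utf-8"))
                pos += n + 1  # skip the name bytes and the kind byte
                _, pos = read_uleb128(wasm, pos)  # skip the export index
            return names, size
        pos += size  # skip any other section
    return [], 0
```

Comparing the number of names and the section size between a default MAIN_MODULE=1 build and a build with an explicit export list shows how much the restriction saves.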
Of course, this is not that easy, here are the caveats:
O2 to O3, cf. Issue with O3-optimized main module and side module using emval emscripten-core/emscripten#21036
exported_functions file) is more akin to a magic incantation than a robust solution
Happy to hear your thoughts and whether it is a road worth pursuing, or if you have better ideas on how to solve this?