Parquet does not support wasm32-unknown-unknown target #180
Comment from Dominik Moritz(domoritz) @ 2021-02-12T21:59:13.949+0000: If lz4 is the issue, maybe we could switch to https://github.com/PSeitz/lz4_flex, which compiles to WASM.

Comment from Andy Redhead(AndyRedhead1974) @ 2021-03-04T21:51:18.454+0000: A WebAssembly-compatible Rust library that can read data from Parquet files would be very useful to anyone who would like to do "browser based" data processing/visualisation (better still if that library is in a family that includes efficient in-memory "data structures").

Comment from David Roher(droher) @ 2021-04-12T02:03:38.512+0000: I just got a version of DataFusion working on wasm32-unknown-unknown. It required disabling both the LZ4 and ZSTD features on Parquet and tweaking the hash function: https://github.com/apache/arrow/compare/master...droher:master

To add to AndyRedhead1974's point above, it would also be useful in a serverless context. For instance, Cloudflare Workers Unbound is in beta now and will allow WASM functions to run with unlimited CPU usage. In this context, DataFusion could be a serverless data lake engine like AWS Athena. Maybe it could even be useful as a Ballista worker.

Comment from Dominik Moritz(domoritz) @ 2021-04-12T04:38:44.388+0000: That's awesome. Do you want to add a note to https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11615, which tracks DataFusion support for wasm?
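David Roher's comment mentions "tweaking the hash function" without saying how. As a minimal sketch only, assuming the trouble came from a hasher that seeds itself from OS randomness (a common snag on wasm32-unknown-unknown, which has no OS to ask), one option is to pick the hasher builder per target with `cfg`. `PortableState` and `new_map` are hypothetical names, not DataFusion or Arrow APIs.

```rust
use std::collections::HashMap;

// On wasm32, use std's SipHash with fixed keys, so building a hasher needs
// no randomness source (hypothetical alias, not a DataFusion API).
#[cfg(target_arch = "wasm32")]
pub type PortableState =
    std::hash::BuildHasherDefault<std::collections::hash_map::DefaultHasher>;

// On native targets, keep std's randomly seeded builder.
#[cfg(not(target_arch = "wasm32"))]
pub type PortableState = std::collections::hash_map::RandomState;

/// Creates an empty map that uses the target-appropriate hasher builder.
pub fn new_map<K, V>() -> HashMap<K, V, PortableState> {
    HashMap::with_hasher(PortableState::default())
}

fn main() {
    // Count occurrences of a key; works the same on either target.
    let mut counts: HashMap<&str, u32, PortableState> = new_map();
    *counts.entry("lz4").or_insert(0) += 1;
    *counts.entry("lz4").or_insert(0) += 1;
    println!("{:?}", counts.get("lz4")); // prints "Some(2)"
}
```

The trade-off is that fixed keys make hashing deterministic but give up the HashDoS resistance of random seeding, which arguably matters less for in-browser analytics than for a server handling untrusted input.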
Just wondering if there's anything actionable here, or if it's open-ended without a clear solution. I'm just starting to learn Rust but would love for this to work in wasm.
I think someone needs to go through the dependencies and replace them with ones that work in wasm. This seems pretty doable.
In the original issue you mentioned
Testing locally it appears that both the
I had a play with getting a minimalist Apache (v1) Arrow & Parquet build to compile to wasm32-unknown-unknown back in July/August 2021, when the released version of arrow was ~5.0. It worked OK. I vaguely remember having to add an annotation on the hash code for one of the structs to point rustc at an implementation that worked in wasm (I'm very new to Rust and definitely still in the "muddling through" phase).

My tinkering has been on the back burner since September (for a number of reasons, not least the hard disk on my personal laptop dying). Over the Christmas break just gone, I got the chance to recover what I could from the dead disk and get started again :)

The Apache arrow-rs v6.5.0 crate builds without any modifications :)

I've put the (very basic) results of my tinkering into a git repo. It's based on the "Rust and WebAssembly" example project; it uses the parquet crate to read a Parquet file and the JavaScript Arrow library to read values out of the result.

I recently stumbled across a reference to the arrow2/parquet2 projects. The design goals seem sensible, but I haven't had a chance to look at them yet.
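In a setup like this, the Parquet file would typically reach Rust as a byte buffer handed over from JavaScript. The parquet crate does the real decoding; as a std-only sketch of the first step any reader has to take, the function below checks the `PAR1` magic at both ends of the buffer and locates the footer metadata, whose length is stored as a 4-byte little-endian integer just before the trailing magic. `metadata_range` and `FooterError` are hypothetical names, not part of the parquet crate.

```rust
/// Magic bytes that begin and end every Parquet file.
const PARQUET_MAGIC: &[u8] = b"PAR1";

#[derive(Debug, PartialEq)]
pub enum FooterError {
    TooShort,
    BadMagic,
    BadMetadataLength(u32),
}

/// Returns the byte range of the Thrift-encoded file metadata in `file`.
pub fn metadata_range(file: &[u8]) -> Result<std::ops::Range<usize>, FooterError> {
    // Smallest possible layout: leading magic + 4-byte length + trailing magic.
    if file.len() < 12 {
        return Err(FooterError::TooShort);
    }
    let n = file.len();
    if &file[..4] != PARQUET_MAGIC || &file[n - 4..] != PARQUET_MAGIC {
        return Err(FooterError::BadMagic);
    }
    // The 4 bytes before the trailing magic hold the metadata length (little-endian).
    let meta_len = u32::from_le_bytes([file[n - 8], file[n - 7], file[n - 6], file[n - 5]]);
    let end = n - 8;
    // The metadata must fit between the leading magic and the length field.
    if meta_len as usize > end - 4 {
        return Err(FooterError::BadMetadataLength(meta_len));
    }
    Ok(end - meta_len as usize..end)
}

fn main() {
    // Hypothetical 16-byte buffer: magic, 4 placeholder (not real Thrift)
    // metadata bytes, length = 4, magic.
    let mut file = Vec::new();
    file.extend_from_slice(b"PAR1");
    file.extend_from_slice(&[0xAA, 0xBB, 0xCC, 0xDD]);
    file.extend_from_slice(&4u32.to_le_bytes());
    file.extend_from_slice(b"PAR1");
    println!("{:?}", metadata_range(&file)); // prints "Ok(4..8)"
}
```

A real reader would then decode the metadata bytes to find the row groups and column chunks it needs, which is the part the parquet crate handles.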
Yeah, we now test to ensure that Arrow builds on wasm as part of every CI run: https://github.com/apache/arrow-rs/runs/4685124037?check_suite_focus=true
I made another effort on top of @andyredhead's helpful repo to create a minimal JS parser from Parquet to Arrow. So far it seems to work with Snappy-compressed and uncompressed Parquet files, though the generated Arrow IPC files seem to be occasionally malformed (errors of
Another update on the compression codecs:
In terms of Arrow IPC files being malformed, I switched from
Thank you for the update @kylebarron.
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-11593
The Arrow crate successfully compiles to WebAssembly (e.g. https://github.com/domoritz/arrow-wasm), but the Parquet crate currently does not support the wasm32-unknown-unknown target. Try out the repository at domoritz/parquet-wasm@e877f9a. The problem seems to be in liblz4, even if I do not include lz4 in the feature flags.
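The usual Cargo remedy for a native library that should stay out of some builds is to make it an optional dependency behind a feature and gate every use of it with `#[cfg(feature = ...)]`, so a wasm build never compiles it. A minimal sketch, assuming a hypothetical `lz4` feature and a placeholder `lz4_backend` module (neither reflects the parquet crate's actual layout):

```rust
/// A few of the compression codecs a Parquet column chunk can declare.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Codec {
    Uncompressed,
    Snappy,
    Lz4,
    Zstd,
}

/// Decompresses one page body, failing cleanly for codecs left out of the build.
pub fn decompress(codec: Codec, input: &[u8]) -> Result<Vec<u8>, String> {
    match codec {
        Codec::Uncompressed => Ok(input.to_vec()),
        // Compiled only when the hypothetical `lz4` feature is enabled;
        // `lz4_backend` is a placeholder for a wasm-friendly, pure-Rust decoder.
        // With the feature off, this arm is stripped before name resolution,
        // so the missing module is never looked up.
        #[cfg(feature = "lz4")]
        Codec::Lz4 => lz4_backend::decompress(input),
        // Snappy and Zstd would get gated arms in the same way.
        other => Err(format!("codec {:?} is not enabled in this build", other)),
    }
}

fn main() {
    println!("{:?}", decompress(Codec::Uncompressed, b"abc")); // prints "Ok([97, 98, 99])"
    // Without the `lz4` feature this returns an Err instead of failing to build.
    println!("{:?}", decompress(Codec::Lz4, b"abc"));
}
```

With this layout, a disabled codec surfaces as a runtime error when a file actually uses it, rather than as a build failure for every file.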