[Rust] Parquet data source does not support complex types #83
Comments
Comment from Wes McKinney (wesm) @ 2019-03-14T22:28:39.984+0000: This is a fairly tricky task (we still don't have this fully done in C++). I'm moving to 0.14 as I expect it to take a little time.
Comment from Neville Dipale (nevi_me) @ 2020-11-28T12:41:50.557+0000: [~andygrove] I'm going through old PRs and closing them. The writer will support nested types to our heart's content. Would we need to do anything further to enable this in DataFusion, or can we close this?
Comment from Andy Grove (andygrove) @ 2020-11-28T17:42:28.962+0000: Thanks [~nevi_me]. I filed https://issues.apache.org/jira/browse/ARROW-10761 for the work we need to do in DataFusion.
Comment from Andrew Lamb (alamb) @ 2021-04-26T11:23:22.697+0000: Migrated to GitHub: https://github.com/apache/arrow-rs/issues/39
Started hacking here: https://github.com/Igosuki/arrow-datafusion/tree/map_access
Is it expected that DataFusion currently cannot read Parquet files with nested objects at all, even if we never use the nested column? While attempting to read a Parquet file that has a nested object, I get an error because Should the ability to read Parquet files with nested objects be implemented first (only panicking if a transformation uses that field), or would it be better to just work on this issue as a whole?
This seems like a valuable addition to me (allowing queries on parquet files that had nested objects but were not read)
Well of course, supporting queries on the data would be better than just not crashing/erroring when they weren't read :) I think the choice of approach is probably best determined by whoever implements this feature.
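To make the narrower option concrete (tolerate nested columns in the file and fail only when a query actually asks for one), here is a minimal sketch. The `Column`, `ColumnKind`, and `plan_projection` names are hypothetical and are not taken from DataFusion or arrow-rs; the real reader would work on the Parquet schema descriptor rather than on this simplified list.

```rust
/// Hypothetical, simplified column description; not a DataFusion or arrow-rs type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ColumnKind {
    Primitive,
    Nested,
}

#[derive(Debug, Clone)]
pub struct Column {
    pub name: String,
    pub kind: ColumnKind,
}

/// Resolve the requested column names to schema indices.
/// Nested columns that are present in the file but not requested are ignored;
/// an error is returned only if a requested column is nested or unknown.
pub fn plan_projection(schema: &[Column], requested: &[&str]) -> Result<Vec<usize>, String> {
    let mut indices = Vec::with_capacity(requested.len());
    for name in requested {
        let pos = schema
            .iter()
            .position(|c| c.name == *name)
            .ok_or_else(|| format!("unknown column: {}", name))?;
        if schema[pos].kind == ColumnKind::Nested {
            return Err(format!("nested column not supported yet: {}", name));
        }
        indices.push(pos);
    }
    Ok(indices)
}
```

With a schema of `id` (primitive), `payload` (nested), and `ts` (primitive), requesting `ts` and `id` succeeds even though `payload` is in the file, while requesting `payload` returns an error instead of panicking.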
Perhaps one of the remaining items would be supporting nested columns in
The indexed map access code will work on the plan, so the only thing the parquet reader has to do is deserialize nested structures recursively. @houqp I see that support was added in parquet2 (jorgecarleitao/parquet2#64; I have not tested the arrow2 branch yet), so it's only a matter of adding it to the reader.
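As a rough illustration of what "recursively" means here, the sketch below walks a nested schema and lists the leaf columns a reader would have to decode. The `Field` and `FieldType` types are hypothetical stand-ins, not the parquet2 or arrow-rs types, and actual deserialization also has to handle repetition and definition levels, which this sketch leaves out.

```rust
/// Hypothetical, simplified nested schema; not the parquet2 or arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
pub enum FieldType {
    Int64,
    Utf8,
    List(Box<Field>),
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub ty: FieldType,
}

impl Field {
    pub fn new(name: &str, ty: FieldType) -> Self {
        Field { name: name.to_string(), ty }
    }
}

/// Collect the dotted paths of all leaf (primitive) columns under `field`.
pub fn leaf_paths(field: &Field) -> Vec<String> {
    let mut out = Vec::new();
    collect(field, String::new(), &mut out);
    out
}

fn collect(field: &Field, prefix: String, out: &mut Vec<String>) {
    let path = if prefix.is_empty() {
        field.name.clone()
    } else {
        format!("{}.{}", prefix, field.name)
    };
    match &field.ty {
        // Primitive types end the recursion: record the full path.
        FieldType::Int64 | FieldType::Utf8 => out.push(path),
        // A list recurses into its single element field.
        FieldType::List(item) => collect(item, path, out),
        // A struct recurses into each child in declaration order.
        FieldType::Struct(children) => {
            for child in children {
                collect(child, path.clone(), out);
            }
        }
    }
}
```

For a struct `user` with an `Int64` child `id` and a list child `tags` whose element field is named `item`, `leaf_paths` yields `user.id` and `user.tags.item`.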
I believe this issue can now be closed: as of apache/arrow-rs#2500, parquet has full support for arbitrarily nested types. Feel free to reopen if I have missed something.
I believe it still cannot process everything. I was reading a parquet file through I looked it up further, and found the code originating from within
Thank you for the report @ShraddhaKishan -- would it be possible to file a ticket in https://github.com/apache/arrow-rs with a reproducer (or at least the parquet file that cannot be read)?
Sure thing.
Note: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-4863
Once ARROW-4466 is merged, I would like to add support for reading parquet files that contain LIST and STRUCT.