This repository has been archived by the owner on Feb 18, 2024. It is now read-only.
Incorrect nullability inferred for nested parquet schema #1556
Comments
Sample file: eventlog.zip, generated with the java implementation from The
jhorstmann added a commit to jhorstmann/arrow2 that referenced this issue on Sep 7, 2023
…to arrow This allows the `parquet_read` example to correctly read the nested data attached to issue jorgecarleitao#1556 and also makes several test assertions match the comments above.
Fixed by #1565
I'm trying to read a parquet file that contains a struct inside a list using pola-rs and am getting null values for each element. I think I can track down the issue to the schema conversion from parquet to arrow.
The `parquet_to_arrow_schema` function tries to set the `nullable` flag of `Field` according to the parquet repetition levels. That flag is then used via the `InitNested` enum to calculate the level at which data is valid.
My message schema looks like the following:
And I would expect all fields to have the `is_nullable` flag set to `false`. Instead, the `array` field is marked as nullable. I think the issue can also be shown with the example schemas from parquet-format/LogicalTypes.md, which are tested in `test_parquet_lists`. The comments there do not match the assertions. For example:
According to the comment and documentation, `element` should not be nullable in both examples.
I do not yet have a standalone test case and example file, but will try to provide one later.
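The expectation described above can be illustrated with a small, dependency-free sketch. It is not the arrow2 implementation: the `Repetition` enum and the functions `expected_nullable`, `list_nullability` and `max_definition_level` are hypothetical names introduced here. It assumes the three-level LIST layout from parquet-format/LogicalTypes.md, where only `optional` nodes may be null, and the usual rule that each `optional` or `repeated` node on the path adds one definition level.

```rust
/// Parquet repetition of a schema node.
/// Hypothetical stand-in for the real parquet type, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Repetition {
    Required,
    Optional,
    Repeated,
}

/// Expected Arrow nullability for a node with the given repetition.
/// Assumption: only `optional` nodes may hold nulls.
fn expected_nullable(repetition: Repetition) -> bool {
    repetition == Repetition::Optional
}

/// Nullability of a three-level LIST of the form
/// `<list_rep> group <name> (LIST) { repeated group list { <element_rep> <type> element; } }`.
/// Returns `(list_nullable, element_nullable)`.
fn list_nullability(list_rep: Repetition, element_rep: Repetition) -> (bool, bool) {
    (expected_nullable(list_rep), expected_nullable(element_rep))
}

/// Maximum definition level of a leaf: one level for every `optional` or
/// `repeated` node on the path from the root to the leaf.
fn max_definition_level(path: &[Repetition]) -> u32 {
    path.iter().filter(|r| **r != Repetition::Required).count() as u32
}
```

For a required list of required elements, the path `[Required, Repeated, Required]` has a maximum definition level of 1. If `element` were wrongly treated as optional, a reader would expect level 2 for valid data and could interpret every level-1 value as null, which would be consistent with the symptom reported here.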