Error: missing required field ColumnIndex.null_pages when loading page indexes #6464
Comments
@etseidl predicted this error in https://github.com/apache/arrow-rs/pull/6081/files#r1774020124
|
The issue with this test is that the second
    let mut reader = ParquetMetaDataReader::new().with_page_indexes(true);
    let err = reader.try_parse_sized(&metadata_bytes, parquet_bytes.len()).err().unwrap();
    assert_eq!(err.to_string(), "Index 386 out of bound: 468");
It's also possible to split the requests between data sources.
    let mut reader = ParquetMetaDataReader::new();
    // get metadata from serialized bytes
    reader.try_parse(&metadata_bytes).unwrap();
    // get page indexes from original file
    reader.read_page_indexes(&parquet_bytes).unwrap();
    let roundtrip_metadata = reader.finish().unwrap();
    assert_eq!(original_metadata, roundtrip_metadata);
So part of this issue is a documentation thing. We probably have to be clearer about the intended use of the read and write APIs and explain the pitfalls. That said, I do think an option to have |
I agree -- I would even argue that the default should be for the writer not to write page index offsets if there are no page indexes in memory (in fact, maybe the rust structs shouldn't even have the offsets 🤔 )
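As a rough illustration of the default suggested here, the following is a minimal sketch using toy types only. `ToyColumnMeta` and `meta_for_write` are hypothetical names invented for this example and are not part of the parquet crate. The sketch only shows the idea of dropping the offsets when no page index is available to be written next to the metadata.

```rust
/// Toy stand-in for per-column metadata. This is NOT the parquet crate's type;
/// it only models the two fields relevant to the discussion.
#[derive(Debug, Clone, PartialEq)]
struct ToyColumnMeta {
    column_index_offset: Option<i64>,
    column_index_length: Option<i32>,
}

/// Hypothetical writer-side rule: keep the offsets only when the page index
/// they point to is held in memory and will be written with the metadata.
fn meta_for_write(meta: &ToyColumnMeta, index_in_memory: bool) -> ToyColumnMeta {
    if index_in_memory {
        // The index will be written, so the offsets stay meaningful.
        meta.clone()
    } else {
        // No index to write: clear the offsets so a reader never follows them.
        ToyColumnMeta {
            column_index_offset: None,
            column_index_length: None,
        }
    }
}
```

With a rule like this, a reader that sees no offsets would simply skip page index loading instead of trying to decode bytes that are not there.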
I think that Specifically I would expect this to fail
The rationale is that in this case something is incorrect / mismatched between the serialized metadata and what is provided to the reader. |
Added check that will return EOF error for this case. #6507 |
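The kind of check described in this comment can be sketched as a plain bounds test before slicing. This is a simplified model under stated assumptions, not the code from #6507: `ToyError` and `index_bytes` are hypothetical names, and the sketch treats offsets as relative to the start of the supplied buffer.

```rust
/// Toy error type; the real crate's error enum is not modelled here.
#[derive(Debug, PartialEq)]
enum ToyError {
    /// The buffer is too short to contain the requested index range.
    Eof { needed: usize, available: usize },
}

/// Return the bytes of an index stored at `offset..offset + len`,
/// or an EOF-style error if the buffer does not extend that far.
fn index_bytes(buf: &[u8], offset: usize, len: usize) -> Result<&[u8], ToyError> {
    // Guard against overflow when computing the end of the range.
    let end = offset.checked_add(len).ok_or(ToyError::Eof {
        needed: usize::MAX,
        available: buf.len(),
    })?;
    if end > buf.len() {
        // The metadata points past the data the reader was given.
        return Err(ToyError::Eof {
            needed: end,
            available: buf.len(),
        });
    }
    Ok(&buf[offset..end])
}
```

Returning an explicit error here, rather than attempting to decode whatever bytes happen to be at that position, matches the rationale above that a mismatch between the serialized metadata and the supplied bytes should fail clearly.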
Describe the bug
If the ParquetMetaDataReader tries to read metadata written by ParquetMetaDataWriter without first loading the page indexes, you get an error like "missing required field ColumnIndex.null_pages".
Note this depends on #6463
To Reproduce
The full reproducer is in #6463. Here is the relevant piece
Expected behavior
The reader should not error
I am not sure if the right fix is to
Additional context
@etseidl has added the APIs in #6431