GH-44769: [C++][Parquet] Fix read/write of metadata length footer on big-endian systems #44787
mapleFU left a comment
General LGTM!
cpp/src/parquet/file_writer.cc (outdated)
Suggested change:
```diff
-metadata_len = ::arrow::bit_util::ToLittleEndian(metadata_len);
-PARQUET_THROW_NOT_OK(sink->Write(reinterpret_cast<uint8_t*>(&metadata_len), 4));
+{
+  uint32_t metadata_len_le = ::arrow::bit_util::ToLittleEndian(metadata_len);
+  PARQUET_THROW_NOT_OK(sink->Write(reinterpret_cast<uint8_t*>(&metadata_len_le), 4));
+}
```
Can we solve like this?
Sure.
cpp/src/parquet/file_writer.cc (outdated)
```cpp
PARQUET_ASSIGN_OR_THROW(position, sink->Tell());
metadata_len = static_cast<uint32_t>(position) - metadata_len;

metadata_len = ::arrow::bit_util::ToLittleEndian(metadata_len);
```
Also same suggestion as above.
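A minimal, self-contained sketch of the write path discussed in the two review threads above. It is not the Arrow implementation: it assumes a hypothetical in-memory `Sink` (with `Tell()` and `Write()`), a hypothetical `ToLittleEndian32` standing in for `::arrow::bit_util::ToLittleEndian`, and a hypothetical `WriteMetadataAndFooter` helper. It computes the metadata length as the difference of two `Tell()` positions, then converts a separate copy to little endian, as suggested in review, so `metadata_len` itself stays in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for ::arrow::bit_util::ToLittleEndian: returns a value
// whose in-memory byte order is little-endian, whatever the host byte order.
uint32_t ToLittleEndian32(uint32_t v) {
  const uint16_t probe = 1;
  uint8_t first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  if (first_byte == 1) return v;  // little-endian host: already in LE order
  // Big-endian host: reverse the four bytes.
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// Hypothetical output sink: an in-memory buffer whose size serves as Tell().
struct Sink {
  std::vector<uint8_t> bytes;
  int64_t Tell() const { return static_cast<int64_t>(bytes.size()); }
  void Write(const uint8_t* data, size_t n) {
    bytes.insert(bytes.end(), data, data + n);
  }
};

// Hypothetical helper: writes the serialized metadata, its 4-byte little-endian
// length, and the trailing "PAR1" magic. Returns the length in host byte order.
uint32_t WriteMetadataAndFooter(Sink* sink, const std::string& metadata) {
  assert(metadata.size() <= 0xFFFFFFFFu);  // the footer field is only 32 bits
  uint32_t metadata_len = static_cast<uint32_t>(sink->Tell());  // start offset
  sink->Write(reinterpret_cast<const uint8_t*>(metadata.data()), metadata.size());
  // Unsigned subtraction is modulo 2^32, so this is exact for any length < 2^32.
  metadata_len = static_cast<uint32_t>(sink->Tell()) - metadata_len;
  {
    // Convert a copy, as suggested in review, so metadata_len keeps host order.
    uint32_t metadata_len_le = ToLittleEndian32(metadata_len);
    sink->Write(reinterpret_cast<const uint8_t*>(&metadata_len_le), 4);
  }
  sink->Write(reinterpret_cast<const uint8_t*>("PAR1"), 4);
  return metadata_len;
}
```

For 300 bytes of metadata (0x0000012C), the four length bytes come out as `2C 01 00 00` on both little- and big-endian hosts. Converting a copy rather than `metadata_len` itself keeps the returned value equal to the real length; converting in place would leave `metadata_len` byte-swapped on a big-endian host for any code that uses it afterwards.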
mapleFU left a comment
Will merge this tonight
Rerun CI, would merge if it passes.
Thanks all, merged!
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 9015a81. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Rationale for this change
See issue GH-44769.
What changes are included in this PR?
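- Fix writing Parquet metadata length footer
  - By converting the `uint32_t` to little endian before casting to a `uint8_t*`, the 4-byte length is always written in little-endian order in the output file, as the Parquet format requires, regardless of the host's byte order.
- Fix reading Parquet metadata length footer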
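The read side has the mirror-image problem: loading the four length bytes straight into a `uint32_t` (for example with `memcpy` or a pointer cast) yields a byte-swapped value on a big-endian host, so a 300-byte length stored as `2C 01 00 00` would be seen as 0x2C010000, or 738,263,040 bytes. The sketch below is a hypothetical `ReadFooterMetadataLength` helper, not Arrow's reader. It assembles the length with shifts so the result does not depend on host byte order, and it checks the length against the file size, assuming the standard layout of leading `PAR1` magic, metadata, 4-byte length, and trailing `PAR1` magic.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper for illustration: parses the 8-byte Parquet footer tail
// (4-byte little-endian metadata length, then "PAR1"). Returns false if the
// tail is malformed or the length cannot fit in the file.
bool ReadFooterMetadataLength(const std::vector<uint8_t>& file, uint32_t* out_len) {
  assert(out_len != nullptr);
  // Smallest possible file: leading magic (4) + length (4) + trailing magic (4).
  if (file.size() < 12) return false;
  const uint8_t* tail = file.data() + file.size() - 8;
  if (std::memcmp(tail + 4, "PAR1", 4) != 0) return false;
  // Assemble from individual bytes: the same result on any host byte order.
  const uint32_t len = static_cast<uint32_t>(tail[0]) |
                       (static_cast<uint32_t>(tail[1]) << 8) |
                       (static_cast<uint32_t>(tail[2]) << 16) |
                       (static_cast<uint32_t>(tail[3]) << 24);
  // The metadata must fit between the leading magic and the footer tail.
  if (len > file.size() - 12) return false;
  *out_len = len;
  return true;
}
```

An alternative that mirrors the write side is to `memcpy` the bytes into a `uint32_t` and then apply a little-endian-to-host conversion such as Arrow's `::arrow::bit_util::FromLittleEndian`; the shift form simply avoids needing any conversion helper.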
Are these changes tested?
Yes.
Are there any user-facing changes?
On big-endian systems, reading a Parquet file no longer complains about the metadata size in the footer, though that doesn't guarantee anything else will work yet.