parquet writer does not encode null count = 0 correctly #6256
Description
Describe the bug
TLDR: the Rust parquet writer does not write out null_counts correctly in row group statistics. However, the reader has the same mistake, so systems that both write and read with Rust are unaffected.
While reviewing #6216 from @Michael-J-Ward (🙏), I became pretty sure that the arrow-rs parquet writer does not do the correct thing with respect to null statistics.
Specifically, when there are no nulls in the data, the writer does not emit a value for null_count in the thrift metadata (it writes the equivalent of None). It should instead write the equivalent of Some(0).
This will not cause issues for people using parquet-rs to read and write data, as the reader also (incorrectly) reports Some(0) when the thrift metadata has None.
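To make the masking effect concrete, here is a simplified model of the behavior described above. It is a sketch only: current_write and current_read are hypothetical names, not the actual arrow-rs functions, and the thrift field is modeled as a plain Option<i64>.

```rust
// Simplified model of the behavior described in this issue.
// The function names are hypothetical and are NOT the arrow-rs API.

/// Writer as described: a zero null count is dropped from the thrift metadata.
fn current_write(null_count: u64) -> Option<i64> {
    if null_count == 0 {
        None
    } else {
        Some(null_count as i64)
    }
}

/// Reader as described: a missing thrift null_count is reported as Some(0).
fn current_read(thrift_null_count: Option<i64>) -> Option<u64> {
    Some(thrift_null_count.unwrap_or(0) as u64)
}

fn main() {
    // Data without nulls: nothing is recorded in the metadata...
    let on_disk = current_write(0);
    assert_eq!(on_disk, None);

    // ...but the Rust reader turns the missing value back into Some(0),
    // so a Rust-to-Rust round trip looks correct.
    assert_eq!(current_read(on_disk), Some(0));

    // Non-zero counts round trip unchanged.
    assert_eq!(current_read(current_write(3)), Some(3));
}
```

In this model, a reader that treats a missing null_count as unknown would see None for a file written by the Rust writer, even though the data has no nulls.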
To Reproduce
TBD
Expected behavior
- When writing statistics for data without nulls, the parquet-rs writer should write Some(0)
- When reading statistics whose thrift metadata has no null_count, the parquet-rs reader should read None
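The expected mapping could be sketched roughly as follows. The helpers expected_write and expected_read are hypothetical and only illustrate the intended semantics, with the thrift field modeled as Option<i64>; they are not a proposed patch to arrow-rs.

```rust
// Hypothetical helpers illustrating the expected mapping.
// These are NOT the arrow-rs API.

/// Expected writer: always record the count, including zero.
fn expected_write(null_count: u64) -> Option<i64> {
    Some(null_count as i64)
}

/// Expected reader: a missing value stays None (unknown) instead of
/// being reported as zero.
fn expected_read(thrift_null_count: Option<i64>) -> Option<u64> {
    thrift_null_count.map(|n| n as u64)
}

fn main() {
    // Data without nulls is written as an explicit zero.
    assert_eq!(expected_write(0), Some(0));

    // Metadata without a null_count is reported as unknown.
    assert_eq!(expected_read(None), None);

    // Known counts, including zero, round trip unchanged.
    assert_eq!(expected_read(expected_write(0)), Some(0));
    assert_eq!(expected_read(expected_write(7)), Some(7));
}
```

With this mapping, Some(0) means "known to contain no nulls" and None means "the null count was not recorded", so the two cases are no longer conflated.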
Additional context