parquet writer does not encode null count = 0 correctly  #6256

Open
@alamb

Description

Describe the bug
TLDR: the Rust parquet writer does not write out null_count correctly in row group statistics. However, the reader makes a matching mistake, so systems that both write and read with Rust are unaffected.

While reviewing #6216 from @Michael-J-Ward (🙏), I became fairly sure that the arrow-rs parquet writer does not handle null statistics correctly.

Specifically, when there are no nulls in the data, the writer does not emit a value for null_count in the thrift metadata (it writes the equivalent of None) -- it should instead write the equivalent of Some(0)

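This will not cause issues for people who use parquet-rs to both write and read data, because the reader also (incorrectly) reports Some(0) when the thrift metadata has None, so the two mistakes cancel out on a round trip. Other readers, however, will see the null count as unknown rather than zero.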

To Reproduce
TBD

Expected behavior

  • When writing statistics for data without nulls, the parquet-rs writer should write Some(0)
  • When reading statistics whose thrift metadata has no null_count (None), the parquet-rs reader should report None rather than Some(0)
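
To make the two bullets above concrete, here is a minimal sketch of the current and expected mappings. It is a standalone model, not the parquet crate's code: the function names are hypothetical, and `Option<i64>` simply stands in for the optional null_count field in the thrift statistics.

```rust
// Hypothetical model of the behavior described in this issue.
// None of these functions exist in the parquet crate.

/// Writer as described above: a column with no nulls gets no null_count at all.
fn write_current(null_count: u64) -> Option<i64> {
    if null_count == 0 {
        None
    } else {
        Some(null_count as i64)
    }
}

/// Expected writer: a known count of zero is written explicitly as Some(0).
fn write_expected(null_count: u64) -> Option<i64> {
    Some(null_count as i64)
}

/// Reader as described above: a missing null_count is reported as Some(0).
fn read_current(thrift_null_count: Option<i64>) -> Option<u64> {
    Some(thrift_null_count.unwrap_or(0) as u64)
}

/// Expected reader: a missing null_count stays unknown (None).
fn read_expected(thrift_null_count: Option<i64>) -> Option<u64> {
    thrift_null_count.map(|n| n as u64)
}
```

In this model `read_current(write_current(0))` yields Some(0), which is why a Rust-only round trip looks fine, while `read_expected(write_current(0))` yields None, which is what a spec-following reader would see for a file written today.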

Additional context
