
Add function that converts from parquet statistics ParquetStatistics to arrow arrays ArrayRef #4328

@sundy-li

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Describe the solution you'd like

Add a utility function to the parquet crate that converts ParquetStatistics into an ArrayRef.
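Here is a minimal, dependency-free sketch of what such a conversion does. It does not use the real parquet or arrow crates. `ColumnStats`, `int64_min_values` and `int64_max_values` are hypothetical names, and `Vec<Option<i64>>` stands in for a nullable Arrow `Int64Array`: one slot per row group, null where statistics are missing or have a different type.

```rust
/// Hypothetical, simplified stand-in for the typed per-column-chunk statistics
/// stored in a Parquet row group. The real parquet crate type covers more
/// physical types and also tracks null counts.
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnStats {
    Int32 { min: i32, max: i32 },
    Int64 { min: i64, max: i64 },
}

/// Gathers the per-row-group minimums of an Int64 column into one nullable
/// column. Slot `i` belongs to row group `i`. It is `None` when that row group
/// has no statistics or its statistics are not Int64 statistics.
pub fn int64_min_values(row_groups: &[Option<ColumnStats>]) -> Vec<Option<i64>> {
    row_groups
        .iter()
        .map(|stats| match stats {
            Some(ColumnStats::Int64 { min, .. }) => Some(*min),
            _ => None,
        })
        .collect()
}

/// Same as `int64_min_values`, but for the per-row-group maximums.
pub fn int64_max_values(row_groups: &[Option<ColumnStats>]) -> Vec<Option<i64>> {
    row_groups
        .iter()
        .map(|stats| match stats {
            Some(ColumnStats::Int64 { max, .. }) => Some(*max),
            _ => None,
        })
        .collect()
}
```

A real implementation would presumably return an `ArrayRef` and would have to decide how to handle mismatched physical types (for example, widening Int32 statistics). This sketch simply emits a null for any mismatch.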

Describe alternatives you've considered

arrow-datafusion has a utility trait, PruningStatistics, which RowGroupPruningStatistics implements to expose Parquet row-group statistics as ArrayRef values that are then used to prune the blocks.
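The pruning flow can be roughly sketched as follows, heavily simplified and without any dependencies. `PruningStats`, `RowGroupStats`, `keep_for_greater_than` and `keep_for_less_than` are hypothetical. The real `PruningStatistics` trait returns Arrow `ArrayRef` values for a `Column`, and DataFusion evaluates a rewritten predicate over those arrays rather than hard-coding comparisons as done here.

```rust
use std::collections::HashMap;

/// Hypothetical, cut-down analogue of DataFusion's `PruningStatistics`.
/// Each method returns one entry per container (here: per row group);
/// a `None` entry means "no statistics for that row group".
pub trait PruningStats {
    fn num_containers(&self) -> usize;
    fn min_values(&self, column: &str) -> Option<Vec<Option<i64>>>;
    fn max_values(&self, column: &str) -> Option<Vec<Option<i64>>>;
}

/// Hypothetical Int64 statistics: one map per row group, column name -> (min, max).
pub struct RowGroupStats {
    pub row_groups: Vec<HashMap<String, (i64, i64)>>,
}

impl PruningStats for RowGroupStats {
    fn num_containers(&self) -> usize {
        self.row_groups.len()
    }

    fn min_values(&self, column: &str) -> Option<Vec<Option<i64>>> {
        Some(self.row_groups.iter().map(|rg| rg.get(column).map(|&(min, _)| min)).collect())
    }

    fn max_values(&self, column: &str) -> Option<Vec<Option<i64>>> {
        Some(self.row_groups.iter().map(|rg| rg.get(column).map(|&(_, max)| max)).collect())
    }
}

/// Predicate `column > literal`: a row group may be skipped only if its max is
/// known and `max <= literal`. `true` means the row group must still be read.
pub fn keep_for_greater_than(stats: &impl PruningStats, column: &str, literal: i64) -> Vec<bool> {
    match stats.max_values(column) {
        Some(maxes) => maxes.into_iter().map(|m| m.map_or(true, |max| max > literal)).collect(),
        None => vec![true; stats.num_containers()],
    }
}

/// Predicate `column < literal`: skip a row group only if its min is known
/// and `min >= literal`.
pub fn keep_for_less_than(stats: &impl PruningStats, column: &str, literal: i64) -> Vec<bool> {
    match stats.min_values(column) {
        Some(mins) => mins.into_iter().map(|m| m.map_or(true, |min| min < literal)).collect(),
        None => vec![true; stats.num_containers()],
    }
}
```

Unknown statistics always keep the row group. Pruning must be conservative: it may read too much, but never skip data that could match.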

https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/physical_plan/file_format/parquet/row_groups.rs#L229

https://github.com/apache/arrow-datafusion/blob/b8f90fe9366a7406afbf5bb3f3afe5854adcf26a/datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs#L103-L228

However, helper functions such as get_min_max_values first convert the statistics into DataFusion's ScalarValue and then convert them back into an ArrayRef. This round trip seems redundant, because the conversion could be done directly, without DataFusion.
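The redundancy can be shown with a dependency-free sketch that contrasts the two paths for a min column. `Scalar` is a hypothetical stand-in for DataFusion's `ScalarValue`, `Int64Stats` for Parquet Int64 statistics, and `Vec<Option<i64>>` for an Arrow `Int64Array`. None of these are the real types.

```rust
/// Hypothetical Int64 statistics for one row group.
#[derive(Debug, Clone, Copy)]
pub struct Int64Stats {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical stand-in for DataFusion's `ScalarValue` (Int64 case only).
#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Null,
    Int64(i64),
}

/// Round trip: statistics -> one intermediate scalar per row group -> column.
pub fn min_array_via_scalars(stats: &[Option<Int64Stats>]) -> Vec<Option<i64>> {
    // Step 1: wrap each row group's min in an intermediate scalar value.
    let scalars: Vec<Scalar> = stats
        .iter()
        .map(|s| match s {
            Some(s) => Scalar::Int64(s.min),
            None => Scalar::Null,
        })
        .collect();
    // Step 2: unwrap the scalars again to build the column.
    scalars
        .into_iter()
        .map(|v| match v {
            Scalar::Int64(x) => Some(x),
            Scalar::Null => None,
        })
        .collect()
}

/// Direct path: build the column straight from the statistics.
pub fn min_array_direct(stats: &[Option<Int64Stats>]) -> Vec<Option<i64>> {
    stats.iter().map(|s| s.as_ref().map(|s| s.min)).collect()
}
```

Both functions produce the same column. The round trip only adds an intermediate `Vec<Scalar>` and a second pass over the row groups.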

So I suggest that arrow-rs support this directly, as arrow2 did.

Additional context

Metadata

Assignees

No one assigned

Labels

enhancement, parquet
