
Deprecate datafusion.execution.parquet.max_statistics_size config option #14172

Closed
@alamb

Description


Is your feature request related to a problem or challenge?

The most recent version of arrow deprecates the max_statistics_size parquet option.

@etseidl explained the rationale: the option was being silently ignored.

DataFusion also exposes this setting:
https://datafusion.apache.org/user-guide/configs.html

datafusion.execution.parquet.max_statistics_size (default: 4096): (writing) Sets max statistics size for any column. If NULL, uses default parquet writer setting

We should deprecate it as well before removing it.

Describe the solution you'd like

  1. Mark the option deprecated in comments
  2. Make the field deprecated in the code

The config option is defined in two places:
https://github.com/apache/datafusion/blob/04b6d4d6099f537de91e1b30a391bdfbc3ec36d5/datafusion/common/src/config.rs#L1723-L1722
https://github.com/apache/datafusion/blob/04b6d4d6099f537de91e1b30a391bdfbc3ec36d5/datafusion/common/src/config.rs#L1723-L1722

But I think both are generated via a macro.
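
A minimal sketch of what step 2 could look like, assuming a plain hand-written struct rather than the macro that generates the real config. `ParquetOptionsSketch` and `legacy_max_statistics_size` are hypothetical names used only for illustration; the default of 4096 is taken from the config docs quoted above. The macro-generated definition may need a different approach.

```rust
/// Hypothetical stand-in for the Parquet options struct (not the real
/// DataFusion type, which is generated by a macro).
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetOptionsSketch {
    /// (writing) Sets max statistics size for any column.
    /// If `None`, uses default parquet writer setting.
    ///
    /// Deprecated: the value is ignored by the parquet writer.
    #[deprecated(note = "max_statistics_size is ignored by the parquet writer and will be removed")]
    pub max_statistics_size: Option<usize>,
}

impl Default for ParquetOptionsSketch {
    // Constructing the struct touches the deprecated field, so the lint is
    // silenced locally instead of crate-wide.
    #[allow(deprecated)]
    fn default() -> Self {
        Self {
            max_statistics_size: Some(4096),
        }
    }
}

/// Hypothetical helper for code that must still read the deprecated field
/// (for example, to keep existing configs parsing) until it is removed.
#[allow(deprecated)]
pub fn legacy_max_statistics_size(opts: &ParquetOptionsSketch) -> Option<usize> {
    opts.max_statistics_size
}
```

With this shape, any code that reads or writes `max_statistics_size` without an explicit `#[allow(deprecated)]` gets a compiler warning, which gives users notice before the field is removed.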

Describe alternatives you've considered

We could just wait until the field is removed upstream in arrow and then remove it from DataFusion too.

Additional context

No response

Labels: enhancement (New feature or request)
