
Provide a way to enable source level statistics for tables registered in the CLI #3774

@isidentical

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
#1347 enabled statistics collection by default in the ListingOptions constructor, but tables created with CREATE EXTERNAL TABLE still can't use this feature, because their ListingOptions are constructed manually rather than through that constructor.
https://github.com/apache/arrow-datafusion/blob/e54110fb592e03704da5f6ebd832b8fe1c51123b/datafusion/core/src/execution/context.rs#L486-L488

Describe the solution you'd like
The read_* DataFrame APIs already have per-file-format read options (e.g. CsvReadOptions, ParquetReadOptions), and these come with sensible defaults (for example, collect_stats is false for CSV and true for Parquet). I wonder whether we could simply reuse them here and obtain the ListingOptions directly from them.
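
A minimal sketch of the proposed direction, under loudly stated assumptions: every type and function below (ListingOptions, ReadOptions, CsvReadOptions, ParquetReadOptions, FileType, listing_options_for_external_table) is a simplified, hypothetical stand-in written for illustration, not DataFusion's actual API. It only shows the shape of the idea: the CREATE EXTERNAL TABLE path picks the per-format read options and derives its listing options from them, so each format's default for collect_stats is inherited instead of being hard-coded in one place.

```rust
// Hypothetical, simplified stand-ins; DataFusion's real types have more
// fields and different constructors and method signatures.

#[derive(Debug, Clone, PartialEq)]
struct ListingOptions {
    file_extension: String,
    collect_stats: bool,
    target_partitions: usize,
}

/// Hypothetical trait: each per-format read options type knows its own
/// defaults and can turn itself into listing options.
trait ReadOptions {
    fn to_listing_options(&self, target_partitions: usize) -> ListingOptions;
}

struct CsvReadOptions;
struct ParquetReadOptions;

impl ReadOptions for CsvReadOptions {
    fn to_listing_options(&self, target_partitions: usize) -> ListingOptions {
        // CSV default from the issue text: statistics collection off.
        ListingOptions {
            file_extension: ".csv".to_string(),
            collect_stats: false,
            target_partitions,
        }
    }
}

impl ReadOptions for ParquetReadOptions {
    fn to_listing_options(&self, target_partitions: usize) -> ListingOptions {
        // Parquet default from the issue text: statistics collection on.
        ListingOptions {
            file_extension: ".parquet".to_string(),
            collect_stats: true,
            target_partitions,
        }
    }
}

/// Hypothetical subset of the file types accepted by CREATE EXTERNAL TABLE.
enum FileType {
    Csv,
    Parquet,
}

/// Derive listing options for an external table from the matching
/// per-format read options instead of building them by hand.
fn listing_options_for_external_table(
    file_type: &FileType,
    target_partitions: usize,
) -> ListingOptions {
    let read_options: Box<dyn ReadOptions> = match file_type {
        FileType::Csv => Box::new(CsvReadOptions),
        FileType::Parquet => Box::new(ParquetReadOptions),
    };
    read_options.to_listing_options(target_partitions)
}

fn main() {
    for (name, file_type) in [("csv", FileType::Csv), ("parquet", FileType::Parquet)] {
        let opts = listing_options_for_external_table(&file_type, 4);
        println!(
            "{}: extension={}, collect_stats={}, target_partitions={}",
            name, opts.file_extension, opts.collect_stats, opts.target_partitions
        );
    }
}
```

Compared with the "set the flag to true globally" alternative, routing through the per-format options keeps format-specific choices (such as leaving statistics off for CSV, which has no embedded statistics, unlike Parquet's footer metadata) in one place per format.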

Describe alternatives you've considered
Leaving it as is, or enabling statistics collection for all formats by simply setting the flag to true (instead of refactoring that code path to use the ReadOptions).

Labels: enhancement

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions