Implement partitioned read in listing table provider #1139

@rdettai

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Data files are commonly organized into partitions. There are many ways to do this, but Hive-style partitioning, where each directory level encodes one `key=value` pair, is the most widespread:

/table_path/customer=1/year=2020/file001.parquet
...
/table_path/customer=1/year=2020/file009.parquet
/table_path/customer=2/year=2020/filexxx.parquet
/table_path/customer=1/year=2021/filexxx.parquet
/table_path/customer=3/year=2021/filexxx.parquet
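
To make the layout concrete, here is a minimal sketch of how the partition values could be read from such a path. The function name `parse_partition_values` and its signature are hypothetical and are not part of DataFusion's API. The sketch assumes that every directory level below the table root has the form `key=value`.

```rust
/// Hypothetical helper (not a DataFusion API): extract Hive-style
/// `key=value` pairs from the directory segments of `file_path`.
///
/// Returns `None` if the file is not located under `table_path`, or if a
/// directory segment below the table root is not of the form `key=value`.
fn parse_partition_values(table_path: &str, file_path: &str) -> Option<Vec<(String, String)>> {
    // Keep only the part of the path below the table root.
    let relative = file_path
        .strip_prefix(table_path.trim_end_matches('/'))?
        .strip_prefix('/')?;

    let mut segments: Vec<&str> = relative.split('/').filter(|s| !s.is_empty()).collect();

    // The last segment is the file name, not a partition directory.
    segments.pop()?;

    let mut values = Vec::new();
    for segment in segments {
        let (key, value) = segment.split_once('=')?;
        values.push((key.to_string(), value.to_string()));
    }
    Some(values)
}
```

For the first path listed above, this would yield the pairs `customer` → `1` and `year` → `2020`. Values are kept as strings here; a real implementation would probably need to convert them to typed values.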

Describe the solution you'd like
In the ListingTableProvider, when resolving the list of files:

  • Each file path should be parsed, so that the resulting PartitionedFile contains the values of all partition dimensions.
  • Files that belong to partitions excluded by the filter should be skipped.
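
The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the `PartitionedFile` struct below is a simplified stand-in whose fields are invented for the example, `PartitionEq` is a hypothetical equality-only filter, and `prune_files` is not an existing DataFusion function.

```rust
/// Simplified stand-in for `PartitionedFile`; the fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct PartitionedFile {
    path: String,
    /// Partition column name and value, in directory order.
    partition_values: Vec<(String, String)>,
}

/// Hypothetical filter of the form `column = value` on a partition column.
struct PartitionEq<'a> {
    column: &'a str,
    value: &'a str,
}

impl PartitionedFile {
    /// Step 1: parse the `key=value` directory segments of the path.
    fn from_path(path: &str) -> Self {
        // Drop the file name so that only directories are inspected.
        let dir = path.rsplit_once('/').map(|(d, _)| d).unwrap_or("");
        let partition_values = dir
            .split('/')
            .filter_map(|segment| segment.split_once('='))
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        PartitionedFile { path: path.to_string(), partition_values }
    }

    /// A file is ruled out only if it carries a value for the filtered
    /// column and that value differs from the one requested.
    fn may_match(&self, filter: &PartitionEq<'_>) -> bool {
        self.partition_values
            .iter()
            .filter(|(k, _)| k.as_str() == filter.column)
            .all(|(_, v)| v.as_str() == filter.value)
    }
}

/// Step 2: keep only the files whose partitions satisfy every filter.
fn prune_files(paths: &[&str], filters: &[PartitionEq<'_>]) -> Vec<PartitionedFile> {
    paths
        .iter()
        .map(|p| PartitionedFile::from_path(p))
        .filter(|file| filters.iter().all(|f| file.may_match(f)))
        .collect()
}
```

With the example layout and a filter `year = 2021`, only the two files under `year=2021` would remain, and no file in the `year=2020` directories would need to be opened. Filters on columns that are not partition dimensions never exclude a file in this sketch, which keeps the pruning conservative.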

Additional context
Closing #133 and #204 in favor of this.

Labels

enhancement (New feature or request)