Open
Labels
- EPIC: A larger project, actively underway, with sub tasks
- enhancement: New feature or request
Description
This is a list of improvements we are working on for ListingTable in DataFusion.
Background
DataFusion has a ListingTable that supports reading tables stored in one or more files in a "hive partitioned" directory structure.
So for example, given files like this:
/path/to/my_table/file1.parquet
/path/to/my_table/file2.parquet
/path/to/my_table/file3.parquet
You can create a table with a command like:
CREATE EXTERNAL TABLE my_table
LOCATION '/path/to/my_table'
The ListingTable will then handle figuring out the schema and running queries against those files as though they were a single table.
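To illustrate what a "hive partitioned" layout means, here is a minimal, standard-library-only sketch of how `key=value` directory names under a table root can be read as partition columns. This is not DataFusion's implementation: the helper name `parse_hive_partitions` and the `year=2024/month=01` directories are assumptions made up for this example.

```rust
/// Hypothetical helper (not part of DataFusion): extract `key=value`
/// partition columns from the directories between the table root and the file.
fn parse_hive_partitions(table_root: &str, file_path: &str) -> Vec<(String, String)> {
    // Only look at the part of the path below the table root.
    let relative = file_path
        .strip_prefix(table_root)
        .unwrap_or(file_path)
        .trim_start_matches('/');

    let mut components: Vec<&str> = relative.split('/').collect();
    // The last component is the file name, not a partition directory.
    components.pop();

    components
        .into_iter()
        // Keep only directories of the form `key=value`.
        .filter_map(|dir| dir.split_once('='))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}

fn main() {
    // Assumed example layout; the partition directories are illustrative only.
    let root = "/path/to/my_table";
    let file = "/path/to/my_table/year=2024/month=01/file1.parquet";
    for (column, value) in parse_hive_partitions(root, file) {
        println!("{column} = {value}");
    }
}
```

For the unpartitioned files listed in the example above (such as `/path/to/my_table/file1.parquet`), this sketch returns no partition columns.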
Team
- @BlakeOrth
- @alamb (maintainer)
Bugs
- Bug: `ListingTableFactory` fails to read data when the final path element contains a `.` #17212
- Auto detecting partitions with `ListingTableFactory` on Hive partitioned datasets #17049
Enhancements
- [Epic] Enable parquet metadata cache by default #17000
- Improved experience when remote object store URL does not end in `/` (retry as paths) #16302
- [`datafusion-cli`] Add a way to see what object store requests are made #17207
- Make `datafusion-cli` object store tracing mode work for local files #18119
- Reduce number of object store requests when reading parquet files by default (set `metadata_size_hint`) #18118
- [datafusion-cli] Implement average LIST duration for object store profiling #18138
- Partitioned object store lists all files on every query when using hive-partitioned parquet files #9654
- Enable the `ListFilesCache` to be available for partitioned tables #17211