enhancement (datafusion-cli): Add support for glob patterns in CREATE EXTERNAL TABLE commands #16387

Open · 5 commits to merge into base: main

@a-agmon commented Jun 12, 2025

Partly closes #16303

This PR enables CREATE EXTERNAL TABLE with a glob pattern in a location that uses a remote URL scheme, for example:

CREATE EXTERNAL TABLE ee3 
STORED AS CSV 
LOCATION 's3://tests/data/file-*-1.csv';

CREATE EXTERNAL TABLE pp 
STORED AS PARQUET 
LOCATION 's3://tests/data-p/te*';

It is currently possible to create an external table with this syntax only for local files:

CREATE EXTERNAL TABLE aa
STORED AS CSV 
LOCATION '/Users/aa/projects/tmdb/tmdb_*.csv';

The goal here is therefore to extend glob support to remote URL schemes as well.

The implementation is something of a workaround: it intercepts create_plan(), and when the table location combines a glob pattern with a remote scheme, it creates the table as a ListingTable. Part of the reason for this approach is that DataFusion's core modules call ListingTableUrl::parse(), which accepts a glob pattern only when it involves local files (see /datafusion/core/src/datasource/listing_table_factory.rs for an example).
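
To illustrate the kind of check such an interception needs, here is a minimal sketch using only the Rust standard library. It is not the code from this PR's diff: split_remote_glob is a hypothetical helper, and the choice of glob metacharacters (*, ? and [) is an assumption. The idea is to decide whether a LOCATION string has a remote scheme plus a glob, and if so to split it into a listable base prefix and the pattern to match under that prefix.

```rust
/// Characters treated as glob metacharacters in this sketch (an assumption).
const GLOB_CHARS: [char; 3] = ['*', '?', '['];

/// Hypothetical helper, not part of DataFusion or of this PR's diff.
/// Returns `Some((base, pattern))` when `location` has a non-file URL scheme
/// and contains a glob metacharacter; returns `None` otherwise.
fn split_remote_glob(location: &str) -> Option<(String, String)> {
    // Require an explicit scheme such as "s3://"; plain paths are left alone.
    let (scheme, rest) = location.split_once("://")?;
    if scheme.is_empty() || scheme.eq_ignore_ascii_case("file") {
        return None;
    }
    // Byte offset of the first glob metacharacter after the scheme.
    let glob_at = rest.find(|c: char| GLOB_CHARS.contains(&c))?;
    // Cut at the last '/' before the glob so the base is a listable prefix.
    let cut = rest[..glob_at].rfind('/').map(|i| i + 1).unwrap_or(0);
    let base = format!("{}://{}", scheme, &rest[..cut]);
    let pattern = rest[cut..].to_string();
    Some((base, pattern))
}
```

With the locations from the examples above, 's3://tests/data/file-*-1.csv' would split into the base 's3://tests/data/' and the pattern 'file-*-1.csv', while the local path '/Users/aa/projects/tmdb/tmdb_*.csv' yields None and would continue through the existing local-file handling.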

@a-agmon changed the title from "feat: Add support for glob patterns in CREATE EXTERNAL TABLE commands" to "enhancement (datafusion-cli): Add support for glob patterns in CREATE EXTERNAL TABLE commands" on Jun 13, 2025
Successfully merging this pull request may close these issues.

Support reading multiple parquet files via datafusion-cli