Support datafusion-cli access to public S3 buckets that do not require authentication #16299

Closed
@alamb

Description

Is your feature request related to a problem or challenge?

Some public S3 buckets, such as the ClickBench public datasets bucket, do not require authentication.

Other engines, like ClickHouse, let you access these buckets without providing any credentials: https://clickhouse.com/docs/engines/table-engines/integrations/s3

CREATE TABLE s3_engine_table (name String, value UInt32)
    ENGINE=S3('s3://clickhouse-public-datasets/hits_compatible/hits.parquet', 'CSV', 'gzip')

However, datafusion-cli requires you to provide credentials in this case:

datafusion-cli
DataFusion CLI v47.0.0
> CREATE EXTERNAL TABLE hits
STORED AS PARQUET LOCATION 's3://clickhouse-public-datasets/hits_compatible/hits.parquet' OPTIONS(aws.region 'eu-west-1');
Object Store error: Generic S3 error: the credential provider was not enabled

Describe the solution you'd like

I would like to be able to access public datasets without providing credentials.

This is supported via this setting in the underlying builder: https://docs.rs/object_store/0.12.0/object_store/aws/struct.AmazonS3Builder.html#method.with_skip_signature

Describe alternatives you've considered

I would like to be able to do

> CREATE EXTERNAL TABLE hits
STORED AS PARQUET LOCATION 's3://clickhouse-public-datasets/hits_compatible/hits.parquet' OPTIONS(aws.skip_signature true, aws.region 'eu-central-1');

And maybe also this (without specifying aws.skip_signature at all):

> CREATE EXTERNAL TABLE hits
STORED AS PARQUET LOCATION 's3://clickhouse-public-datasets/hits_compatible/hits.parquet' OPTIONS(aws.region 'eu-central-1');
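
To make the two requested behaviors concrete, here is a minimal sketch of how the option could be resolved. Everything in it is hypothetical and not taken from datafusion-cli: the `S3Settings` struct, the `resolve_s3_settings` helper, and the `has_credentials` flag are stand-ins for illustration only. The assumption is that an explicit `aws.skip_signature` value wins, and that when the option is absent and no credentials are found, requests fall back to unsigned. In real code the resulting flag would presumably be passed to `AmazonS3Builder::with_skip_signature`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the S3 settings that matter in this issue.
#[derive(Debug, Default, PartialEq)]
struct S3Settings {
    region: Option<String>,
    skip_signature: bool,
}

/// Hypothetical helper: resolve settings from the `OPTIONS(...)` key/value pairs.
///
/// `has_credentials` is assumed to report whether any AWS credentials were found
/// (for example in the environment); how that is detected is out of scope here.
fn resolve_s3_settings(
    options: &HashMap<String, String>,
    has_credentials: bool,
) -> Result<S3Settings, String> {
    let region = options.get("aws.region").cloned();

    let skip_signature = match options.get("aws.skip_signature") {
        // An explicit option always wins.
        Some(value) => value
            .trim()
            .to_ascii_lowercase()
            .parse::<bool>()
            .map_err(|_| format!("invalid value for aws.skip_signature: {}", value))?,
        // Assumed fallback: with no credentials available, send unsigned requests.
        None => !has_credentials,
    };

    Ok(S3Settings {
        region,
        skip_signature,
    })
}

fn main() {
    // Mirrors the second statement above: only aws.region is given.
    let mut options = HashMap::new();
    options.insert("aws.region".to_string(), "eu-central-1".to_string());

    let settings = resolve_s3_settings(&options, false).expect("valid options");
    println!("{:?}", settings);
}
```

Keeping the explicit option authoritative means a user with credentials configured can still force unsigned access to a public bucket, while the fallback covers the case where nothing is configured at all.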

Additional context

No response

Labels

enhancement