
datafusion-cli: Use correct S3 region if it is not specified #16306

Open
@alamb

Description


Is your feature request related to a problem or challenge?

I would like to make it as easy as possible to use datafusion-cli to query files on S3.

For example, after #16299 is merged, I would like to be able to read from the ClickBench example datasets:

CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 's3://clickhouse-public-datasets/hits_compatible/athena_partitioned/hits_1.parquet';

However, when I run this I get the following error:

> CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 's3://clickhouse-public-datasets/hits_compatible/athena_partitioned/hits_1.parquet';
Object Store error: Generic S3 error: Error performing HEAD https://s3.us-east-1.amazonaws.com/clickhouse-public-datasets/hits_compatible/athena_partitioned/hits_1.parquet in 499.73175ms - Received redirect without LOCATION, this normally indicates an incorrectly configured region

This does give me a hint that the region is incorrectly configured, which is good; however, it doesn't tell me which region I need.

If I provide the correct region (eu-central-1) it works great:

> CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 's3://clickhouse-public-datasets/hits_compatible/athena_partitioned/hits_1.parquet' OPTIONS ('aws.region' 'eu-central-1');
0 row(s) fetched.
Elapsed 1.182 seconds.

> select count(*) from hits;
+----------+
| count(*) |
+----------+
| 1000000  |
+----------+
1 row(s) fetched.
Elapsed 0.780 seconds.

I noticed that DuckDB and ClickHouse do not require the region to be set:

v1.2.2 7c039464e4
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D select count(*) from read_parquet('s3://clickhouse-public-datasets/hits_compatible/athena_partitioned/hits_1.parquet');
┌────────────────┐
│  count_star()  │
│     int64      │
├────────────────┤
│    1000000     │
│ (1.00 million) │
└────────────────┘

Describe the solution you'd like

I would like datafusion-cli to find the correct region automatically as well.

I did some investigation and found that the correct region is returned in a response header, which you can see via:

curl -v -X HEAD https://s3.us-east-1.amazonaws.com/clickhouse-public-datasets/hits_compatible/athena_partitioned/hits_1.parquet
...
...
> HEAD /clickhouse-public-datasets/hits_compatible/athena_partitioned/hits_1.parquet HTTP/1.1
> Host: s3.us-east-1.amazonaws.com
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 301 Moved Permanently
< x-amz-bucket-region: eu-central-1
< x-amz-request-id: Q44G0APVQH5JHHC4
< x-amz-id-2: cubLiiba/Q138g5SbNNlSoGtARMxobuq7GhA+3t39il+Wj50HNPBUh4bOGVS2Bwlc6k4f0lp6r0=
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Fri, 06 Jun 2025 14:19:57 GMT
< Server: AmazonS3

Note the x-amz-bucket-region in the response:

< x-amz-bucket-region: eu-central-1

I suspect this will need some changes upstream in the object_store crate; I will work on filing an upstream ticket now.

Describe alternatives you've considered

No response

Additional context

Upstream ticket

Labels

enhancement (New feature or request)