[EPIC] A collection of support for metadata columns in ListingTable #20135

Other systems support "metadata" columns when querying data sources. These metadata columns do not exist in the underlying data; instead, they describe the source each row was read from (for example, the file it came from).

Common examples:
* Row number
* File name
* Row group number (Parquet)

## Databricks / Spark

It appears Databricks / Spark represents this concept as a single struct column named `_metadata` with multiple fields:
https://docs.databricks.com/aws/en/ingestion/file-metadata-column

```sql
SELECT
    *
    ,_metadata
    ,_metadata.file_path
    ,_metadata.file_name
    ,_metadata.file_modification_time
FROM
    json.`/path/to/table/data`
```

It looks like Spark / Databricks used to rely on the `input_file_name()` function but has moved to `_metadata` (Databricks Runtime 10.5 and above): https://pawankumarshukla1979.medium.com/tips-use-metadata-instead-of-input-file-name-function-in-databricks-runtime-10-5-and-above-b32766b0296b


## DuckDB
DuckDB seems to model this as additional parameters to the `read_parquet` function, specifically `filename` and `file_row_number`:
https://duckdb.org/docs/stable/data/parquet/overview#parameters
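A minimal, self-contained sketch of these parameters in DuckDB SQL, assuming a recent DuckDB CLI. The file name `metadata_demo.parquet` and the `id` column are illustrative only. The script writes a small Parquet file, then reads it back with both metadata columns enabled:

```sql
-- Write a tiny three-row Parquet file to read back (illustrative file name).
COPY (SELECT range AS id FROM range(3)) TO 'metadata_demo.parquet' (FORMAT parquet);

-- Ask read_parquet to append metadata columns next to the file's own data:
--   filename        -> path of the file each row was read from
--   file_row_number -> position of the row within that file
SELECT id, filename, file_row_number
FROM read_parquet('metadata_demo.parquet', filename = true, file_row_number = true)
ORDER BY file_row_number;
```

Note that the metadata columns are added under fixed default names in the same namespace as the file's own columns, which is what makes the collision below possible.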

Per @adriangb [last year](https://github.com/apache/datafusion/issues/15173#issuecomment-2858707706), enabling `filename` fails when the file already contains a column with that name:
```
D select filename, sum(row_count) as row_count from read_parquet('/Users/adriangb/Downloads/data2/**/*_stats.parquet', filename=true) group by filename order by row_count desc limit 10;
Binder Error:
Option filename adds column "filename", but a column with this name is also in the file. Try setting a different name: filename='<filename column name>'
```
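The error message itself suggests the workaround: pass a string instead of `true` to choose the metadata column's name. A minimal sketch, assuming a DuckDB version that accepts this form as the message indicates (behavior may differ across versions). `stats_demo.parquet`, `row_count` and `source_file` are illustrative names only:

```sql
-- Build a file that already has its own "filename" column, mirroring the case above.
COPY (SELECT 'a.parquet' AS filename, 10 AS row_count)
    TO 'stats_demo.parquet' (FORMAT parquet);

-- filename = true would clash with the existing column, so rename the metadata
-- column instead: the file's "filename" is kept, the source path goes in "source_file".
SELECT filename, row_count, source_file
FROM read_parquet('stats_demo.parquet', filename = 'source_file');
```

This highlights a design choice DataFusion would need to settle: flat metadata columns with configurable names (DuckDB) versus a single reserved struct column (Spark's `_metadata`).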


Related tickets for adding similar metadata features to DataFusion:
- [ ] https://github.com/apache/datafusion/issues/13975
- [ ] https://github.com/apache/datafusion/issues/15173
- [ ] https://github.com/apache/datafusion/issues/6051
- [ ] https://github.com/apache/datafusion/issues/20132
- [ ] https://github.com/apache/datafusion/issues/13261
- [ ] https://github.com/apache/datafusion/issues/18482
