Exposing (column) metadata (when reading Parquet files) #1

Open
evetion opened this issue Nov 6, 2024 · 2 comments · May be fixed by #2

Comments


evetion commented Nov 6, 2024

Is it possible to expose the metadata stored in Parquet files read with QuackIO? Ideally it would be attached to result objects that support metadata (e.g. set on a DataFrame via the DataAPI metadata functions).

julia> df = QuackIO.read_parquet(DataFrame, "some.parquet");
julia> metadata(df)
Dict{Any, Any}()  # <-- expect metadata

evetion commented Nov 6, 2024

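Something like this would work:

# only if the resulting object supports writing metadata
if DataAPI.metadatasupport(typeof(obj)).write
    # parquet_kv_metadata also returns a file_name column, so select key and value explicitly
    qstr = """select key, value from parquet_kv_metadata($(kwarg_val_to_db_incomma(fn)))"""
    results = DBInterface.execute(DuckDB.DB(), qstr)
    for row in results
        key = String(row.key)  # convert the key once and reuse it
        key == "ARROW:schema" && continue  # ignore non-string valued internal metadata
        metadata!(obj, key, row.value; style=:note)
    end
end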
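
To make the attach step concrete without needing a Parquet file, here is a minimal sketch that uses only DataFrames. The `kv` vector and its contents are hypothetical stand-ins for the rows returned by `parquet_kv_metadata`; the sketch assumes values arrive as raw bytes and decodes them to strings before storing them, which is an assumption and not something the snippet above does.

```julia
using DataFrames  # re-exports metadata, metadata!, and metadatakeys from DataAPI

# Hypothetical key-value pairs standing in for the query result rows.
kv = [
    (key = "ARROW:schema", value = UInt8[0x01, 0x02]),
    (key = "author", value = Vector{UInt8}("evetion")),
]

df = DataFrame(a = 1:3)

for row in kv
    row.key == "ARROW:schema" && continue  # skip internal binary metadata
    # Decode a copy of the bytes so the original buffer is left untouched.
    metadata!(df, row.key, String(copy(row.value)); style = :note)
end

metadata(df)  # table-level metadata now holds the "author" entry
```

The `:note` style is what lets DataFrames carry the metadata through most transformations of the table; the default style would drop it on most operations.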

Member

aplavin commented Nov 6, 2024

Agreed, that would make sense! A PR to add metadata support is definitely welcome :) I'm not that familiar with Parquet.

@evetion evetion linked a pull request Nov 7, 2024 that will close this issue