Add ExtensionType for uuid and map to parquet logical type #5822

Draft: wants to merge 7 commits into base: master


mbrobbel
Contributor

@mbrobbel mbrobbel commented May 31, 2024

Rationale for this change

It would be nice to better support reading and writing the canonical uuid extension type with the arrow and parquet crates, i.e. by mapping between the arrow extension type and the parquet logical uuid type.

What changes are included in this PR?

This adds an ExtensionType trait, some impls for canonical extension types, and a CanonicalExtensionTypes enum for canonical extension types.
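
For readers unfamiliar with the idea, here is a minimal sketch of what a trait for extension types could look like. It is not the code in this PR: the trait shape, the `StorageType` stand-in for arrow's `DataType`, and the `annotate` helper are all hypothetical and only illustrate the concept. The two metadata keys and the `arrow.uuid` name on 16-byte fixed-size binary storage come from the Arrow format's extension type conventions.

```rust
use std::collections::HashMap;

/// Field-metadata keys the Arrow format reserves for extension types.
pub const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
pub const EXTENSION_METADATA_KEY: &str = "ARROW:extension:metadata";

/// Hypothetical stand-in for arrow's `DataType`, kept tiny so the sketch
/// compiles without any dependencies.
#[derive(Debug, Clone, PartialEq)]
pub enum StorageType {
    FixedSizeBinary(i32),
    Utf8,
}

/// Hypothetical trait shape (an assumption, not the signature used in the PR).
pub trait ExtensionType {
    /// Name stored under `ARROW:extension:name`.
    const NAME: &'static str;

    /// Whether this extension type may annotate the given storage type.
    fn supports(storage: &StorageType) -> bool;

    /// Optional serialized parameters stored under `ARROW:extension:metadata`.
    fn serialize_metadata(&self) -> Option<String> {
        None
    }
}

/// The canonical uuid extension type: 16 bytes of fixed-size binary storage.
pub struct Uuid;

impl ExtensionType for Uuid {
    const NAME: &'static str = "arrow.uuid";

    fn supports(storage: &StorageType) -> bool {
        matches!(storage, StorageType::FixedSizeBinary(16))
    }
}

/// Hypothetical helper: build the field metadata for an extension type,
/// or return `None` when the storage type is not supported.
pub fn annotate<E: ExtensionType>(
    ext: &E,
    storage: &StorageType,
) -> Option<HashMap<String, String>> {
    if !E::supports(storage) {
        return None;
    }
    let mut metadata = HashMap::new();
    metadata.insert(EXTENSION_NAME_KEY.to_string(), E::NAME.to_string());
    if let Some(serialized) = ext.serialize_metadata() {
        metadata.insert(EXTENSION_METADATA_KEY.to_string(), serialized);
    }
    Some(metadata)
}
```

Because the behaviour lives on a trait rather than a closed enum, a type defined outside the crate could implement it in the same way as `Uuid` does here.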

Are there any user-facing changes?

Users can now annotate their logical types with extension types, and for uuid they are propagated via the arrow writer to map to the parquet uuid logical type.

This needs better tests and better docs, but I'd like to get some feedback on the approach first, because there are many different ways to implement this.

I quickly tested this change with narrow and those uuid fields (in the parquet file) are now picked up as uuid instead of blob by DuckDB.
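
To illustrate the kind of mapping described above, here is a minimal, dependency-free sketch. It assumes the writer can see a field's storage type and metadata; `StorageType`, `ParquetLogicalType`, and `logical_type_for` are hypothetical stand-ins and not the arrow or parquet crate APIs. The metadata key and the `arrow.uuid` name follow the Arrow extension type conventions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
pub enum StorageType {
    FixedSizeBinary(i32),
    Binary,
}

/// Hypothetical stand-in for parquet's logical type annotation.
#[derive(Debug, Clone, PartialEq)]
pub enum ParquetLogicalType {
    Uuid,
}

/// Decide which parquet logical type (if any) a field should be written with,
/// based on its extension name and storage type.
pub fn logical_type_for(
    storage: &StorageType,
    metadata: &HashMap<String, String>,
) -> Option<ParquetLogicalType> {
    // Fields without an extension name get no logical type from this mapping.
    let name = metadata.get("ARROW:extension:name")?;
    match (name.as_str(), storage) {
        // The uuid logical type only applies to 16-byte fixed-length values.
        ("arrow.uuid", StorageType::FixedSizeBinary(16)) => Some(ParquetLogicalType::Uuid),
        _ => None,
    }
}
```

With no logical type attached, a reader only sees 16-byte binary values, which matches the blob behaviour mentioned above.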

@github-actions github-actions bot added the parquet (Changes to the parquet crate) and arrow (Changes to the arrow crate) labels on May 31, 2024
@kylebarron
Contributor

Maybe ExtensionType could be a trait to be externally implementable and not limited to canonical extension types?

Contributor

@tustvold tustvold left a comment

This makes sense to me, and seems like an unobtrusive way to provide better ergonomics for extension types.

That being said, I have limited exposure to them, so getting some broader perspectives might be valuable, perhaps on the mailing list?

@mbrobbel
Contributor Author

I haven't had time to work on this, but I'm planning to pick this up later.

@alamb
Contributor

alamb commented Jul 1, 2024

Thanks @mbrobbel -- marking this PR as draft as I think it still has planned but not yet completed work

@alamb alamb marked this pull request as draft July 1, 2024 19:17
@aykut-bozkurt
Contributor

Thanks for the PR. This seems very useful for supporting logical types that are not yet mapped, e.g. json.

@mbrobbel
Contributor Author

mbrobbel commented Sep 26, 2024

I updated the PR to define a trait for extension types instead. Ready for another round of feedback.
Edit: just realized I need to change some trait methods to make it work for other extension types.

Labels
arrow (Changes to the arrow crate), parquet (Changes to the parquet crate)

5 participants