Add arrow-avro support for Duration type and minor fixes for UUID decoding#7889
Merged
alamb merged 6 commits into apache:main on Jul 14, 2025
Conversation
- Fixed `Uuid` support, now represented as `Utf8` in Arrow, and added testing logic.
- Added `Duration` support, mapped to Arrow's `IntervalMonthDayNano`, with schema handling, decoding, and integration tests.
- Updated `Cargo.toml` to include the `uuid` crate as a dev dependency for UUID checking.
- Added integration tests with the new `duration_uuid.avro` test file.
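For readers unfamiliar with the `Duration` mapping mentioned in this commit, here is a minimal sketch of the idea, not the code from this PR. The Avro specification defines `duration` as a 12-byte fixed holding three little-endian unsigned 32-bit integers (months, days, milliseconds), while Arrow's `IntervalMonthDayNano` holds months, days, and nanoseconds. The `MonthDayNano` struct and `decode_avro_duration` function are hypothetical names used only for illustration, and returning `None` on overflow is an assumption about error handling.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the three fields of Arrow's `IntervalMonthDayNano`,
/// defined here so the sketch compiles without the arrow crates.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct MonthDayNano {
    pub months: i32,
    pub days: i32,
    pub nanoseconds: i64,
}

/// Decode one Avro `duration` value: a 12-byte fixed made of three
/// little-endian u32 values (months, days, milliseconds).
/// Returns `None` if the buffer is not 12 bytes long or if months/days
/// do not fit in an i32 (an assumed policy, not necessarily the PR's).
pub fn decode_avro_duration(buf: &[u8]) -> Option<MonthDayNano> {
    if buf.len() != 12 {
        return None;
    }
    let months = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
    let days = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let millis = u32::from_le_bytes([buf[8], buf[9], buf[10], buf[11]]);
    Some(MonthDayNano {
        months: i32::try_from(months).ok()?,
        days: i32::try_from(days).ok()?,
        // Milliseconds widen to nanoseconds; u32::MAX * 1_000_000 fits in i64.
        nanoseconds: i64::from(millis) * 1_000_000,
    })
}
```

Calling `decode_avro_duration` on the bytes for 1 month, 2 days, and 1000 ms would yield a value with `nanoseconds` equal to 1_000_000_000.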
749d435 to
e2faf46
Compare
mbrobbel
reviewed
Jul 10, 2025
Co-authored-by: Matthijs Brobbel <m1brobbel@gmail.com>
mbrobbel
reviewed
Jul 10, 2025
- Changed `Uuid` from `Utf8` back to `FixedSizeBinary(16)` for proper Arrow UUID representation.
- Removed `uuid` crate dependency.
- Updated schema handling, decoding logic, and relevant tests for the new `Uuid` type.
- Added utility functions and tests to parse UUID strings into binary format.
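As a rough illustration of what parsing a UUID string into the 16 bytes of a `FixedSizeBinary(16)` value can look like, here is a hand-rolled sketch. It is not the utility function from this commit; `parse_uuid_bytes` is a hypothetical name, and accepting only the hyphenated 8-4-4-4-12 form is an assumption.

```rust
/// Parse a hyphenated UUID string (8-4-4-4-12 hex digits) into 16 raw bytes.
/// Hypothetical helper for illustration; returns `None` on any malformed input.
pub fn parse_uuid_bytes(s: &str) -> Option<[u8; 16]> {
    let b = s.as_bytes();
    if b.len() != 36 {
        return None;
    }
    let mut out = [0u8; 16];
    let mut nibbles = 0usize; // number of hex digits consumed so far
    for (i, &c) in b.iter().enumerate() {
        // Hyphens must sit at exactly these byte offsets.
        if matches!(i, 8 | 13 | 18 | 23) {
            if c != b'-' {
                return None;
            }
            continue;
        }
        // Accepts both upper- and lower-case hex digits.
        let v = (c as char).to_digit(16)? as u8;
        out[nibbles / 2] = (out[nibbles / 2] << 4) | v;
        nibbles += 1;
    }
    Some(out)
}
```

The resulting array keeps the bytes in the same order as the hex digits appear in the string, which is the layout a 16-byte fixed-size binary column would store.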
mbrobbel
reviewed
Jul 11, 2025
Co-authored-by: Matthijs Brobbel <m1brobbel@gmail.com>
mbrobbel
approved these changes
Jul 11, 2025
Member
mbrobbel
left a comment
Thanks @jecsand838. I think it would be good to get at least one more review, because I'm not familiar with this crate.
- Introduced `canonical_extension_types` feature for standardized UUID handling.
- Added `uuid` crate dependency for parsing and validating UUIDs.
- Updated `field_with_name` method to support canonical UUID representation.
- Removed custom UUID parsing logic and replaced it with `uuid` crate functionality.
- Updated `Cargo.toml` accordingly.
66dd25a to
aa00f95
Compare
Contributor
Author
@mbrobbel Thank you for the solid review and great suggestions. @alamb @scovich Would either of you be able to provide the additional review(s) if you get a chance?
f21d1ec to
5c56183
Compare
alamb
approved these changes
Jul 14, 2025
Contributor
alamb
left a comment
Looks good to me - thanks @jecsand838 and @mbrobbel
Which issue does this PR close?
Part of Add Avro Support #4886
Related to Avro codec enhancements #6965
Rationale for this change
The `arrow-avro` crate currently lacks support for the Avro `duration` type, which is a standard and commonly used type in Avro schemas. This omission prevents users from reading Avro files containing duration types, limiting the crate's utility.
This change introduces support for decoding Avro duration types by mapping them to the Arrow `Interval` type. This is a logical and efficient representation. Implementing this feature brings the `arrow-avro` crate closer to full Avro specification compliance and makes it more robust for real-world use cases.
What changes are included in this PR?
This PR contains:
- `utf8` type to better align with the Avro specification
- `duration_uuid.avro` file created using this python script: https://gist.github.com/jecsand838/cbdaaf581af78f357778bf87d2f3cf15
Are these changes tested?
Yes, this PR includes integration and unit tests covering these modifications.
Are there any user-facing changes?
N/A
Follow-Up PRs
- `test_duration_uuid` once "Added duration_uuid.avro file" (arrow-testing#108) is merged in.