-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add validated constructors for UnionFields #8891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add validated constructors for UnionFields #8891
Conversation
|
cc @alamb |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @friendlymatthew and I am sorry for the delay in review
My only concern about this PR is the potentially new restriction to disallow repeated fields (I worry this could cause regressions for people downstream).
Otherwise this one is 👨🍳 👌
| let fields = match union.typeIds() { | ||
| None => UnionFields::new(0_i8..fields.len() as i8, fields), | ||
| Some(ids) => UnionFields::new(ids.iter().map(|i| i as i8), fields), | ||
| None => UnionFields::from_fields(fields), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a nice API 👍
arrow-schema/src/fields.rs
Outdated
| /// | ||
| /// This function returns an error if: | ||
| /// - Any type_id appears more than once (duplicate type ids) | ||
| /// - Any field appears more than once (duplicate fields) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure the arrow spec requires unique field names for union. I couldn't find any reference to such a restriction in https://arrow.apache.org/docs/format/Columnar.html#union-layout
Since I think previous versions of arrow-rs allow duplicate fields I don't think it is ok to start erroring on them in this release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that makes sense to me. I suspect this will cause some subtleties in the way we compare Union types, but I'll make sure to document that in a follow up PR
| ); | ||
| } | ||
|
|
||
| #[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice tests
Perhaps we should error instead of panicking for >128 elements? |
Hi, that makes sense to me. I do really like What if we include a fallible version, something like |
|
I think having |
|
So I think this PR is waiting on:
|
Hi, I am back from vacation and will resume work on (union data type support, variant work, blog post) |
8c161ec to
3499adc
Compare
3499adc to
d678756
Compare
| pub fn try_from_fields<F>(fields: F) -> Result<Self, ArrowError> | ||
| where | ||
| F: IntoIterator, | ||
| F::Item: Into<FieldRef>, | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @kylebarron
| /// # Panics | ||
| /// | ||
| /// Panics if the number of fields exceeds 127 (the maximum value for i8 type IDs). | ||
| /// | ||
| /// If you want to avoid panics, use [`UnionFields::try_from_fields`] instead, which | ||
| /// returns a `Result`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here's a note about the fallible/infallible ways to construct UnionFields
Rationale for this change
This PR introduces
UnionFields::try_newandUnionFields::from_fieldsas replacements for the "unsafe"UnionFields::newconstructor, which is now deprecatedPreviously,
UnionFieldscould be constructed with invalid invariants including negative type ids, duplicate type ids, or duplicate fields. SinceUnionArraysindex by type id, negative values cause panics at runtimes and duplicates break comparisons.UnionFields::try_newvalidates that type ids are non-negative, unique, under 128, and that fields are unique, returning an error if any constraint is violated.UnionFields::from_fieldsis a convenience constructor that auto-assigns sequential type ids starting from 0. (Will panic if it exceeds 128 elements)Note about reviewing
I broke it up into smaller commits. The last commit updates all call sites across the different kernels