Skip to content

Conversation

@friendlymatthew
Copy link
Contributor

@friendlymatthew friendlymatthew commented Nov 20, 2025

Rationale for this change

This PR introduces UnionFields::try_new and UnionFields::from_fields as replacements for the "unsafe" UnionFields::new constructor, which is now deprecated

Previously, UnionFields could be constructed with invalid invariants including negative type ids, duplicate type ids, or duplicate fields. Since UnionArrays index by type id, negative values cause panics at runtimes and duplicates break comparisons.

UnionFields::try_new validates that type ids are non-negative, unique, under 128, and that fields are unique, returning an error if any constraint is violated.

UnionFields::from_fields is a convenience constructor that auto-assigns sequential type ids starting from 0. (Will panic if it exceeds 128 elements)

Note about reviewing

I broke it up into smaller commits. The last commit updates all call sites across the different kernels

@friendlymatthew
Copy link
Contributor Author

cc @alamb

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @friendlymatthew and I am sorry for the delay in review

My only concern about this PR is the potentially new restriction to disallow repeated fields (I worry this could cause regressions for people downstream).

Otherwise this one is 👨‍🍳 👌

let fields = match union.typeIds() {
None => UnionFields::new(0_i8..fields.len() as i8, fields),
Some(ids) => UnionFields::new(ids.iter().map(|i| i as i8), fields),
None => UnionFields::from_fields(fields),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nice API 👍

///
/// This function returns an error if:
/// - Any type_id appears more than once (duplicate type ids)
/// - Any field appears more than once (duplicate fields)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the arrow spec requires unique field names for union. I couldn't find any reference to such a restriction in https://arrow.apache.org/docs/format/Columnar.html#union-layout

Since I think previous versions of arrow-rs allow duplicate fields I don't think it is ok to start erroring on them in this release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that makes sense to me. I suspect this will cause some subtleties in the way we compare Union types, but I'll make sure to document that in a follow up PR

);
}

#[test]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice tests

@kylebarron
Copy link
Member

UnionFields::from_fields is a convenience constructor that auto-assigns sequential type ids starting from 0. (Will panic if it exceeds 128 elements)

Perhaps we should error instead of panicking for >128 elements?

@friendlymatthew
Copy link
Contributor Author

UnionFields::from_fields is a convenience constructor that auto-assigns sequential type ids starting from 0. (Will panic if it exceeds 128 elements)

Perhaps we should error instead of panicking for >128 elements?

Hi, that makes sense to me. I do really like from_fields being infallible for the cases when I hard code my UnionFields.

What if we include a fallible version, something like UnionFields::try_from_fields and inform users on which method to use

@kylebarron
Copy link
Member

I think having from_fields and try_from_fields makes sense 👍

@alamb
Copy link
Contributor

alamb commented Dec 5, 2025

So I think this PR is waiting on:

  1. Don't error with duplicated field names
  2. from_fields / try_from_fields

@friendlymatthew
Copy link
Contributor Author

So I think this PR is waiting on:

  1. Don't error with duplicated field names
  2. from_fields / try_from_fields

Hi, I am back from vacation and will resume work on (union data type support, variant work, blog post)

@friendlymatthew friendlymatthew force-pushed the friendlymatthew/unionfields_try_new branch from 8c161ec to 3499adc Compare December 5, 2025 17:56
@friendlymatthew friendlymatthew force-pushed the friendlymatthew/unionfields_try_new branch from 3499adc to d678756 Compare December 5, 2025 18:02
Comment on lines +501 to +505
pub fn try_from_fields<F>(fields: F) -> Result<Self, ArrowError>
where
F: IntoIterator,
F::Item: Into<FieldRef>,
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +431 to +436
/// # Panics
///
/// Panics if the number of fields exceeds 127 (the maximum value for i8 type IDs).
///
/// If you want to avoid panics, use [`UnionFields::try_from_fields`] instead, which
/// returns a `Result`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's a note about the fallible/infallible ways to construct UnionFields

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate arrow-avro arrow-avro crate arrow-flight Changes to the arrow-flight crate parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants