Skip to content

Conversation

@brancz
Copy link
Contributor

@brancz brancz commented Nov 19, 2025

Which issue does this PR close?

Closes #8870.

What changes are included in this PR?

Check if field order in from/to casting matches, and if not, attempt to find the fields by name.

Are these changes tested?

Added unit tests (that previously failed, so I separated them in a commit).

Are there any user-facing changes?

No, it's strictly additive functionality.

@alamb @vegarsti

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 19, 2025
@brancz brancz force-pushed the struct-casting-field-order branch from 959a210 to 808c285 Compare November 19, 2025 09:55
Copy link
Contributor

@vegarsti vegarsti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This looks good to me. Nice feature improvement.

@@ -221,12 +221,34 @@ pub fn can_cast_types(from_type: &DataType, to_type: &DataType) -> bool {
Decimal32(_, _) | Decimal64(_, _) | Decimal128(_, _) | Decimal256(_, _),
) => true,
(Struct(from_fields), Struct(to_fields)) => {
from_fields.len() == to_fields.len()
&& from_fields.iter().zip(to_fields.iter()).all(|(f1, f2)| {
// fast path, all field names are in the same order and same number of fields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be on line 228, I believe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, good catch the order was a bit different at the start when I wrote that comment haha

@brancz brancz force-pushed the struct-casting-field-order branch from 808c285 to 001f740 Compare November 19, 2025 10:08
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @brancz -- my only concern are the changes to the existing tests. Otherwise it looks good to me

});
}

// slow path, we match the fields by name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one idea that has come up in the past is to do this mapping calculation once and then use it for both can_cast_types and cast

However, this seems to be strictly better than current main (doesn't slow down existing code and allows more uses, so 👍 to me)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also going back and forth, but I decided that an additional allocation in the average case would be far worse in perf cost than comparisons. We're heavily profiling all these code paths, if it's significant, we will come back and improve the perf! 😄

let struct_array = StructArray::from(vec![
(
Arc::new(Field::new("b", DataType::Boolean, false)),
Arc::new(Field::new("a", DataType::Boolean, false)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out these tests were actually wrong to begin with, have a look at the names of the columns, how can a/b be cast to b/c? They only ever worked by accident, and now that we test whether they match, they needed to be fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would other people find this a regression (aka that they expect struct fields to be treated in order, rather than by name) 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s possible someone had it incorrect with it accidentally working but I don’t think that should stop fixing what’s clearly a bug. The previous behavior can actually cause very hidden unexpected behavior when it does accidentally work but field names mismatch or have a different order. I can’t see a valid use case for the incorrect previous behavior and all valid behaviors can be represented still and on total mismatches a user will even get a proper error now instead of potentially silently continuing.

let struct_array = StructArray::from(vec![
(
Arc::new(Field::new("b", DataType::Boolean, false)),
Arc::new(Field::new("a", DataType::Boolean, false)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise why was this test changed?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this contribution @brancz

@alamb alamb merged commit 7e637a7 into apache:main Nov 25, 2025
26 checks passed
@alamb
Copy link
Contributor

alamb commented Nov 25, 2025

Thanks again @brancz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Struct casting requires same order of fields

3 participants