Skip to content

Conversation

@friendlymatthew
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

shred_variant currently panics when attempting to shred Variants containing values of the FixedSizeBinary(16) data type

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Oct 20, 2025
Comment on lines +413 to +416
// // inspect the typed_value Field and make sure it contains the canonical Uuid extension type
// let typed_value_field = variant_array
// .inner()
// .fields()
// .into_iter()
// .find(|f| f.name() == "typed_value")
// .unwrap();

// assert!(
// typed_value_field
// .try_extension_type::<extension::Uuid>()
// .is_ok()
// );

// probe the downcasted typed_value array to make sure uuids are shredded correctly
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get this assertion to pass, significant changes would be needed to correctly plumb the FieldRef through the code.

Since checking for extension types primarily improves correctness, doesn't block the initial impl, and would add considerable noise to this PR, I plan to address it in a follow-up

Comment on lines +286 to +306
DataType::FixedSizeBinary(16) => {
Uuid(VariantToUuidArrowRowBuilder::new(cast_options, capacity))
}
DataType::FixedSizeBinary(size) => {
return Err(ArrowError::InvalidArgumentError(format!(
"FixedSizeBinary({size}) is not a valid variant shredding type. Only FixedSizeBinary(16) for UUID is supported."
)));
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the 2 arms I added in this PR

Sorry about the weird format diff. I suspect @scovich will have thoughts on how to improve/fix this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a weird quick of fmt that I keep running into -- it seems to flip unpredictably back and forth between two forms:

let builder = match data_type {

vs

let builder = 
    match data_type {

IMO it's a bug in fmt. I think it's intentional because changes in indentation can impact whether certain things fit on one line vs. several? For example, indenting further can ironically reduce line counts because of "fun" interactions with the rule that every arg goes on its own line, if they don't all fit on the same line as the open+close parens.

Case in point -- this PR actually has more lines, because de-denting the four DecimalXX match arms caused them to put every arg on its own line. Even tho de-denting also saved a couple lines for the Boolean match arm.

Anyway, I usually just review with whitespace ignored when I see this...

Copy link
Contributor Author

@friendlymatthew friendlymatthew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@friendlymatthew friendlymatthew force-pushed the friendlymatthew/support-shred-uuid-to-arrow branch 2 times, most recently from e8a30b7 to 9ef30a2 Compare October 20, 2025 18:24
Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it! (a few minor questions that probably don't block the PR)

Comment on lines +286 to +306
DataType::FixedSizeBinary(16) => {
Uuid(VariantToUuidArrowRowBuilder::new(cast_options, capacity))
}
DataType::FixedSizeBinary(size) => {
return Err(ArrowError::InvalidArgumentError(format!(
"FixedSizeBinary({size}) is not a valid variant shredding type. Only FixedSizeBinary(16) for UUID is supported."
)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a weird quick of fmt that I keep running into -- it seems to flip unpredictably back and forth between two forms:

let builder = match data_type {

vs

let builder = 
    match data_type {

IMO it's a bug in fmt. I think it's intentional because changes in indentation can impact whether certain things fit on one line vs. several? For example, indenting further can ironically reduce line counts because of "fun" interactions with the rule that every arg goes on its own line, if they don't all fit on the same line as the open+close parens.

Case in point -- this PR actually has more lines, because de-denting the four DecimalXX match arms caused them to put every arg on its own line. Even tho de-denting also saved a couple lines for the Boolean match arm.

Anyway, I usually just review with whitespace ignored when I see this...

}
DataType::FixedSizeBinary(size) => {
return Err(ArrowError::InvalidArgumentError(format!(
"FixedSizeBinary({size}) is not a valid variant shredding type. Only FixedSizeBinary(16) for UUID is supported."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking -- we have reliable ways of identifying Decimal128 and Decimal256 (they have proper arrow data types), so we don't need to ever interpret raw FixedSizeBinary as decimal, even tho it is a valid physical representation for those larger decimal types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they have their own dedicated paths so when we enter the FixedSizeBinary arm, we can safely assume the value isn't a Decimal

@friendlymatthew friendlymatthew force-pushed the friendlymatthew/support-shred-uuid-to-arrow branch from 484fa2e to f7f5e11 Compare October 24, 2025 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet-variant parquet-variant* crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Variant] Support Uuid/FixedSizeBinary(16) shredding

2 participants