-
Notifications
You must be signed in to change notification settings - Fork 1k
[Variant] Support shred_variant for Uuids
#8666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Variant] Support shred_variant for Uuids
#8666
Conversation
| // // inspect the typed_value Field and make sure it contains the canonical Uuid extension type | ||
| // let typed_value_field = variant_array | ||
| // .inner() | ||
| // .fields() | ||
| // .into_iter() | ||
| // .find(|f| f.name() == "typed_value") | ||
| // .unwrap(); | ||
|
|
||
| // assert!( | ||
| // typed_value_field | ||
| // .try_extension_type::<extension::Uuid>() | ||
| // .is_ok() | ||
| // ); | ||
|
|
||
| // probe the downcasted typed_value array to make sure uuids are shredded correctly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get this assertion to pass, significant changes would be needed to correctly plumb the FieldRef through the code.
Since checking for extension types primarily improves correctness, doesn't block the initial impl, and would add considerable noise to this PR, I plan to address it in a follow-up
| DataType::FixedSizeBinary(16) => { | ||
| Uuid(VariantToUuidArrowRowBuilder::new(cast_options, capacity)) | ||
| } | ||
| DataType::FixedSizeBinary(size) => { | ||
| return Err(ArrowError::InvalidArgumentError(format!( | ||
| "FixedSizeBinary({size}) is not a valid variant shredding type. Only FixedSizeBinary(16) for UUID is supported." | ||
| ))); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the 2 arms I added in this PR
Sorry about the weird format diff. I suspect @scovich will have thoughts on how to improve/fix this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a weird quick of fmt that I keep running into -- it seems to flip unpredictably back and forth between two forms:
let builder = match data_type {vs
let builder =
match data_type {IMO it's a bug in fmt. I think it's intentional because changes in indentation can impact whether certain things fit on one line vs. several? For example, indenting further can ironically reduce line counts because of "fun" interactions with the rule that every arg goes on its own line, if they don't all fit on the same line as the open+close parens.
Case in point -- this PR actually has more lines, because de-denting the four DecimalXX match arms caused them to put every arg on its own line. Even tho de-denting also saved a couple lines for the Boolean match arm.
Anyway, I usually just review with whitespace ignored when I see this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e8a30b7 to
9ef30a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it! (a few minor questions that probably don't block the PR)
| DataType::FixedSizeBinary(16) => { | ||
| Uuid(VariantToUuidArrowRowBuilder::new(cast_options, capacity)) | ||
| } | ||
| DataType::FixedSizeBinary(size) => { | ||
| return Err(ArrowError::InvalidArgumentError(format!( | ||
| "FixedSizeBinary({size}) is not a valid variant shredding type. Only FixedSizeBinary(16) for UUID is supported." | ||
| ))); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a weird quick of fmt that I keep running into -- it seems to flip unpredictably back and forth between two forms:
let builder = match data_type {vs
let builder =
match data_type {IMO it's a bug in fmt. I think it's intentional because changes in indentation can impact whether certain things fit on one line vs. several? For example, indenting further can ironically reduce line counts because of "fun" interactions with the rule that every arg goes on its own line, if they don't all fit on the same line as the open+close parens.
Case in point -- this PR actually has more lines, because de-denting the four DecimalXX match arms caused them to put every arg on its own line. Even tho de-denting also saved a couple lines for the Boolean match arm.
Anyway, I usually just review with whitespace ignored when I see this...
| } | ||
| DataType::FixedSizeBinary(size) => { | ||
| return Err(ArrowError::InvalidArgumentError(format!( | ||
| "FixedSizeBinary({size}) is not a valid variant shredding type. Only FixedSizeBinary(16) for UUID is supported." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double checking -- we have reliable ways of identifying Decimal128 and Decimal256 (they have proper arrow data types), so we don't need to ever interpret raw FixedSizeBinary as decimal, even tho it is a valid physical representation for those larger decimal types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they have their own dedicated paths so when we enter the FixedSizeBinary arm, we can safely assume the value isn't a Decimal
484fa2e to
f7f5e11
Compare
Which issue does this PR close?
FixedSizeBinary(16)shredding #8665Rationale for this change
shred_variantcurrently panics when attempting to shred Variants containing values of the FixedSizeBinary(16) data type