Parquet: Move to ValueReader generation to a visitor #9063
  }

  @Override
  public Optional<ParquetValueReader<?>> visit(
Is the old case UTF8: return new ParquetValueReaders.StringReader(desc); covered here?
Yes, the old UTF8 OriginalType maps to a StringType: https://github.com/apache/parquet-mr/blob/65bc51846010360f3dd4304103ec3c637776d7c9/parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java#L197-L198
@aokolnychyi do you have a spare cycle for reviewing this?
Looks good to me!
I think there's just a bug in the visit(timestampLogicalType) case, but everything else looks great to me. This does lead me to wonder though, do the parquet reader tests test all these data types?
I'll double check that. We don't need to couple any of those in this PR though.
      return tsMicrosType.shouldAdjustToUTC()
          ? Optional.of(new TimestamptzReader(desc))
          : Optional.of(new TimestampReader(desc));
    } else if (timestampLogicalType.getUnit() == LogicalTypeAnnotation.TimeUnit.MICROS) {
I think this should be == LogicalTypeAnnotation.TimeUnit.MILLIS? Not Micros
Oof, that's a good one! I would expect this part to be thoroughly tested as well.
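For readers following this thread, here is a minimal, self-contained sketch of the branch under discussion. It is not the PR's code: the `TimeUnit` enum is a stand-in for `LogicalTypeAnnotation.TimeUnit`, the method returns reader labels rather than reader instances, and the two millisecond reader names are assumed for illustration. It only shows why the second condition has to test MILLIS: if both conditions test MICROS, the second branch can never run.

```java
import java.util.Optional;

public class TimestampBranchSketch {
  // Stand-in for LogicalTypeAnnotation.TimeUnit (assumption, kept local so the sketch compiles alone).
  public enum TimeUnit { MILLIS, MICROS, NANOS }

  // Returns a label for the reader each branch would select.
  // The millisecond labels are hypothetical names used only for this illustration.
  public static Optional<String> readerFor(TimeUnit unit, boolean adjustToUtc) {
    if (unit == TimeUnit.MICROS) {
      return Optional.of(adjustToUtc ? "TimestamptzReader" : "TimestampReader");
    } else if (unit == TimeUnit.MILLIS) {
      // Testing MICROS again here would make this branch unreachable.
      return Optional.of(adjustToUtc ? "TimestamptzMillisReader" : "TimestampMillisReader");
    }
    // Units without a branch are reported as unsupported by returning empty.
    return Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println(readerFor(TimeUnit.MICROS, false).orElse("none"));
    System.out.println(readerFor(TimeUnit.MILLIS, true).orElse("none"));
    System.out.println(readerFor(TimeUnit.NANOS, true).orElse("none"));
  }
}
```

A test that reads a millisecond-precision column would catch the duplicated condition, since the result would otherwise fall through to the unsupported case.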
Thanks @jacobmarble, @nk1506 & @amogh-jahagirdar for the review 🙌
* Parquet: Move to the visitor * Add one more edge case * Thanks Amogh!
This will replace the switch statement that relies on the deprecated getOriginalType() with a visitor pattern over the new LogicalTypeAnnotation. OriginalType does not include support for nanosecond-precision timestamps.
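The shape of that change can be sketched without any Parquet dependency. Everything below is a simplified stand-in rather than the PR's code: `Annotation`, `Visitor`, `StringType`, and `TimestampType` are hypothetical local types that mimic how a logical type annotation accepts a visitor whose overloaded `visit` methods return an `Optional`, and the visitor returns reader labels instead of `ParquetValueReader` instances.

```java
import java.util.Optional;

public class LogicalTypeVisitorSketch {
  /** Hypothetical stand-in for a logical type annotation that dispatches to a visitor. */
  public interface Annotation {
    <T> Optional<T> accept(Visitor<T> visitor);
  }

  /** Each overload defaults to empty, so types the visitor does not handle fall through. */
  public interface Visitor<T> {
    default Optional<T> visit(StringType type) { return Optional.empty(); }
    default Optional<T> visit(TimestampType type) { return Optional.empty(); }
  }

  public static final class StringType implements Annotation {
    @Override
    public <T> Optional<T> accept(Visitor<T> visitor) { return visitor.visit(this); }
  }

  public static final class TimestampType implements Annotation {
    private final boolean adjustToUtc;

    public TimestampType(boolean adjustToUtc) { this.adjustToUtc = adjustToUtc; }

    public boolean shouldAdjustToUTC() { return adjustToUtc; }

    @Override
    public <T> Optional<T> accept(Visitor<T> visitor) { return visitor.visit(this); }
  }

  /** Picks a reader label per logical type; this replaces one case of a switch per overload. */
  public static final class ReaderNameVisitor implements Visitor<String> {
    @Override
    public Optional<String> visit(StringType type) { return Optional.of("StringReader"); }

    @Override
    public Optional<String> visit(TimestampType type) {
      return Optional.of(type.shouldAdjustToUTC() ? "TimestamptzReader" : "TimestampReader");
    }
  }

  /** Dispatches on the annotation and fails loudly when no overload produced a reader. */
  public static String readerFor(Annotation annotation) {
    return annotation.accept(new ReaderNameVisitor())
        .orElseThrow(() -> new UnsupportedOperationException("Unsupported logical type"));
  }

  public static void main(String[] args) {
    System.out.println(readerFor(new StringType()));
    System.out.println(readerFor(new TimestampType(true)));
  }
}
```

One consequence of this design is that a new logical type, such as a nanosecond timestamp, needs only one more overload instead of a new enum constant in a switch.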