Move to Apache Avro 1.10.1 #1648
Conversation
Force-pushed 7f13e86 to effe75b
Force-pushed 43b8d97 to 775b3dc
@Fokko I am also interested in getting Iceberg onto Avro 1.10.1. Are you still working on this?
@shardulm94 I'll pick this up again since it'll be convenient for #1972; this will make …
Mostly looks good to me. A couple of minor comments.
@Fokko I see the Spark tests are failing with this error:

This happens because Jackson expects … I think we may need to bump Iceberg's Jackson version in … @rdblue I am not 100% sure if this is the right thing to do. I know Spark can be finicky about the Jackson version. Do you have any insights here?
@shardulm94, our Jackson version should be okay to update. We shade and relocate it in the runtime Jars to avoid conflicts with Spark.
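As an aside, one way to check whether relocation actually happened in a runtime Jar is to probe for the shaded class name. A minimal sketch, assuming an `org.apache.iceberg.shaded` relocation prefix (the prefix is an assumption, not confirmed by this PR):

```java
// Minimal sketch: probe a runtime Jar for a relocated Jackson class.
// The relocation prefix below is an assumption, not confirmed by this PR.
public class ShadingCheck {
  public static void main(String[] args) throws Exception {
    // Throws ClassNotFoundException if Jackson was not relocated under this prefix.
    Class<?> mapper = Class.forName(
        "org.apache.iceberg.shaded.com.fasterxml.jackson.databind.ObjectMapper");
    System.out.println("Relocated Jackson found: " + mapper.getName());
  }
}
```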
Not 100% sure if this is the case. The Jackson module in Avro isn't shaded, so that might cause issues with Spark 2. Avro 1.10.1 uses Jackson 2.11, while Avro 1.9 is still on 2.10. Spark 3.0.1 is also still on Jackson 2.10:
https://github.com/apache/spark/blob/v3.0.1/pom.xml#L175 Pulling a newer version onto the classpath will create a mismatch between the Jackson versions. Spark bumped Jackson in 3.1 for exactly the issue that we're running into: apache/spark@01b73ae
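For anyone debugging a mismatch like this, here is a small diagnostic sketch (not from this PR) that prints which Jackson modules actually won on the classpath, using Jackson's standard `PackageVersion` markers:

```java
import com.fasterxml.jackson.core.Version;

public class JacksonVersionCheck {
  public static void main(String[] args) {
    // Each Jackson module ships a PackageVersion class recording its own version.
    Version core = com.fasterxml.jackson.core.json.PackageVersion.VERSION;
    Version databind = com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION;
    System.out.println("jackson-core:     " + core);
    System.out.println("jackson-databind: " + databind);
    // A mismatch (e.g. databind 2.12.x next to the core 2.10.x that Spark pins)
    // is exactly the classpath conflict described above.
  }
}
```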
Force-pushed 264abc7 to dacb650
@Fokko, could you rebase this? Now that the Jackson update is in, I think we can work on it.
@rdblue Sure thing!
This looks good to me! @Fokko Just confirming, is the PR ready to merge?
```java
Assert.assertNotNull("Should project location", projected.get("location"));
Assert.assertNull("Should not project longitude", projectedLocation.get("long"));
AssertHelpers.assertEmptyAvroField(projected, "long");
```
I think this should still use `projectedLocation`. Looks like it might just be a typo.
Good catch, probably a copy-paste!
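For context on the assertion being discussed: a hypothetical sketch of what an `AssertHelpers.assertEmptyAvroField` helper could look like, assuming the Avro >= 1.10 behavior from AVRO-2278 where `GenericRecord.get(String)` throws on an unknown field (the actual Iceberg helper may differ):

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.generic.GenericRecord;
import org.junit.Assert;

public class AssertHelpers {
  // Hypothetical helper: asserts that a field is absent from the record's schema,
  // relying on Avro >= 1.10 throwing instead of returning null (AVRO-2278).
  public static void assertEmptyAvroField(GenericRecord record, String field) {
    try {
      record.get(field);
      Assert.fail("Expected AvroRuntimeException for missing field: " + field);
    } catch (AvroRuntimeException e) {
      // expected: the field is not part of the record's schema
    }
  }
}
```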
```diff
@@ -169,12 +164,14 @@ public String getLogicalTypeName() {

   @Override
   public BigDecimal fromFixed(GenericFixed value, Schema schema, LogicalType type) {
-    return super.fromFixed(value, schema, decimalsByScale[((ParquetDecimal) type).scale()]);
+    final ParquetDecimal dec = (ParquetDecimal) type;
+    return super.fromFixed(value, schema, LogicalTypes.decimal(dec.precision(), dec.scale()));
```
This change results in creating a new `Decimal` for every value. While I don't think that we use this very much, I would rather not introduce object allocation for every conversion. Can you add a static cache for these? That was the purpose of `decimalsByScale`.
I know, but the cached types didn't take the precision into account, so the conversion could actually lose precision. I fixed this in the `fromFixed` by directly creating the `BigDecimal`; this is what's happening in the Avro library:

```java
@Override
public BigDecimal fromFixed(GenericFixed value, Schema schema, LogicalType type) {
  int scale = ((LogicalTypes.Decimal) type).getScale();
  return new BigDecimal(new BigInteger(value.bytes()), scale);
}
```

For the `toFixed` this isn't that trivial, since it would mean copying a lot of code.
I think it's fine to delegate to `toFixed`. I just don't want to call `LogicalTypes.decimal` every time to create a new decimal `LogicalType` with the same scale and precision. I think just adding a cache keyed by a `Pair` of scale and precision would fix it.
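A minimal sketch of such a cache, under the assumption that memoizing `LogicalTypes.decimal` per (precision, scale) is all that's needed. This variant packs the pair into a single `long` key instead of a `Pair` object; either works, and the actual fix may differ:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.avro.LogicalType;
import org.apache.avro.LogicalTypes;

public class DecimalTypeCache {
  // One LogicalType per (precision, scale); populated lazily, shared across conversions.
  private static final Map<Long, LogicalType> DECIMALS = new ConcurrentHashMap<>();

  public static LogicalType decimal(int precision, int scale) {
    // Pack the (precision, scale) pair into a single long key.
    long key = ((long) precision << 32) | (scale & 0xFFFFFFFFL);
    return DECIMALS.computeIfAbsent(key, k -> LogicalTypes.decimal(precision, scale));
  }
}
```

With that, the converter body could become `return super.fromFixed(value, schema, DecimalTypeCache.decimal(dec.precision(), dec.scale()));`, avoiding a new `LogicalType` allocation per value.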
@Fokko, looks correct overall, although there is one error in a test case. I'd also like to fix the allocation problem in the fixed converter.
Force-pushed 04f5901 to a0328ac
Looks like this would fix #1654, so we should get it into the 0.12.0 release. I don't think we should update Avro with a breaking change in a patch release, so it isn't a good idea to try to get this into 0.11.1. @Fokko, if you have time, please update to 1.10.2 and I'll merge. Otherwise, I'll merge this in a day or so and we can update again. Thanks!
Yes, I'll fix it right away.
Force-pushed 5cbe6ca to f931bbf
Force-pushed b98e1a7 to 30db7ec
I think a Jackson dependency update in Avro breaks Spark:
Looks like Avro updated Jackson from 2.11.3 to 2.12.2, which is what causes the problem. We should revert the latest update to 1.10.2 and use 1.10.1 to fix the problem, I guess.
Is there any workaround for this issue? Are you going to release 0.11.2 with this fix?
@davseitsev, we don't typically change dependency versions in patch releases; I think that would be a surprise to people who depend on patches containing very few changes. This would be a good one to get into 0.12.0 (releasing in the next couple of weeks) if we can get tests passing.
Merged! Thanks for updating this, @Fokko!
Co-authored-by: Fokko Driesprong <fdriesprong@ebay.com>
Moving to Apache Avro 1.10.1, with two changes:

- `GenericRecord.get(...)` would return null before 1.10.0, but now it throws an exception. Setting a field that doesn't exist already threw an exception, but retrieving one did not: a typo in the field name would silently return null. More information in https://issues.apache.org/jira/browse/AVRO-2278
- The current decimal conversion only takes scale into account, so it could be that precision would be lost.
Personally, I like the more explicit messaging when it comes to non-existent fields and loss of precision. But this is a matter of taste.
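To make the first change concrete, here is a small self-contained example (assumed schema, not from the PR) of the AVRO-2278 behavior difference:

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class GetBehaviorDemo {
  public static void main(String[] args) {
    // A record with a single "name" field, purely for illustration.
    Schema schema = SchemaBuilder.record("r").fields()
        .requiredString("name")
        .endRecord();
    GenericRecord record = new GenericData.Record(schema);
    record.put("name", "iceberg");

    // On Avro <= 1.9 this returned null; on 1.10+ it throws.
    try {
      record.get("nmae"); // typo in the field name
    } catch (AvroRuntimeException e) {
      System.out.println("Avro 1.10+ surfaces the typo: " + e.getMessage());
    }
  }
}
```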