
Move to Apache Avro 1.10.1 #1648

Merged: 13 commits into apache:master from fd-move-to-avro-1-10, Jul 13, 2021

Conversation

@Fokko (Contributor) commented Oct 22, 2020

Moving to Apache Avro 1.10.0, with two behavioral changes:

  • GenericRecord.get(...) returned null for a non-existent field before 1.10.0, but it now throws an exception. Setting a field that doesn't exist already threw an exception; retrieving one did not, so a typo in a field name would silently return null. More information in https://issues.apache.org/jira/browse/AVRO-2278. A sketch of the new behavior follows further below.
  • The validation when downcasting decimals was made stricter. With the Parquet-to-Avro decimal conversion, there was an implicit downcast from precision 34 to 10, which now fails (see the sketch below):
org.apache.iceberg.spark.data.parquet.vectorized.TestParquetVectorizedReads > testMostlyNullsForOptionalFields FAILED
    org.apache.avro.AvroTypeException: Cannot encode decimal with precision 14 as max precision 10
        at org.apache.avro.Conversions$DecimalConversion.validate(Conversions.java:141)
        at org.apache.avro.Conversions$DecimalConversion.toFixed(Conversions.java:105)
        at org.apache.iceberg.parquet.ParquetAvro$FixedDecimalConversion.toFixed(ParquetAvro.java:177)
        at org.apache.iceberg.parquet.ParquetAvro$FixedDecimalConversion.toFixed(ParquetAvro.java:156)
        at org.apache.parquet.avro.AvroWriteSupport.convert(AvroWriteSupport.java:293)
        at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:276)
        at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
        at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
        at org.apache.iceberg.parquet.ParquetWriteSupport.write(ParquetWriteSupport.java:62)
        at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
        at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:301)
        at org.apache.iceberg.parquet.ParquetWriteAdapter.add(ParquetWriteAdapter.java:45)
        at org.apache.iceberg.io.FileAppender.addAll(FileAppender.java:32)
        at org.apache.iceberg.io.FileAppender.addAll(FileAppender.java:37)
        at org.apache.iceberg.spark.data.parquet.vectorized.TestParquetVectorizedReads.writeAndValidate(TestParquetVectorizedReads.java:74)
        at org.apache.iceberg.spark.data.parquet.vectorized.TestParquetVectorizedReads.testMostlyNullsForOptionalFields(TestParquetVectorizedReads.java:175)

The current conversion only takes scale into account, so precision could be lost.
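
To make the stricter check concrete, here is a minimal sketch (the class name and value are made up for illustration; precision 14 mirrors the failure above):

    import java.math.BigDecimal;

    import org.apache.avro.Conversions;
    import org.apache.avro.LogicalTypes;
    import org.apache.avro.Schema;

    public class DecimalValidationSketch {
      public static void main(String[] args) {
        // A fixed(8) schema annotated as decimal(10, 2): max precision is 10.
        Schema fixedSchema = LogicalTypes.decimal(10, 2)
            .addToSchema(Schema.createFixed("dec", null, null, 8));

        // Precision 14, scale 2: wider than the schema allows.
        BigDecimal tooWide = new BigDecimal("123456789012.34");

        // Avro 1.10+ validates precision and throws:
        //   AvroTypeException: Cannot encode decimal with precision 14 as max precision 10
        new Conversions.DecimalConversion()
            .toFixed(tooWide, fixedSchema, LogicalTypes.decimal(10, 2));
      }
    }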

Personally, I like the more explicit messaging when it comes to non-existent fields and loss of precision, but this is a matter of taste.
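
For the first change, a minimal sketch of the new GenericRecord.get(...) behavior (the record and field names are made up for illustration):

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    public class GetFieldSketch {
      public static void main(String[] args) {
        Schema schema = SchemaBuilder.record("User").fields()
            .requiredString("name")
            .endRecord();

        GenericRecord record = new GenericData.Record(schema);
        record.put("name", "iceberg");

        // Avro < 1.10.0: the misspelled field name silently returned null.
        // Avro >= 1.10.0: throws AvroRuntimeException ("Not a valid schema field: nmae").
        Object value = record.get("nmae");
      }
    }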

@Fokko Fokko force-pushed the fd-move-to-avro-1-10 branch from 7f13e86 to effe75b on October 22, 2020 19:34
@Fokko Fokko force-pushed the fd-move-to-avro-1-10 branch 2 times, most recently from 43b8d97 to 775b3dc on October 25, 2020 12:45
@shardulm94 (Contributor) commented:

@Fokko I am also interested in getting Iceberg onto Avro 1.10.1. Are you still working on this?

@Fokko (Contributor, Author) commented Jan 8, 2021

@shardulm94 I'll pick this up again since it'll be convenient for #1972; this will make Schema serializable.

@shardulm94 (Contributor) left a review comment:

Mostly looks good to me. A couple of minor comments.

@shardulm94 (Contributor) commented:

@Fokko I see the Spark tests are failing with this error:

    java.lang.ExceptionInInitializerError

        Caused by:
        com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.2 requires Jackson Databind version >= 2.10.0 and < 2.11.0

This happens because Jackson expects the com.fasterxml.jackson.core version to match com.fasterxml.jackson.module, and that doesn't seem to be the case here. I ran a Gradle build scan, and it looks like the jackson-databind version is being upgraded to 2.11 because of Avro 1.10.0, which causes this issue.
Gradle build scan link

I think we may need to bump Iceberg's Jackson version in versions.props and in build.gradle.
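
A sketch of what that bump might look like (the keys and version number are assumptions, not the final change; 2.11.3 is the Jackson line Avro 1.10.1 ships with):

    # versions.props (hypothetical)
    com.fasterxml.jackson.core:* = 2.11.3
    com.fasterxml.jackson.module:jackson-module-scala_2.12 = 2.11.3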

@rdblue I am not 100% sure if this is the right thing to do. I know Spark can be finicky about the Jackson version. Do you have any insights here?

@rdblue (Contributor) commented Jan 13, 2021

@shardulm94, our Jackson version should be okay to update. We shade and relocate it in the runtime Jars to avoid conflicts with Spark.
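
For context, a minimal sketch of what shading and relocating Jackson looks like with the Gradle Shadow plugin (the relocated package prefix is an assumption, not Iceberg's actual build script):

    // build.gradle of a runtime module (sketch)
    shadowJar {
      relocate 'com.fasterxml.jackson', 'org.apache.iceberg.shaded.com.fasterxml.jackson'
    }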

@Fokko Fokko mentioned this pull request Jan 13, 2021
@Fokko (Contributor, Author) commented Jan 13, 2021

Not 100% sure if this is the case. The Jackson module in Avro isn't shaded, so that might cause issues with Spark 2. Avro 1.10.1 uses Jackson 2.11, while Avro 1.9 is still on 2.10.

Spark 3.0.1 is still on Jackson 2.10:

+--- org.apache.spark:spark-hive_2.12 -> 3.0.1
|    +--- org.apache.spark:spark-core_2.12:3.0.1
|    |    +--- com.fasterxml.jackson.module:jackson-module-scala_2.12:2.10.0 -> 2.10.2
|    |    |    +--- org.scala-lang:scala-library:2.12.10
|    |    |    +--- com.fasterxml.jackson.core:jackson-core:2.10.2
|    |    |    +--- com.fasterxml.jackson.core:jackson-annotations:2.10.2
|    |    |    +--- com.fasterxml.jackson.core:jackson-databind:2.10.2 (*)
|    |    |    \--- com.fasterxml.jackson.module:jackson-module-paranamer:2.10.2
|    |    |         +--- com.fasterxml.jackson.core:jackson-databind:2.10.2 (*)
|    |    |         \--- com.thoughtworks.paranamer:paranamer:2.8

https://github.com/apache/spark/blob/v3.0.1/pom.xml#L175

Pulling a newer version into the classpath will create a mismatch between the Jackson versions.

They've bumped Jackson in Spark 3.1, for exactly the issue that we're running into: apache/spark@01b73ae

@Fokko Fokko changed the title Move to Apache Avro 1.10.0 Move to Apache Avro 1.10.1 Jan 13, 2021
@Fokko Fokko force-pushed the fd-move-to-avro-1-10 branch 3 times, most recently from 264abc7 to dacb650 on January 18, 2021 17:04
@rdblue (Contributor) commented Feb 1, 2021

@Fokko, could you rebase this? Now that the Jackson update is in, I think we can work on it.

@Fokko (Contributor, Author) commented Feb 2, 2021

@rdblue Sure thing!

@shardulm94 (Contributor) left a review comment:

This looks good to me! @Fokko Just confirming, is the PR ready to merge?

      Assert.assertNotNull("Should project location", projected.get("location"));
    - Assert.assertNull("Should not project longitude", projectedLocation.get("long"));
    + AssertHelpers.assertEmptyAvroField(projected, "long");
A reviewer (Contributor) commented on the diff:

I think this should still use projectedLocation. Looks like it might just be a typo.

The author (Contributor) replied:

Good catch, probably a copy-paste!

    @@ -169,12 +164,14 @@ public String getLogicalTypeName() {

      @Override
      public BigDecimal fromFixed(GenericFixed value, Schema schema, LogicalType type) {
    -   return super.fromFixed(value, schema, decimalsByScale[((ParquetDecimal) type).scale()]);
    +   final ParquetDecimal dec = (ParquetDecimal) type;
    +   return super.fromFixed(value, schema, LogicalTypes.decimal(dec.precision(), dec.scale()));
A reviewer (Contributor) commented:

This change results in creating a new Decimal for every value. While I don't think that we use this very much, I would rather not introduce object allocation for every conversion. Can you add a static cache for these? That was the purpose of decimalsByScale.

The author (Contributor) replied:

I know, but we didn't take the scale into account, so it could actually change the scale. I fixed this in fromFixed by directly creating the BigDecimal; this is what's happening in the Avro library:

    @Override
    public BigDecimal fromFixed(GenericFixed value, Schema schema, LogicalType type) {
      int scale = ((LogicalTypes.Decimal) type).getScale();
      return new BigDecimal(new BigInteger(value.bytes()), scale);
    }

For toFixed this isn't as trivial, since it would mean copying a lot of code.

A reviewer (Contributor) replied:

I think it's fine to delegate to toFixed. I just don't want to call LogicalTypes.decimal every time to create a new decimal LogicalType with the same scale and precision. I think just adding a cache keyed by a Pair of scale and precision would fix it.
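
Something along these lines would satisfy that (a sketch using Iceberg's Pair utility; the class and method names are illustrative):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.avro.LogicalTypes;
    import org.apache.iceberg.util.Pair;

    class DecimalTypeCache {
      // Cache decimal logical types keyed by (precision, scale) so the
      // converters don't allocate a new LogicalType for every value.
      private static final Map<Pair<Integer, Integer>, LogicalTypes.Decimal> DECIMALS =
          new ConcurrentHashMap<>();

      static LogicalTypes.Decimal decimal(int precision, int scale) {
        return DECIMALS.computeIfAbsent(
            Pair.of(precision, scale),
            key -> LogicalTypes.decimal(key.first(), key.second()));
      }
    }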

@rdblue (Contributor) commented Feb 21, 2021

@Fokko, looks correct overall, although there is one error in a test case. I'd also like to fix the allocation problem in the fixed converter.

@Fokko Fokko force-pushed the fd-move-to-avro-1-10 branch from 04f5901 to a0328ac on March 11, 2021 08:38
@rdblue (Contributor) commented Mar 23, 2021

Looks like this would fix #1654, so we should get it into the 0.12.0 release. I don't think we should update Avro with a breaking change in a patch release, so it isn't a good idea to try to get this into 0.11.1.

@Fokko, if you have time please update to 1.10.2 and I'll merge. Otherwise, I'll merge this in a day or so and we can update again. Thanks!

@Fokko (Contributor, Author) commented Mar 23, 2021

Yes, I'll fix it right away.

@Fokko Fokko force-pushed the fd-move-to-avro-1-10 branch from 5cbe6ca to f931bbf on March 23, 2021 20:45
@Fokko Fokko force-pushed the fd-move-to-avro-1-10 branch from b98e1a7 to 30db7ec on March 23, 2021 21:07
@rdblue (Contributor) commented Mar 23, 2021

I think a Jackson dependency update in Avro breaks Spark:

com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.11.4 requires Jackson Databind version >= 2.11.0 and < 2.12.0

Looks like Avro updated Jackson from 2.11.3 to 2.12.2, which is what causes the problem. We should revert the latest update to 1.10.2 and use 1.10.1 to fix the problem, I guess.

@davseitsev commented:

Is there any workaround for this issue? Are you going to release 0.11.2 with this fix?

@rdblue (Contributor) commented Jul 7, 2021

@davseitsev, we don't typically change dependency versions in patch releases. I think that would be a surprise to people who depend on patches containing very few changes. This would be a good one to get into 0.12.0 (releasing in the next couple of weeks) if we can get tests passing.

@Fokko Fokko changed the title Move to Apache Avro 1.10.1 Move to Apache Avro 1.10.2 Jul 12, 2021
Fokko Driesprong added 2 commits July 12, 2021 22:44
@Fokko Fokko changed the title Move to Apache Avro 1.10.2 Move to Apache Avro 1.10.1 Jul 13, 2021
@rdblue rdblue merged commit b3fb81a into apache:master Jul 13, 2021
@rdblue (Contributor) commented Jul 13, 2021

Merged! Thanks for updating this, @Fokko!

minchowang pushed a commit to minchowang/iceberg that referenced this pull request Aug 2, 2021
Co-authored-by: Fokko Driesprong <fdriesprong@ebay.com>