HIVE-21737: Upgrade Avro to version 1.10.1 #1635


Closed
wants to merge 2 commits

Conversation

@iemejia (Member) commented Oct 30, 2020

Co-Authored-By: Ismaël Mejía iemejia@gmail.com

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

@iemejia (Member, Author) commented Oct 30, 2020

R: @sunchao

@iemejia (Member, Author) commented Oct 30, 2020

Just out of curiosity, are the patches in the ticket not needed anymore?

@sunchao (Member) commented Oct 30, 2020

Thanks @iemejia. I also opened #1621 for the issue.

> Just out of curiosity, are the patches in the ticket not needed anymore?

Yes, Hive has switched to using GitHub PRs.

@iemejia (Member, Author) commented Nov 3, 2020

@sunchao It seems the Jenkins tests failed, but I cannot get into Jenkins to see what's going on. Any hints?

@sunchao (Member) commented Nov 3, 2020

@iemejia can you open this link? The errors are related to Avro, but it should be good now since #1621 has just been merged. Can you just upgrade the Avro version in this PR and try again? Thanks.

@iemejia (Member, Author) commented Nov 4, 2020

@sunchao Great to see some of the changes merged. I cannot open the link; I get a 404 error now (before, it was a socket error that aborted the connection eagerly).

I am not sure I understand. I upgraded Avro to the latest stable release (1.10.0), but I expect this will trigger the failures previously discussed in the ticket (on ill-defined schemas in avro_deserialize_map_null.q and parquet_map_null.q) that were fixed by AVRO-2817, which requires the 1.10.1 release.

Great to see #1621 merged. Is there already a PR to cherry-pick that one into branch-2.3? Do you want me to do that?
I suppose, however, that we might also require the changes to TypeInfoToSchema from this PR, to make it forward compatible as well, because the Field constructor with JsonNode as a parameter is not available after Avro 1.9.
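
A minimal sketch of the constructor change in question (hypothetical class and field names, not Hive's actual TypeInfoToSchema code):

import java.util.Arrays;
import org.apache.avro.JsonProperties;
import org.apache.avro.Schema;

public class FieldConstructorSketch {
  public static void main(String[] args) {
    // A nullable string column, the kind of union TypeInfoToSchema-style code builds.
    Schema nullableString = Schema.createUnion(Arrays.asList(
        Schema.create(Schema.Type.NULL), Schema.create(Schema.Type.STRING)));

    // Avro <= 1.8 took the default as a Jackson JsonNode; this no longer compiles on 1.9+:
    //   new Schema.Field("col", nullableString, null,
    //       org.codehaus.jackson.node.JsonNodeFactory.instance.nullNode());

    // Avro 1.9+ takes a plain Object, with a sentinel for an explicit null default:
    Schema.Field field = new Schema.Field(
        "col", nullableString, null, JsonProperties.NULL_VALUE);
    System.out.println(field.name());
  }
}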

@sunchao (Member) commented Nov 5, 2020

> I get a 404 error now (before, it was a socket error that aborted the connection eagerly).

It's weird, and I'm not sure how access is managed there. FWIW, here's the error message:

Caused by: java.lang.ClassNotFoundException: org.xerial.snappy.SnappyInputStream
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	... 19 more

so you may need to double-check the dependencies and perhaps add the missing Snappy one.
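
For reference, a dependency along these lines would pull Snappy back in (a sketch only: the version property is a placeholder, not Hive's actual one, and the right scope and module depend on where it is needed):

<dependency>
  <groupId>org.xerial.snappy</groupId>
  <artifactId>snappy-java</artifactId>
  <!-- placeholder property; pin whatever version the project standardizes on -->
  <version>${snappy-java.version}</version>
  <!-- test scope since the failure above is in a test; adjust as needed -->
  <scope>test</scope>
</dependency>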

> I am not sure I understand. I upgraded Avro to the latest stable release (1.10.0), but I expect this will trigger the failures previously discussed in the ticket (on ill-defined schemas in avro_deserialize_map_null.q and parquet_map_null.q) that were fixed by AVRO-2817, which requires the 1.10.1 release.

It seems this only triggers when you have "default": null in the schema? I don't see us doing that in Hive, though.

> Great to see #1621 merged. Is there already a PR to cherry-pick that one into branch-2.3? Do you want me to do that?
> I suppose, however, that we might also require the changes to TypeInfoToSchema from this PR, to make it forward compatible as well, because the Field constructor with JsonNode as a parameter is not available after Avro 1.9.

Not yet; yeah, feel free to do that. Have you had a chance to test this against Spark with an upgraded Parquet version?

@iemejia (Member, Author) commented Nov 6, 2020

> so you may need to double-check the dependencies and perhaps add the missing Snappy one.

Can you please share the full logs of the CI run with me, just to see which modules might require the dependency? Snappy is an optional runtime dependency now, so it is up to the project (Hive) to know where it uses it.

> It seems this only triggers when you have "default": null in the schema? I don't see us doing that in Hive, though.

Yes, exactly, but this can happen in the context of a union too. Both parquet_map_null.q and avro_deserialize_map_null.q refer to map_null_val.avro, so this seems to be the case.
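
A minimal sketch of the schema pattern under discussion (hypothetical names, not the actual contents of map_null_val.avro): for a union, Avro only considers a null default well-defined when "null" is the first branch, and newer releases validate this more strictly.

import org.apache.avro.Schema;

public class NullDefaultSketch {
  public static void main(String[] args) {
    // Well-defined: "null" is the first union branch, so "default": null matches it.
    String ok = "{\"type\":\"record\",\"name\":\"R1\",\"fields\":["
        + "{\"name\":\"v\",\"type\":[\"null\",\"string\"],\"default\":null}]}";
    System.out.println(new Schema.Parser().parse(ok));

    // Ill-defined: the null default does not match the first branch ("string").
    String bad = "{\"type\":\"record\",\"name\":\"R2\",\"fields\":["
        + "{\"name\":\"v\",\"type\":[\"string\",\"null\"],\"default\":null}]}";
    try {
      System.out.println(new Schema.Parser().parse(bad));
    } catch (Exception e) {
      // Stricter Avro releases reject the mismatched default at parse time;
      // older ones were more lenient about it.
      System.out.println("rejected: " + e.getMessage());
    }
  }
}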

> Not yet; yeah, feel free to do that. Have you had a chance to test this against Spark with an upgraded Parquet version?

I have a branch with all the Avro, Hive, and Parquet updates on it, in case you want to take a look. It fully compiles; however, I have not been able to run the full set of Spark tests locally yet, because of unrelated errors on Kinesis (and my lack of familiarity with the Spark test suite). I would like to test it with the full CI, but to do that I would need to have the Hive SNAPSHOTs somewhere. It is a pity Hive does not publish them for older versions; or are they published somewhere? (That way I would update things there first.)

@sunchao (Member) commented Nov 7, 2020

Sure, the stack trace is:

java.lang.NoClassDefFoundError: org/xerial/snappy/SnappyInputStream
	at org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService$TestTaskSchedulerServiceWrapper.<init>(TestLlapTaskSchedulerService.java:2008)
	at org.apache.hadoop.hive.llap.tezplugins.TestLlapTaskSchedulerService.testHostPreferenceMissesConsistentPartialAlive(TestLlapTaskSchedulerService.java:1077)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.xerial.snappy.SnappyInputStream
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	... 14 more

It seems all the failures are from the org.apache.hadoop.hive.llap.tezplugins module.

> Yes, exactly, but this can happen in the context of a union too. Both parquet_map_null.q and avro_deserialize_map_null.q refer to map_null_val.avro, so this seems to be the case.

Ah, I see. Great. I guess we haven't seen any test failures related to this yet?

> I have a branch with all the Avro, Hive, and Parquet updates on it, in case you want to take a look. It fully compiles; however, I have not been able to run the full set of Spark tests locally yet, because of unrelated errors on Kinesis (and my lack of familiarity with the Spark test suite).
> I would like to test it with the full CI, but to do that I would need to have the Hive SNAPSHOTs somewhere. It is a pity Hive does not publish them for older versions; or are they published somewhere? (That way I would update things there first.)

Awesome! Do you see any compile errors without the Avro API fix? I don't think the snapshot jars are published anywhere, so it could be hard to test this in the Spark CI... However, we also plan to test this combination out in our internal Spark CI.

@iemejia (Member, Author) commented Dec 3, 2020

PR updated to the latest version of Avro; let's see if this gets us more tests passing now. 🤞 @sunchao @wangyum

@iemejia (Member, Author) commented Dec 3, 2020

Do you have an estimate of when a possible vote might happen to get the previous changes (and hopefully this one) released as 2.3.8?
I am thinking about whether Spark could take it in.

@sunchao (Member) commented Dec 22, 2020

@iemejia please let me know if you think this is absolutely needed. I plan to start another vote on Hive 2.3.8 RC3 on the morning of the 22nd (PST).

@h-vetinari commented:

I'm not speaking for @iemejia, of course, but my impression was that this patch was necessary for Hive 2.3.8 in order to (among other things) unblock Spark from using a newer Parquet (and thus Avro) version.

See the linked issues in HIVE-21737, SPARK-27733, etc.
The interlocking dependencies around this have been a very thorny problem that people have tried working on for a long time (e.g. #674, #785), so it would be really cool if this didn't miss the release.

@sunchao (Member) commented Dec 22, 2020

Yes, but my impression is that HIVE-24324 and HIVE-24436 should be sufficient to solve the issue. But let me know if otherwise.

@h-vetinari commented:

I'm just guessing myself, but based on the changes in apache/spark#30517, I think a bumped Avro dependency in Hive 2.3.8 is what people were planning for.

@sunchao (Member) commented Dec 22, 2020

Yes, I'm aware of that PR. The Avro upgrade is in Spark, though, and the testing is done with previous Hive release candidates, which do not include the Avro upgrade.

@iemejia (Member, Author) commented Dec 22, 2020

@sunchao Great to know the cut is happening soon!

Whether this is absolutely needed is a question of tradeoffs. The Avro binary format has not changed since version 1.8.x, which introduced logical types, but more recent versions have removed dependencies from the public API (Jackson, Guava, Joda-Time, etc.), so I suppose catching up to the latest version carries the same risks for the binary part as catching up to 1.8.x did (which Hive already went through).

I suppose Hive users rarely use Avro directly from the transitive dependency and mostly rely on the Hive APIs (which I hope don't leak Avro), so this might diminish the risk, but of course the risk remains in that particular case. I am probably biased towards the upgrade because I come from the Avro side, and I expect the full Big Data ecosystem to stay up to date, so that Hive contributors do not introduce changes that are API incompatible with more recent versions of Avro and break downstream projects (like the current parallel work on Spark).

As usual in software, it is all about tradeoffs. This decision is up to you as the maintainers, and I might be missing some other side effects in my analysis because I don't know Hive deeply.

In any case, if you decide to jump to the latest Avro version, e.g. 1.10.1, and any issue comes up, I commit, from the Avro side, to doing any fix and getting a release out if required.

@sunchao (Member) commented Dec 22, 2020

@iemejia thanks, can you open a PR against branch-2.3? Let's see how the tests go first.

@iemejia (Member, Author) commented Dec 22, 2020

@sunchao Done #1806 🤞

@sunchao (Member) commented Dec 22, 2020

BTW @iemejia, I'm seeing this test consistently failing in all the test runs:

[Screenshot of the failing test, Dec 22, 2020, 10:36 AM]

Can you take a look?

@sunchao (Member) commented Dec 22, 2020

You can try to reproduce this via:

mvn clean install -DskipTests
cd itests
mvn clean install -DskipTests
mvn test -Dtest=TestMiniDruidKafkaCliDriver -Dqfile=druidkafkamini_avro.q -Dtest.output.overwrite=true

although I got a different error:

FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask. org.skife.jdbi.v2.exceptions.UnableToObtainConnectionException: java.sql.SQLException: Cannot create         PoolableConnectionFactory (java.net.ConnectException : Error connecting to server localhost on port 8,084 with message Connection refused (Connection refused).)

It seems it's caused by this:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.inject.util.Types.collectionOf(Ljava/lang/reflect/Type;)Ljava/lang/reflect/ParameterizedType;
	at com.google.inject.multibindings.Multibinder.collectionOfProvidersOf(Multibinder.java:202)
	at com.google.inject.multibindings.Multibinder$RealMultibinder.<init>(Multibinder.java:283)
	at com.google.inject.multibindings.Multibinder$RealMultibinder.<init>(Multibinder.java:258)
	at com.google.inject.multibindings.Multibinder.newRealSetBinder(Multibinder.java:178)
	at com.google.inject.multibindings.Multibinder.newSetBinder(Multibinder.java:150)
	at org.apache.druid.guice.LifecycleModule.getEagerBinder(LifecycleModule.java:115)
	at org.apache.druid.guice.LifecycleModule.configure(LifecycleModule.java:121)
	at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:340)
	at com.google.inject.spi.Elements.getElements(Elements.java:110)
	at com.google.inject.util.Modules$OverrideModule.configure(Modules.java:177)
	at com.google.inject.AbstractModule.configure(AbstractModule.java:62)
	at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:340)
	at com.google.inject.spi.Elements.getElements(Elements.java:110)
	at com.google.inject.util.Modules$OverrideModule.configure(Modules.java:177)
	at com.google.inject.AbstractModule.configure(AbstractModule.java:62)
	at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:340)
	at com.google.inject.spi.Elements.getElements(Elements.java:110)
	at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:138)
	at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:104)
	at com.google.inject.Guice.createInjector(Guice.java:96)
	at com.google.inject.Guice.createInjector(Guice.java:73)
	at com.google.inject.Guice.createInjector(Guice.java:62)
	at org.apache.druid.initialization.Initialization.makeInjectorWithModules(Initialization.java:431)
	at org.apache.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:69)
	at org.apache.druid.cli.ServerRunnable.run(ServerRunnable.java:58)
	at org.apache.druid.cli.Main.main(Main.java:113)

which does not seem to have happened in the Jenkins CI.

@wangyum (Member) commented Dec 24, 2020

The test failure may be related to:

<druid.guice.version>4.1.0</druid.guice.version>

https://github.com/apache/hadoop/blob/release-3.1.0-RC1/hadoop-project/pom.xml#L92
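
(A hedged sketch of the kind of alignment fix being discussed: pinning, via a pom property, the Guice version that Druid expects. The property name mirrors the one quoted above; the exact module where it belongs is an assumption.)

<properties>
  <!-- align with the Guice version Druid 4.x is built against (assumption) -->
  <druid.guice.version>4.1.0</druid.guice.version>
</properties>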

@iemejia (Member, Author) commented Dec 24, 2020

I included @wangyum's fix in this patch; let's see if it works now 🤞

@github-actions (bot) commented:

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.

@github-actions github-actions bot added the stale label Feb 23, 2021
@github-actions github-actions bot closed this Mar 3, 2021