[SPARK-33696][BUILD][SQL] Upgrade built-in Hive to 2.3.8 #30657

Closed
wants to merge 10 commits into from

wangyum
Member

@wangyum wangyum commented Dec 8, 2020

What changes were proposed in this pull request?

Hive 2.3.8 changes:
HIVE-19662: Upgrade Avro to 1.8.2
HIVE-24324: Remove deprecated API usage from Avro
HIVE-23980: Shade Guava from hive-exec in Hive 2.3
HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue
HIVE-24512: Exclude calcite in packaging.
HIVE-22708: Fix for HttpTransport to replace String.equals
HIVE-24551: Hive should include transitive dependencies from calcite after shading it
HIVE-24553: Exclude calcite from test-jar dependency of hive-exec

Why are the changes needed?

To upgrade Avro and Parquet to their latest versions.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests, plus a test that tries to upgrade Parquet to 1.11.1 and Avro to 1.10.1: #30517

@wangyum wangyum marked this pull request as draft December 8, 2020 00:26
@dongjoon-hyun
Member

Yey!

@@ -102,7 +102,7 @@ package object client {

// Since HIVE-14496, Hive materialized view need calcite-core.
// For spark, only VersionsSuite currently creates a hive materialized view for testing.
-case object v2_3 extends HiveVersion("2.3.7",
+case object v2_3 extends HiveVersion("2.3.8",
Member

@viirya viirya Dec 8, 2020

Based on our internal testing, this needs to be changed for all tests to pass. But let's see the Jenkins result first.

@SparkQA

SparkQA commented Dec 8, 2020

Test build #132389 has finished for PR 30657 at commit fc9f735.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Dec 8, 2020

[info] - 2.3: loadTable *** FAILED *** (180 milliseconds)
[info]   java.lang.reflect.InvocationTargetException:
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[info]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.lang.reflect.Method.invoke(Method.java:498)
[info]   at org.apache.spark.sql.hive.client.Shim_v2_1.loadTable(HiveShim.scala:1267)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$loadTable$1(HiveClientImpl.scala:880)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:289)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:222)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:221)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:271)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:875)
[info]   at org.apache.spark.sql.hive.client.VersionsSuite.$anonfun$new$22(VersionsSuite.scala:288)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:176)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:61)
[info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
[info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
[info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:61)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
[info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
[info]   at scala.collection.immutable.List.foreach(List.scala:392)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232)
[info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
[info]   at org.scalatest.Suite.run(Suite.scala:1112)
[info]   at org.scalatest.Suite.run$(Suite.scala:1094)
[info]   at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:237)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:237)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:236)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:61)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:61)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:318)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:513)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:748)
[info]   Cause: java.lang.NoSuchMethodError: com.fasterxml.jackson.annotation.JsonFormat$Value.empty()Lcom/fasterxml/jackson/annotation/JsonFormat$Value;
[info]   at com.fasterxml.jackson.databind.cfg.MapperConfig.<clinit>(MapperConfig.java:50)
[info]   at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:565)
[info]   at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:480)
[info]   at org.apache.hadoop.hive.common.StatsSetupConst$ColumnStatsAccurate.<clinit>(StatsSetupConst.java:164)
[info]   at org.apache.hadoop.hive.common.StatsSetupConst.parseStatsAcc(StatsSetupConst.java:297)
[info]   at org.apache.hadoop.hive.common.StatsSetupConst.clearColumnStatsState(StatsSetupConst.java:261)
[info]   at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:2032)
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[info]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.lang.reflect.Method.invoke(Method.java:498)
[info]   at org.apache.spark.sql.hive.client.Shim_v2_1.loadTable(HiveShim.scala:1267)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$loadTable$1(HiveClientImpl.scala:880)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:289)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:222)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:221)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:271)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:875)
[info]   at org.apache.spark.sql.hive.client.VersionsSuite.$anonfun$new$22(VersionsSuite.scala:288)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:176)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:61)
[info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
[info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
[info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:61)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
[info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
[info]   at scala.collection.immutable.List.foreach(List.scala:392)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232)
[info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
[info]   at org.scalatest.Suite.run(Suite.scala:1112)
[info]   at org.scalatest.Suite.run$(Suite.scala:1094)
[info]   at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:237)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:237)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:236)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:61)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:61)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:318)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:513)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:748)
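
A `NoSuchMethodError` like the one above usually means the class was resolved from a jar that predates the method. As a minimal sketch, the check below probes a class loader for a public no-argument method by reflection. `MethodProbe` and `hasNoArgMethod` are hypothetical names, not part of Spark or Hive, and the sketch probes the application class loader, whereas the failing test runs inside Spark's isolated Hive client loader.

```java
import java.lang.reflect.Method;

// Hypothetical helper (not part of Spark or Hive): checks whether a class visible
// to a given class loader exposes a public no-argument method with the given name.
final class MethodProbe {

    static boolean hasNoArgMethod(String className, String methodName, ClassLoader loader) {
        try {
            // initialize=false: inspect the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, loader);
            // getMethod throws NoSuchMethodException if no public no-arg method exists.
            Method m = cls.getMethod(methodName);
            return m != null;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class and method names are taken from the stack trace in this thread.
        String cls = "com.fasterxml.jackson.annotation.JsonFormat$Value";
        boolean present = hasNoArgMethod(cls, "empty", MethodProbe.class.getClassLoader());
        System.out.println(cls + ".empty() present: " + present);
    }
}
```

Running it with the same classpath as the failing process shows whether the `JsonFormat$Value` that wins class resolution actually has `empty()`.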

@viirya
Member

viirya commented Dec 8, 2020

[info] - 2.3: loadTable *** FAILED *** (180 milliseconds)
[info]   java.lang.reflect.InvocationTargetException:

Yes, this is what we encountered in our internal tests. As noted in #30657 (comment), there must be a separate HiveVersion for 2.3.8 to resolve these errors.

@wangyum
Member Author

wangyum commented Dec 9, 2020

@viirya @sunchao Can you reproduce this issue? Hadoop 2.7.7 + Hive 2.3.8:

[root@spark-3267648 apache-hive-2.3.8-bin]# bin/hive
which: no hbase in (/usr/lib/jdk1.8.0_221/bin:/usr/lib/maven-3.6.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/apache-hive-2.3.8-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/root/apache-hive-2.3.8-bin/lib/hive-common-2.3.8.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> CREATE TABLE tbl AS SELECT 1 AS a;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20201208181102_11753a84-c915-41ae-8865-1d2240e004b1
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2020-12-08 18:11:08,462 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local904330736_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory file:/user/hive/warehouse/.hive-staging_hive_2020-12-08_18-11-02_638_1954554869364101742-1/-ext-10002
Moving data to directory file:/user/hive/warehouse/tbl
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 6.484 seconds
hive> select * from tbl;
Exception in thread "main" java.lang.AssertionError: Internal error: While invoking method 'public org.apache.calcite.sql2rel.RelFieldTrimmer$TrimResult org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveRelFieldTrimmer.trimFields(org.apache.calcite.rel.core.Project,org.apache.calcite.util.ImmutableBitSet,java.util.Set)'
	at org.apache.calcite.util.Util.newInternal(Util.java:792)
	at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:534)
	at org.apache.calcite.sql2rel.RelFieldTrimmer.dispatchTrimFields(RelFieldTrimmer.java:270)
	at org.apache.calcite.sql2rel.RelFieldTrimmer.trim(RelFieldTrimmer.java:160)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1331)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1261)
	at org.apache.calcite.tools.Frameworks$1.apply(Frameworks.java:113)
	at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:997)
	at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:149)
	at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:106)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1069)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1085)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:364)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11138)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.calcite.util.ReflectUtil$2.invoke(ReflectUtil.java:531)
	... 31 more
Caused by: java.lang.NoSuchMethodError: org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29)
	at org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181)
	at org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175)
	at org.apache.calcite.rel.metadata.RelMdCollation.collations(RelMdCollation.java:122)
	at GeneratedMetadataHandler_Collation.collations_$(Unknown Source)
	at GeneratedMetadataHandler_Collation.collations(Unknown Source)
	at org.apache.calcite.rel.metadata.RelMetadataQuery.collations(RelMetadataQuery.java:482)
	at org.apache.calcite.sql2rel.RelFieldTrimmer.trimChild(RelFieldTrimmer.java:189)
	at org.apache.calcite.sql2rel.RelFieldTrimmer.trimFields(RelFieldTrimmer.java:374)
	at org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveRelFieldTrimmer.trimFields(HiveRelFieldTrimmer.java:273)
	... 36 more

@sunchao
Member

sunchao commented Dec 9, 2020

Hmm yes I used to see this failure in Hive UTs but I think it should have been resolved via apache/hive#1356, since we also shade calcite in hive-exec fat jar.

@sunchao
Member

sunchao commented Dec 9, 2020

let me try to reproduce this as well locally.

@sunchao
Member

sunchao commented Dec 9, 2020

@wangyum @viirya yes, I was able to reproduce the same issue. I think it could be because RelCollationImpl is included in both the hive-exec jar and the calcite-core jar. Perhaps we should exclude the latter from the Hive binary distribution and classpath.

I'll update the release vote email thread and start a new RC once this is fixed.

@wangyum
Member Author

wangyum commented Dec 9, 2020

Thank you @sunchao .

@viirya
Member

viirya commented Dec 9, 2020

@wangyum @viirya yes, I was able to reproduce the same issue. I think it could be because RelCollationImpl is included in both the hive-exec jar and the calcite-core jar. Perhaps we should exclude the latter from the Hive binary distribution and classpath.

I'll update the release vote email thread and start a new RC once this is fixed.

We have included org.apache.calcite:* in the Hive shaded jar. The following error suggests there is another calcite jar on the classpath that conflicts with the shaded calcite.

Caused by: java.lang.NoSuchMethodError: org.apache.calcite.rel.RelCollationImpl.<init>(Lorg/apache/hive/com/google/common/collect/ImmutableList;)V
	at org.apache.hadoop.hive.ql.optimizer.calcite.HiveRelCollation.<init>(HiveRelCollation.java:29)
	at org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable.getCollationList(RelOptHiveTable.java:181)
	at org.apache.calcite.rel.metadata.RelMdCollation.table(RelMdCollation.java:175)

Do we know where it comes from?
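
One way to answer this kind of question is to scan a distribution's jar directory for every jar that contains the class in question. The following is a minimal sketch under stated assumptions: `DuplicateClassFinder` and `jarsContaining` are hypothetical names, and the default directory `lib` is only a guess at where the Hive binary distribution keeps its jars.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists the jars in a directory that contain a given class.
final class DuplicateClassFinder {

    static List<Path> jarsContaining(Path libDir, String className) throws IOException {
        // A class a.b.C is stored in a jar as the entry a/b/C.class.
        String entry = className.replace('.', '/') + ".class";
        List<Path> jars;
        try (Stream<Path> files = Files.list(libDir)) {
            jars = files.filter(p -> p.toString().endsWith(".jar"))
                        .sorted()
                        .collect(Collectors.toList());
        }
        List<Path> hits = new ArrayList<>();
        for (Path jar : jars) {
            try (JarFile jf = new JarFile(jar.toFile())) {
                if (jf.getEntry(entry) != null) {
                    hits.add(jar);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Defaults are assumptions for illustration; pass real values as arguments.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "lib");
        String cls = args.length > 1 ? args[1] : "org.apache.calcite.rel.RelCollationImpl";
        for (Path jar : jarsContaining(libDir, cls)) {
            System.out.println(jar);
        }
    }
}
```

If more than one jar is printed, the class loader picks one copy by classpath order, which is how a shaded and an unshaded variant of the same class can collide.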

@sunchao
Member

sunchao commented Dec 9, 2020

@viirya yes exactly - it is from calcite-core, perhaps we need to find out which dependency pulls in calcite-core and exclude it from there.

@sunchao
Member

sunchao commented Dec 9, 2020

for comparison, the binary distribution from Hive master branch doesn't include any calcite jars, but I'm seeing:

calcite-core-1.10.0.jar
calcite-druid-1.10.0.jar
calcite-linq4j-1.10.0.jar

in 2.3.8 binary distribution.

@viirya
Member

viirya commented Dec 9, 2020

for comparison, the binary distribution from Hive master branch doesn't include any calcite jars, but I'm seeing:

calcite-core-1.10.0.jar
calcite-druid-1.10.0.jar
calcite-linq4j-1.10.0.jar

in 2.3.8 binary distribution.

In the dependency tree, I only see hive-druid-handler pulling in calcite-druid. The other calcite-core and calcite-druid entries all come from hive-exec.

@sunchao
Member

sunchao commented Dec 9, 2020

I think we need something like https://issues.apache.org/jira/browse/HIVE-23593 and exclude calcite from bin.xml

@wangyum
Member Author

wangyum commented Dec 10, 2020

retest this please.

@SparkQA

SparkQA commented Dec 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37127/

@SparkQA

SparkQA commented Dec 10, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37127/

@SparkQA

SparkQA commented Dec 10, 2020

Test build #132524 has finished for PR 30657 at commit fc9f735.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37153/

"org.apache.calcite.avatica:avatica",
"com.google.guava:guava",
Member

Just curious. Why do we need to exclude it? Hive 2.3.8 is supposed to shade it, isn't it?

@SparkQA

SparkQA commented Dec 10, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37153/

@SparkQA

SparkQA commented Dec 10, 2020

Test build #132549 has finished for PR 30657 at commit 34a4da2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member Author

wangyum commented Dec 10, 2020

@sunchao @viirya Could we avoid including com.fasterxml.jackson?

[Two screenshots comparing Hive 2.3.8 and Hive 2.3.7; images not preserved]

Hive 2.3.8 loads the class from hive-exec-2.3.8.jar:

hive class: com.fasterxml.jackson.annotation.JsonFormat$Value - jar:file:/root/opensource/spark/sql/hive/target/tmp/org.apache.spark.sql.hive.client.VersionsSuite/hive-v2_3-059f3d91-4a54-43cb-8382-6446765c2446/org.apache.hive_hive-exec-2.3.8.jar!/com/fasterxml/jackson/annotation/JsonFormat$Value.class

Hive 2.3.7 loads the class from jackson-annotations-2.9.5.jar:

hive class: com.fasterxml.jackson.annotation.JsonFormat$Value - jar:file:/root/opensource/spark/sql/hive/target/tmp/org.apache.spark.sql.hive.client.VersionsSuite/hive-v2_3-768ce706-3000-4b11-a8a3-6dd721494904/com.fasterxml.jackson.core_jackson-annotations-2.9.5.jar!/com/fasterxml/jackson/annotation/JsonFormat$Value.class
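
Output like the two "hive class" lines above can be produced by asking a class loader where it would find the `.class` resource for a given class name. This is a minimal sketch rather than the code used in the PR: `ClassOrigin` and `locate` are hypothetical names, and it queries the thread context class loader instead of Spark's isolated Hive client loader.

```java
import java.net.URL;

// Hypothetical diagnostic: reports which jar (or directory) a class would be loaded from.
final class ClassOrigin {

    /** Returns the URL of the .class resource for className, or null if the loader cannot find it. */
    static URL locate(String className, ClassLoader loader) {
        // Nested classes keep their '$' separator, e.g. JsonFormat$Value.class.
        String resource = className.replace('.', '/') + ".class";
        return loader != null
                ? loader.getResource(resource)
                : ClassLoader.getSystemResource(resource);
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "com.fasterxml.jackson.annotation.JsonFormat$Value";
        URL url = locate(name, Thread.currentThread().getContextClassLoader());
        System.out.println("class: " + name + " - "
                + (url == null ? "not found on classpath" : url.toString()));
    }
}
```

The printed URL names the jar that wins class resolution, which is how the difference between `hive-exec-2.3.8.jar` and `jackson-annotations-2.9.5.jar` shows up.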

Comment on lines +1858 to +1873
<exclusion>
  <groupId>net.hydromatic</groupId>
  <artifactId>eigenbase-properties</artifactId>
</exclusion>
<exclusion>
  <groupId>org.codehaus.janino</groupId>
  <artifactId>commons-compiler</artifactId>
</exclusion>
<exclusion>
  <groupId>org.codehaus.janino</groupId>
  <artifactId>janino</artifactId>
</exclusion>
<exclusion>
  <groupId>org.pentaho</groupId>
  <artifactId>pentaho-aggdesigner-algorithm</artifactId>
</exclusion>
Member Author

These dependencies were added by apache/hive@52a4ab8

@SparkQA

SparkQA commented Jan 8, 2021

Test build #133840 has finished for PR 30657 at commit fce63a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class SparkPod(pod: Pod, container: Container)
  • trait KubernetesFeatureConfigStep
  • public class Distributions
  • case class Decode(params: Seq[Expression], child: Expression) extends RuntimeReplaceable
  • case class StringDecode(bin: Expression, charset: Expression)
  • case class AlterTableRecoverPartitions(child: LogicalPlan) extends Command
  • case class AlterViewAs(
  • case class CacheTable(
  • case class CacheTableAsSelect(
  • case class SubqueryExec(name: String, child: SparkPlan, maxNumRows: Option[Int] = None)
  • trait BaseCacheTableExec extends V2CommandExec
  • case class CacheTableExec(
  • case class CacheTableAsSelectExec(

@SparkQA

SparkQA commented Jan 9, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38450/

@SparkQA

SparkQA commented Jan 9, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38450/

@SparkQA

SparkQA commented Jan 9, 2021

Test build #133861 has finished for PR 30657 at commit 2248a9c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 16, 2021

Test build #134133 has finished for PR 30657 at commit bb418bb.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please

@SparkQA

SparkQA commented Jan 16, 2021

Test build #134151 has finished for PR 30657 at commit bb418bb.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sunchao
Member

sunchao commented Jan 18, 2021

Artifacts have been published to Maven now - could you try this again?

@wangyum
Member Author

wangyum commented Jan 18, 2021

retest this please.

@dongjoon-hyun
Member

Great! Could you remove the [WIP] in the title and convert from Draft to Normal PR, @wangyum ?

@wangyum wangyum marked this pull request as ready for review January 18, 2021 03:05
@wangyum wangyum changed the title from "[WIP][SPARK-33696][SQL] Upgrade built-in Hive to 2.3.8" to "[SPARK-33696][BUILD][SQL] Upgrade built-in Hive to 2.3.8" Jan 18, 2021
@SparkQA

SparkQA commented Jan 18, 2021

Test build #134170 has finished for PR 30657 at commit bb418bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @wangyum and all!
Merged to master for Apache Spark 3.2.0.

@wangyum wangyum deleted the SPARK-33696 branch January 18, 2021 05:57
skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021
### What changes were proposed in this pull request?

Hive 2.3.8 changes:
HIVE-19662: Upgrade Avro to 1.8.2
HIVE-24324: Remove deprecated API usage from Avro
HIVE-23980: Shade Guava from hive-exec in Hive 2.3
HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue
HIVE-24512: Exclude calcite in packaging.
HIVE-22708: Fix for HttpTransport to replace String.equals
HIVE-24551: Hive should include transitive dependencies from calcite after shading it
HIVE-24553: Exclude calcite from test-jar dependency of hive-exec

### Why are the changes needed?

To upgrade Avro and Parquet to their latest versions.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests, plus a test that tries to upgrade Parquet to 1.11.1 and Avro to 1.10.1: apache#30517

Closes apache#30657 from wangyum/SPARK-33696.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
LorenzoMartini pushed a commit to palantir/spark that referenced this pull request Apr 19, 2021
16pierre pushed a commit to 16pierre/spark that referenced this pull request May 24, 2021
8 participants