
[SPARK-24360][SQL] Support Hive 3.0 metastore #21404


Closed
wants to merge 2 commits into from


dongjoon-hyun
Member

What changes were proposed in this pull request?

Hive 3.0 has been released. This PR aims to support the Hive 3.0 metastore.

How was this patch tested?

Pass Jenkins with the updated test cases, including Hive 3.0.

val allSupportedHiveVersions = Set(v12, v13, v14, v1_0, v1_1, v1_2, v2_0, v2_1, v2_2, v2_3)
case object v3_0 extends HiveVersion("3.0.0",
exclusions = Seq("org.apache.curator:*",
"org.apache.hadoop:hadoop-aws",
Member

What happens if we do not have this?

Member Author

Thanks. I'll remove this in this PR.
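For context, the exclusions passed to HiveVersion read as Maven groupId:artifactId patterns, where org.apache.curator:* drops every Curator artifact. The sketch below illustrates that kind of matching under two assumptions: coordinates are plain groupId:artifactId strings, and * is a whole-component wildcard. ExclusionSketch and its methods are hypothetical names; Spark's actual dependency download may interpret the patterns differently.

```scala
// Hypothetical sketch: match Maven "groupId:artifactId" coordinates against
// exclusion patterns such as "org.apache.curator:*" ("*" = any value).
object ExclusionSketch {
  // Split "groupId:artifactId"; reject anything without exactly one separator.
  private def parts(s: String): (String, String) = s.split(":") match {
    case Array(group, artifact) => (group, artifact)
    case _ => throw new IllegalArgumentException(s"expected groupId:artifactId, got '$s'")
  }

  // Each component matches if the pattern is "*" or the values are equal.
  def matches(pattern: String, coordinate: String): Boolean = {
    val (pGroup, pArtifact) = parts(pattern)
    val (cGroup, cArtifact) = parts(coordinate)
    (pGroup == "*" || pGroup == cGroup) && (pArtifact == "*" || pArtifact == cArtifact)
  }

  // A coordinate is excluded if any pattern matches it.
  def isExcluded(exclusions: Seq[String], coordinate: String): Boolean =
    exclusions.exists(matches(_, coordinate))
}
```

With the two patterns visible in the diff, org.apache.curator:curator-framework and org.apache.hadoop:hadoop-aws are both excluded, while org.apache.hadoop:hadoop-common is not.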

private lazy val clazzLoadFileType = getClass.getClassLoader.loadClass(
"org.apache.hadoop.hive.ql.plan.LoadTableDesc$LoadFileType")

private lazy val loadPartitionMethod =
Member

BTW, I tracked and checked all the signature changes.

Member Author

Thank you, @HyukjinKwon .
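The diff resolves Hive's LoadTableDesc$LoadFileType by name and defines members such as loadPartitionMethod lazily, because the relevant signatures differ between Hive versions. A minimal sketch of that reflection pattern follows. To stay self-contained it substitutes the JDK's java.util.concurrent.TimeUnit enum and its convert(long, TimeUnit) method for the Hive classes; ReflectionShimSketch and its members are hypothetical names, not Spark's shim code.

```scala
import java.lang.reflect.Method

// Sketch of the reflection pattern (hypothetical names): resolve a class by
// name at runtime and look up a method by exact parameter types. The JDK's
// TimeUnit enum stands in for Hive's LoadTableDesc$LoadFileType here.
object ReflectionShimSketch {
  // Loaded on first use only, like the `private lazy val` fields in the diff.
  private lazy val enumClass: Class[_] =
    getClass.getClassLoader.loadClass("java.util.concurrent.TimeUnit")

  // Resolved by signature; a library version with a different signature
  // fails here with NoSuchMethodException rather than at a later call site.
  private lazy val convertMethod: Method =
    enumClass.getMethod("convert", classOf[Long], enumClass)

  // Look up an enum constant by name without a compile-time reference.
  def enumConstant(name: String): AnyRef =
    enumClass.getEnumConstants.asInstanceOf[Array[AnyRef]]
      .find(_.toString == name)
      .getOrElse(throw new NoSuchElementException(s"no constant $name"))

  // Equivalent to TimeUnit.valueOf(to).convert(duration, TimeUnit.valueOf(from)).
  def convert(duration: Long, from: String, to: String): Long =
    convertMethod
      .invoke(enumConstant(to), java.lang.Long.valueOf(duration), enumConstant(from))
      .asInstanceOf[java.lang.Long]
      .longValue
}
```

For example, ReflectionShimSketch.convert(2L, "MINUTES", "SECONDS") invokes SECONDS.convert(2, MINUTES) reflectively and yields 120.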

@SparkQA

SparkQA commented May 23, 2018

Test build #91001 has finished for PR 21404 at commit 1523b65.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

The failures come from ClassNotFoundException: org.apache.logging.log4j.util.Strings.
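When a ClassNotFoundException like this shows up, one quick check is whether the class is visible to the class loader in question; for Hive clients that is the loader built by IsolatedClientLoader, which can see a different set of jars than the application. The helper below is a hypothetical sketch for that check (ClasspathProbe is not part of Spark), and it does not by itself explain why org.apache.logging.log4j.util.Strings was missing in these builds.

```scala
// Hypothetical helper (not part of Spark): list which class names a given
// class loader cannot load, without running their static initializers.
object ClasspathProbe {
  private def loadable(name: String, loader: ClassLoader): Boolean =
    try {
      Class.forName(name, false, loader) // initialize = false
      true
    } catch {
      // ClassNotFoundException: the class itself is absent.
      // LinkageError (e.g. NoClassDefFoundError): found, but a dependency is absent.
      case _: ClassNotFoundException | _: LinkageError => false
    }

  // Defaults to the current thread's context class loader.
  def missingClasses(
      names: Seq[String],
      loader: ClassLoader = Thread.currentThread().getContextClassLoader): Seq[String] =
    names.filterNot(loadable(_, loader))
}
```

Passing the suspect loader explicitly, e.g. missingClasses(Seq("org.apache.logging.log4j.util.Strings"), someLoader) with a hypothetical someLoader, returns the name if that loader cannot see log4j's API classes.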

@dongjoon-hyun
Member Author

Retest this please.

@SparkQA

SparkQA commented May 23, 2018

Test build #91003 has finished for PR 21404 at commit 079771d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 23, 2018

Test build #91008 has finished for PR 21404 at commit 079771d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member

wangyum commented May 23, 2018

Can we remove support for old Hive versions, such as 0.12, 0.13, and 0.14?

@HyukjinKwon
Member

Probably in a separate ticket for Spark 3.0.0.

@dongjoon-hyun
Member Author

I'm investigating a timing issue here. Spark loads the Hive metastore classes lazily, and here Spark tries to access Hive metastore tables such as DBS before they are created.
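The ordering problem can be illustrated with a toy model. This is a hypothetical sketch that only mimics the symptom described here (a table such as DBS being read before it exists); ToyMetastore and its methods are invented for illustration and do not reflect how Spark or Hive actually create the metastore schema.

```scala
import scala.collection.mutable

// Toy model (hypothetical; not Spark or Hive code) of an initialization-order
// hazard: schema creation sits in a lazy initializer, so a read that does not
// force it first finds no tables.
class ToyMetastore {
  private val tables = mutable.Set.empty[String]

  // Runs once, on first access, and creates the schema tables.
  lazy val schemaReady: Boolean = {
    tables ++= Seq("DBS", "TBLS")
    true
  }

  // Does not force initialization, so it can run "too early".
  def readTableUnsafe(name: String): Either[String, String] =
    if (tables.contains(name)) Right(s"rows of $name")
    else Left(s"Table '$name' does not exist")

  // Forces the lazy initializer before touching any table.
  def readTable(name: String): Either[String, String] = {
    require(schemaReady)
    readTableUnsafe(name)
  }
}
```

On a fresh instance, readTableUnsafe("DBS") returns a Left, while readTable("DBS") succeeds because it forces schemaReady first; after that, even the unsafe path sees the tables.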

@gatorsmile
Member

@wangyum I do not think we should deprecate support for the previous versions of the Hive metastore. Many Spark users are still using them.

@gatorsmile
Member

@dongjoon-hyun Thanks for your investigation!

@tooptoop4
Contributor

@dongjoon-hyun @wangyum @gatorsmile @HyukjinKwon Is anything left on this? Can it be merged to master?

@tooptoop4
Contributor

Also, can Hive 3.1 be supported easily, or are there breaking changes?

@@ -99,6 +99,7 @@ private[hive] object IsolatedClientLoader extends Logging {
case "2.1" | "2.1.0" | "2.1.1" => hive.v2_1
case "2.2" | "2.2.0" => hive.v2_2
case "2.3" | "2.3.0" | "2.3.1" | "2.3.2" | "2.3.3" => hive.v2_3
case "3.0" | "3.0.0" => hive.v3_0
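This hunk maps user-supplied version strings onto version objects. A reduced sketch of that idea follows, keeping only two versions; HiveVersionSketch, fromString, and the choice of IllegalArgumentException are hypothetical, not Spark's actual IsolatedClientLoader code.

```scala
// Simplified, hypothetical sketch of the version-string mapping shown in the
// diff (`case "3.0" | "3.0.0" => hive.v3_0`); not the real IsolatedClientLoader.
sealed abstract class HiveVersionSketch(val fullVersion: String)

object HiveVersionSketch {
  case object V2_3 extends HiveVersionSketch("2.3.3")
  case object V3_0 extends HiveVersionSketch("3.0.0")

  // Analogous to allSupportedHiveVersions: every new version must be added here too.
  val allSupported: Set[HiveVersionSketch] = Set(V2_3, V3_0)

  // Accepts short ("3.0") and full ("3.0.0") forms; anything else is rejected.
  def fromString(version: String): HiveVersionSketch = version match {
    case "2.3" | "2.3.0" | "2.3.1" | "2.3.2" | "2.3.3" => V2_3
    case "3.0" | "3.0.0" => V3_0
    case other =>
      throw new IllegalArgumentException(s"Unsupported Hive metastore version: $other")
  }
}
```

Keeping every accepted string for a release pointed at one object means that adding, say, Hive 3.1 touches three places in this sketch: a new case object, an entry in the supported set, and a new match arm, in addition to the version ranges in the documentation.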
Member

wangyum commented Sep 5, 2018

@dongjoon-hyun Please update sql-programming-guide.md and HiveUtils.scala:

options are <code>0.12.0</code> through <code>2.3.3</code>.

s"<code>0.12.0</code> through <code>2.3.3</code>.")

@dongjoon-hyun
Member Author

Thank you for the review, @tooptoop4 and @wangyum.
I'm going to update this to the latest Hive 3.1.0.

@tooptoop4
Contributor

bump

@dongjoon-hyun
Member Author

dongjoon-hyun commented Sep 19, 2018

@tooptoop4, since this is a new feature, it now targets Apache Spark 2.5 because branch-2.4 has already been cut. In addition, it will not be allowed to be backported to branch-2.4. Is it urgent for you, @tooptoop4?

@tooptoop4
Contributor

@dongjoon-hyun I was planning to do my own custom build by cherry-picking your PR if you had it available.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Sep 20, 2018

That would be helpful for us in the end, but sorry, not for now. I'm currently not planning to open it soon because the next release (Apache Spark 2.5 or 3.0) will be next year.

@dongjoon-hyun
Member Author

Hi, All.
#23694 supersedes this PR.

dongjoon-hyun deleted the SPARK-24360 branch on January 30, 2019 07:17

6 participants