[WIP][SPARK-29250][BUILD][test-hadoop3.2][test-maven] Upgrade to Hadoop 3.2.1 #25932

Closed
wants to merge 1 commit into from

Conversation

dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Sep 25, 2019

What changes were proposed in this pull request?

This PR upgrades Hadoop from 3.2.0 to 3.2.1 in the hadoop-3.2 profile.

Why are the changes needed?

Hadoop 3.2.1 has 493 patches including client bug fixes and improvements.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass Jenkins with the existing tests.

The dependency change was tested on both JDK 8 and JDK 11; the result does not differ between the JDK versions.

@@ -22,6 +22,8 @@ automaton-1.11-8.jar
avro-1.8.2.jar
avro-ipc-1.8.2.jar
avro-mapred-1.8.2-hadoop2.jar
bcpkix-jdk15on-1.60.jar
bcprov-jdk15on-1.60.jar
Member Author

@dongjoon-hyun dongjoon-hyun Sep 25, 2019

FYI, we already have the following in NOTICE-binary. We may need to remove the word "optionally", or exclude these two jars as before.

This product optionally depends on 'Bouncy Castle Crypto APIs' to generate
a temporary self-signed X.509 certificate when the JVM does not provide the
equivalent functionality.  It can be obtained at:

  * LICENSE:
    * license/LICENSE.bouncycastle.txt (MIT License)
  * HOMEPAGE:
    * http://www.bouncycastle.org/

Member Author

@srowen, could you give me some advice?

Member

There are a few parts here. First, that NOTICE statement is from Hadoop's NOTICE, so I'd copy whatever it says now, to update.

Second, if it's a first-class dependency now, it needs to have a line in LICENSE-binary and a copy of the license in licenses-binary/. It's MIT-licensed so should be OK.

Finally, BC is a special case because it's subject to crypto export laws. We will have to update http://www.apache.org/licenses/exports/ to say that it's again a dependency in 3.0. I can go figure that out again as and when this is merged.

Member Author

@dongjoon-hyun dongjoon-hyun Sep 26, 2019

Oops. That sounds like too much for me. Please help me after I merge this. 😄

Member

For the record, the process for the last part is https://www.apache.org/dev/crypto.html#sources
I can do it afterwards, it's not hard.

However, hm, I wonder if Hadoop needs a similar disclosure at http://www.apache.org/licenses/exports/ ? It's possible that somehow it isn't distributed directly by Hadoop, but I'd be surprised, since it's a first-class dep and Hadoop makes binary releases.

Maybe, eh, CC @steveloughran in case he knows anything about this angle.

Contributor

ooh, it is in the binary isn't it

hadoop-3.2.1 find . -print | grep bcp
./share/hadoop/yarn/lib/bcprov-jdk15on-1.60.jar
./share/hadoop/yarn/lib/bcpkix-jdk15on-1.60.jar

let me chase this up
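
The same check can be scripted. Below is a minimal sketch in Python, assuming an unpacked distribution directory. The helper name find_bouncy_castle_jars is hypothetical and is not part of any Hadoop or Spark tooling, and matching on the bcprov/bcpkix file-name prefixes is only a heuristic taken from the listing above.

```python
from pathlib import Path


def find_bouncy_castle_jars(dist_root):
    """Return sorted relative paths of Bouncy Castle jars under dist_root.

    Hypothetical helper: it matches on the "bcprov"/"bcpkix" file-name
    prefixes only, so it is a quick heuristic and not a license audit.
    """
    root = Path(dist_root)
    hits = [
        p.relative_to(root).as_posix()
        for p in root.rglob("*.jar")
        if p.name.startswith(("bcprov", "bcpkix"))
    ]
    return sorted(hits)
```

Calling find_bouncy_castle_jars on the root of an unpacked release returns the matching jar paths relative to that root; an empty list means no jar with those prefixes was found.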

Member Author

Thank you, @srowen and @steveloughran .
I will use this chance to learn this legal process. :)

@SparkQA

SparkQA commented Sep 26, 2019

Test build #111375 has finished for PR 25932 at commit a417073.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-29250][BUILD] Upgrade to Hadoop 3.2.1" to "[SPARK-29250][BUILD][test-hadoop3.2][test-java11] Upgrade to Hadoop 3.2.1" on Sep 26, 2019

@dongjoon-hyun
Member Author

Retest this please.

@SparkQA

SparkQA commented Sep 26, 2019

Test build #111390 has finished for PR 25932 at commit a417073.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member Author

Retest this please.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-29250][BUILD][test-hadoop3.2][test-java11] Upgrade to Hadoop 3.2.1" to "[SPARK-29250][BUILD][test-hadoop3.2][test-java11][test-maven] Upgrade to Hadoop 3.2.1" on Sep 26, 2019

@dongjoon-hyun
Member Author

Retest this please.

@SparkQA

SparkQA commented Sep 26, 2019

Test build #111404 has finished for PR 25932 at commit a417073.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 26, 2019

Test build #111403 has finished for PR 25932 at commit a417073.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-29250][BUILD][test-hadoop3.2][test-java11][test-maven] Upgrade to Hadoop 3.2.1" to "[SPARK-29250][BUILD][test-hadoop3.2][test-maven] Upgrade to Hadoop 3.2.1" on Sep 26, 2019

@dongjoon-hyun
Member Author

Retest this please.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-29250][BUILD][test-hadoop3.2][test-maven] Upgrade to Hadoop 3.2.1" to "[SPARK-29250][BUILD][test-hadoop3.2][test-maven][test-java11] Upgrade to Hadoop 3.2.1" on Sep 26, 2019

@dongjoon-hyun
Member Author

Retest this please.

@SparkQA

SparkQA commented Sep 26, 2019

Test build #111436 has finished for PR 25932 at commit a417073.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-29250][BUILD][test-hadoop3.2][test-maven][test-java11] Upgrade to Hadoop 3.2.1" to "[SPARK-29250][BUILD][test-hadoop3.2][test-maven] Upgrade to Hadoop 3.2.1" on Sep 26, 2019

@SparkQA

SparkQA commented Sep 26, 2019

Test build #111439 has finished for PR 25932 at commit a417073.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-29250][BUILD][test-hadoop3.2][test-maven] Upgrade to Hadoop 3.2.1" to "[WIP][SPARK-29250][BUILD][test-hadoop3.2][test-maven] Upgrade to Hadoop 3.2.1" on Sep 26, 2019

@dongjoon-hyun
Member Author

Hi, All. I've been trying to fix the failures caused by Hadoop's Guava dependency update, but I have had no luck so far. I'll close this one for now.

@ouyangxiaochen

ouyangxiaochen commented Nov 1, 2019

@dongjoon-hyun Hi, is there a JIRA ticket tracking this issue with Hadoop's Guava dependency update?

@dongjoon-hyun
Member Author

What do you mean? We are tracking it here, in SPARK-29250.

@srowen
Member

srowen commented Dec 15, 2019

@ouyangxiaochen @dongjoon-hyun I ran into this today and this is what the issue is: Hadoop 3.2.1 updates from Guava 11 to Guava 27:
https://github.com/apache/hadoop/blob/branch-3.2.1/hadoop-project/pom.xml#L95

I think we may need to match this, not least because Guava is such a problem dependency that updates probably have to happen in a major release. (I'm kind of surprised to see that in a maintenance release of Hadoop.) Previously we'd been reluctant to vary from Hadoop, but for the Hadoop 3 profiles, it seems like we need to try?

Do you want to try that in this PR as part of moving it along?

CC @steveloughran

@dongjoon-hyun
Member Author

dongjoon-hyun commented Dec 15, 2019

Yes, I know, @srowen. It would be great, but I'm not sure we can escape.
The reason I dropped this is that Apache Hive 2.3.6 also fails at runtime in a Hadoop 3.2.1 environment due to this Guava mismatch. AFAIK, there is no plan to upgrade the existing Guava (14.0.1) in Apache Hive branch-2.3.
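
To make the mismatch concrete, here is a minimal sketch in Python that compares the Guava major version each component expects. The version strings are the ones reported in this thread (Guava 27 for Hadoop 3.2.1, 14.0.1 for Hive branch-2.3). The function names, the dictionary labels, and the rule that differing major versions count as a conflict are simplifying assumptions; real binary compatibility depends on which Guava methods each library actually calls.

```python
def guava_major(version):
    """Extract the major version from a Guava version string such as "27.0-jre"."""
    return int(version.split(".")[0])


def conflicting_majors(expected):
    """Group components by the Guava major version they expect.

    Returns an empty dict when every component agrees on one major version,
    otherwise a dict mapping each major version to the sorted list of
    components that expect it.
    """
    by_major = {}
    for component, version in expected.items():
        by_major.setdefault(guava_major(version), []).append(component)
    if len(by_major) <= 1:
        return {}
    return {major: sorted(names) for major, names in by_major.items()}


# Versions as reported in this thread; the keys are illustrative labels only.
expected = {
    "hadoop-3.2.1": "27.0-jre",
    "hive-2.3.6": "14.0.1",
}
print(conflicting_majors(expected))
```

A non-empty result signals that a single Guava jar on the classpath cannot match what every component was built against, which is the situation described above.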

@dongjoon-hyun
Member Author

cc @gatorsmile , @wangyum

@srowen
Member

srowen commented Dec 15, 2019

Heh, yeah, I started working on a "Guava 27" branch and the changes are non-trivial. I think it will require us to simply avoid a lot of Guava usage with various workarounds. I may take this on in the short term, since I think we have to wait on further Scala 2.13 updates anyway, and the Scala 2.13 update, via Kafka 2.4, might force this regardless. If there is a change here, we'd best do it for Spark 3.0, so I'll work on it.

@dongjoon-hyun
Member Author

Thank you for working on this, @srowen !

@dongjoon-hyun dongjoon-hyun deleted the SPARK-29250 branch December 16, 2019 00:20
@ouyangxiaochen

@srowen @wangyum I tried using Hadoop's shaded jars to resolve the Guava conflicts, though this may not be a perfect solution.

@srowen
Member

srowen commented Dec 16, 2019

See #26911 for at least directly removing some exposure to Guava

@srowen
Member

srowen commented Dec 20, 2019

@dongjoon-hyun #26911 is merged, so you can try making a guava.version property that's set to 27.0-jre for Hadoop 3.2.1.

@HyukjinKwon
Member

I have some spare time. Let me try in #27009.

@sunchao
Member

sunchao commented Sep 22, 2020

I think we may be able to upgrade to Hadoop 3.2.1 by switching to the shaded Hadoop client jars. I've created a PR for this: #29843
