[SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile #30701
Conversation
Great! Is `hadoop-aws` tested manually, @sunchao?
Test build #132548 has finished for PR 30701 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Yes, manually tested it and it looks good.
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #132598 has started for PR 30701 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #132581 has finished for PR 30701 at commit
Force-pushed from ace75c8 to 17513d3
Test build #133805 has started for PR 30701 at commit
Hadoop 3.2.2-rc5 became 3.2.2 on Jan 9; does this PR have a chance to be merged into 3.1?
Sorry, but it's too late for Apache Spark 3.1.1, @pan3793.
BTW, @sunchao. Since Apache Hadoop 3.2.2 is out, shall we convert this from
@dongjoon-hyun yes, will do that soon.
…t-api to make hadoop-aws work" This reverts commit 290aa02.
Force-pushed from 17513d3 to bd1a69c
@dongjoon-hyun updated the title and description, and updated the PR; let's see how CI goes.
Thank you for updating, @sunchao!
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (outdated review thread, resolved)
Looking forward to this! Thanks for keeping it going.
<execution>
  <id>enforce-no-duplicate-dependencies</id>
  <goals>
    <goal>enforce</goal>
  </goals>
  <configuration>
    <rules>
      <banDuplicatePomDependencyVersions/>
    </rules>
  </configuration>
</execution>
Is there any way to provide specific exclusions for the enforcement? It's a shame to have to completely turn off the rule for this.
I'm not sure whether there are exclusion rules for this. I just tried this on my laptop and compilation for Hadoop 2.7 works, probably because there have been lots of changes since I initially removed this. Let me try it once again with Spark CI.
@@ -112,11 +112,24 @@ private[hive] object IsolatedClientLoader extends Logging {
      hadoopVersion: String,
      ivyPath: Option[String],
      remoteRepos: String): Seq[URL] = {
    val hadoopJarNames = if (hadoopVersion.startsWith("3")) {
Will this break if `hadoopVersion` is 3.1.0, 3.2.1, etc. (due to the previous issues with the shaded client JARs)?
Do you mean HADOOP-16080? Yes, things could still break in the following cases:

- Users build Spark without `-Phadoop-cloud` AND use a version that doesn't have the fix in HADOOP-16080, such as:
  $ bin/spark-shell --packages org.apache.hadoop:hadoop-aws:3.2.0,org.apache.hadoop:hadoop-common:3.2.0
  However, I think we should recommend that users stick to the same version used by Spark, i.e., 3.2.2.
- Users build Spark with a custom Hadoop version, such as the 3.1.0/3.2.1 you mentioned, via the `hadoop.version` property, and use it to talk to cloud storage like S3.

To enable these use cases we may have to introduce another Maven property to switch back to the non-shaded client, and update this code as well.
Yah I think (2) is my primary concern. We support building against a custom version using `-Dhadoop.version`, but right now it will break if you use `-Phadoop-3 -Dhadoop.version=3.1.0`. This one I believe you can at least work around by changing the `-Dhadoop-client-{runtime,api,minicluster}.artifact` properties, but here there's no way to work around it.
Got it. Yes we can be more precise here, such as using a version map. I can update this.
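For illustration, here is a minimal, self-contained sketch of that "version map" idea, assuming a 3.2.2-and-later cutoff for the shaded clients; the object and helper names are hypothetical and this is not the code that ended up in the follow-up:

```scala
// Hypothetical sketch: pick shaded vs. unshaded Hadoop client jars based on an
// explicit major/minor/patch check instead of hadoopVersion.startsWith("3").
object HadoopJarSelector {

  // Extracts (major, minor, patch) from strings like "3.2.2" or "3.2.2-SNAPSHOT".
  def majorMinorPatch(version: String): Option[(Int, Int, Int)] = {
    val Pattern = """(\d+)\.(\d+)\.(\d+)(.*)""".r
    version match {
      case Pattern(major, minor, patch, _) => Some((major.toInt, minor.toInt, patch.toInt))
      case _ => None
    }
  }

  // Assumption for this sketch: shaded clients are only used for 3.2.2 and later;
  // anything else falls back to the old unshaded hadoop-client.
  def hadoopJarNames(hadoopVersion: String): Seq[String] = majorMinorPatch(hadoopVersion) match {
    case Some((3, minor, patch)) if minor > 2 || (minor == 2 && patch >= 2) =>
      Seq("hadoop-client-api", "hadoop-client-runtime")
    case _ =>
      Seq("hadoop-client")
  }
}

// HadoopJarSelector.hadoopJarNames("3.2.2")  // Seq(hadoop-client-api, hadoop-client-runtime)
// HadoopJarSelector.hadoopJarNames("3.1.0")  // Seq(hadoop-client)
```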
// this introduced from lower version of Hive could conflict with jars in Hadoop 3.2+, so
// exclude here in favor of the ones in Hadoop 3.2+
Seq("org.apache.hadoop:hadoop-auth")
Doesn't Hive pull in other non-shaded Hadoop dependencies that could cause issues?
Actually this code is no longer required after #30284. Currently Spark will always load Hadoop classes from the built-in Hadoop version.
Right, I forgot about that one. Thanks for the clarification.
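As a rough illustration of the behavior being described (hypothetical names and shared-class list; not the actual `IsolatedClientLoader` implementation), an isolated classloader that always delegates Hadoop classes to Spark's own classloader would look something like this:

```scala
import java.net.{URL, URLClassLoader}

// Sketch only: "shared" classes (including all org.apache.hadoop classes) are
// resolved through Spark's classloader, so the built-in Hadoop version always wins;
// everything else is loaded from the isolated Hive client jars.
class IsolatedLoaderSketch(isolatedJars: Array[URL], sparkLoader: ClassLoader)
  extends URLClassLoader(isolatedJars, null) {

  private def isSharedClass(name: String): Boolean =
    name.startsWith("org.apache.hadoop.") ||  // always use Spark's built-in Hadoop
    name.startsWith("org.apache.spark.") ||
    name.startsWith("scala.") ||
    name.startsWith("java.")

  override def loadClass(name: String, resolve: Boolean): Class[_] =
    if (isSharedClass(name)) Class.forName(name, resolve, sparkLoader)
    else super.loadClass(name, resolve)
}
```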
Kubernetes integration test starting
Kubernetes integration test status success
Test build #134125 has finished for PR 30701 at commit
+1, LGTM. Thank you, @sunchao.
Thanks @dongjoon-hyun for merging! @xkrogen I'll address your comments in a follow-up PR.
Thanks @sunchao!
… and add more strict Hadoop version check

### What changes were proposed in this pull request?
1. Add back Maven enforcer for duplicate dependencies check
2. More strict check on Hadoop versions which support shaded client in `IsolatedClientLoader`. To do proper version check, this adds a util function `majorMinorPatchVersion` to extract major/minor/patch version from a string.
3. Cleanup unnecessary code

### Why are the changes needed?
The Maven enforcer was removed as part of #30556. This proposes to add it back. Also, Hadoop shaded client doesn't work in certain cases (see [these comments](#30701 (comment)) for details). This strictly checks that the current Hadoop version (i.e., 3.2.2 at the moment) has good support of shaded client or otherwise fallback to old unshaded ones.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

Closes #31203 from sunchao/SPARK-33212-followup.

Lead-authored-by: Chao Sun <sunchao@apple.com>
Co-authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…lient dependencies in root pom

### What changes were proposed in this pull request?
This PR is a followup of #30701. It uses properties of `hadoop-client-api.artifact`, `hadoop-client-runtime.artifact` and `hadoop-client-minicluster.artifact` explicitly to set the dependencies and versions. Otherwise, it is logically incorrect. For example, if you build with Hadoop 2, this dependency becomes `hadoop-client-api:2.7.4` internally, which does not exist in Hadoop 2 (https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-api).

### Why are the changes needed?
- To fix the logical incorrectness.
- It fixes a potential issue: this actually caused an issue when `generate-sources` plugin is used together with Hadoop 2 by default, which attempts to pull 2.7.4 of `hadoop-client-api`, `hadoop-client-runtime` and `hadoop-client-minicluster` for whatever reason.

### Does this PR introduce _any_ user-facing change?
No for users and dev. It's more a cleanup.

### How was this patch tested?
Manually checked the dependencies are correctly placed.

Closes #31467 from HyukjinKwon/SPARK-33212.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@@ -66,7 +66,13 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
@sunchao @dongjoon-hyun I found some bad cases when using variables here:

- if we run `mvn dependency:tree -pl resource-managers/yarn -Phadoop-2.7 -Pyarn -am`, the dependency tree is correct
- if we run `mvn dependency:tree -pl resource-managers/yarn -Phadoop-2.7 -Pyarn`, the dependency tree is wrong: the Hadoop 3.3.1 dependencies exist in the tree even though we have added the -Phadoop-2.7 profile

So if we execute

mvn clean install -DskipTests -pl resource-managers/yarn -am -Phadoop-2.7 -Pyarn
mvn test -pl resource-managers/yarn -Phadoop-2.7 -Pyarn -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite

the test case `YarnClusterSuite` will fail as follows:
Discovery starting.
Discovery completed in 277 milliseconds.
Run starting. Expected test count is: 27
YarnClusterSuite:
*** RUN ABORTED ***
java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/com/google/inject/servlet/ServletModule
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
...
Cause: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.com.google.inject.servlet.ServletModule
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
...
We have to add the `-am` flag to test the `yarn` module separately to ensure that the dependency is correct:

mvn test -pl resource-managers/yarn -Phadoop-2.7 -am -Pyarn -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -Dtest=none
Run completed in 5 minutes, 7 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
Interesting. Let me take a look. Thanks for reporting it, @LuciferYang!
This behavior looks expected to me... If you specify `-pl resource-managers/yarn` without the `-am` flag, then you're explicitly telling Maven not to rebuild the `core` module. If the `core` module has previously been built with Hadoop 3.3.1, then you will just use it as-is with the old dependencies. IMO, this is just an example of how `-pl` without `-am` needs to be used with care.
Hi, @LuciferYang. I agree with @xkrogen. When you build without `-am`, the other modules are the published SNAPSHOT poms and jars, which are built with the default (Hadoop 3) configuration.
Sorry, I forgot about this. @LuciferYang, just to recap: do you still see a bug here, or is the behavior expected?
@sunchao Yes, this problem still exists; only the behavior of branch-3.1 is as expected at present.
I tested these commands in 3.2-rc4 (3.2-rc5 can't build with hadoop-2.7 right now), and the problem still exists.
@LuciferYang could you check with the fix in #34100? I just tested it with the commands you pasted above:
mvn clean install -DskipTests -pl resource-managers/yarn -am -Phadoop-2.7 -Pyarn
mvn test -pl resource-managers/yarn -Phadoop-2.7 -Pyarn -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite
and the tests all passed for me.
@sunchao Yes, the behavior is expected now. Thanks!
### What changes were proposed in this pull request?
This PR aims to remove unused `commons-beanutils` dependency from `pom.xml` and `LICENSE-binary`.

### Why are the changes needed?
#30701 removed `commons-beanutils` from `hadoop-3` profile at Apache Spark 3.2.0.
- #30701

#40788 removed `hadoop-2` profile from Apache Spark 3.5.0.
- #40788

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45705 from dongjoon-hyun/SPARK-47548.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?

This:

1. switches Spark to use the shaded Hadoop clients, namely `hadoop-client-api` and `hadoop-client-runtime`, for Hadoop 3.x.
2. upgrades the built-in version for Hadoop 3.x to Hadoop 3.2.2.

Note that for Hadoop 2.7, we'll still use the same modules such as `hadoop-client`.

In order to still keep the default Hadoop profile as hadoop-3.2, this defines the following Maven properties:

- `hadoop-client-api.artifact`
- `hadoop-client-runtime.artifact`
- `hadoop-client-minicluster.artifact`

which default to:

- `hadoop-client-api`
- `hadoop-client-runtime`
- `hadoop-client-minicluster`

but all switch to `hadoop-client` when the Hadoop profile is hadoop-2.7. A side effect of this is that we'll import the same dependency multiple times; for this I have to disable the Maven enforcer rule `banDuplicatePomDependencyVersions`.

Besides the above, there are the following changes:

- explicitly add a few dependencies which are imported via transitive dependencies from Hadoop jars, but are removed from the shaded client jars.
- removed the use of `ProxyUriUtils.getPath` from `ApplicationMaster`, which is a server-side/private API.
- modified `IsolatedClientLoader` to exclude `hadoop-auth` jars when the Hadoop version is 3.x. This change should only matter when we're not sharing Hadoop classes with Spark (which is mostly used in tests).

Why are the changes needed?

Hadoop 3.2.2 is released with new features and bug fixes, so it's good for the Spark community to adopt it. However, the latest Hadoop versions starting from Hadoop 3.2.1 have upgraded to use Guava 27+. In order to resolve the Guava conflicts, this takes the approach of switching to the shaded client jars provided by Hadoop. This also has the benefit of avoiding pulling other 3rd-party dependencies from the Hadoop side, so as to avoid more potential future conflicts.

Does this PR introduce any user-facing change?

When people use Spark with the `hadoop-provided` option, they should make sure the classpath contains the `hadoop-client-api` and `hadoop-client-runtime` jars. In addition, they may need to make sure these jars appear before other Hadoop jars in the classpath order. Otherwise, classes may be loaded from the other non-shaded Hadoop jars and cause potential conflicts.

How was this patch tested?

Relying on existing tests.