
[SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile #30701


Closed
sunchao wants to merge 4 commits from the sunchao:test-hadoop-3.2.2 branch

Conversation

sunchao
Member

@sunchao sunchao commented Dec 10, 2020

What changes were proposed in this pull request?

This PR:

  1. switches Spark to use the shaded Hadoop clients, namely hadoop-client-api and hadoop-client-runtime, for Hadoop 3.x.
  2. upgrades the built-in Hadoop version for the Hadoop 3.x profile to 3.2.2.

Note that for Hadoop 2.7, we'll still use the existing modules such as hadoop-client.

To keep hadoop-3.2 as the default Hadoop profile, this defines the following Maven properties:

hadoop-client-api.artifact
hadoop-client-runtime.artifact
hadoop-client-minicluster.artifact

which default to:

hadoop-client-api
hadoop-client-runtime
hadoop-client-minicluster

All of these switch to hadoop-client when the Hadoop profile is hadoop-2.7. A side effect of this is that the same dependency gets imported multiple times, so I had to disable the Maven enforcer rule banDuplicatePomDependencyVersions.

Besides the above, this PR makes the following changes:

  • explicitly adds a few dependencies that were previously pulled in transitively from the Hadoop jars but are not included in the shaded client jars.
  • removes the use of ProxyUriUtils.getPath from ApplicationMaster, since it is a server-side/private API.
  • modifies IsolatedClientLoader to exclude hadoop-auth jars when the Hadoop version is 3.x. This change should only matter when we're not sharing Hadoop classes with Spark (which is mostly done in tests).

Why are the changes needed?

Hadoop 3.2.2 is released with new features and bug fixes, so it's good for the Spark community to adopt it. However, Hadoop versions starting from 3.2.1 have upgraded to Guava 27+. To resolve the Guava conflicts, this PR switches to the shaded client jars provided by Hadoop. This also has the benefit of not pulling other third-party dependencies from the Hadoop side, avoiding more potential conflicts in the future.

Does this PR introduce any user-facing change?

When people use Spark with the hadoop-provided option, they should make sure the classpath contains the hadoop-client-api and hadoop-client-runtime jars, and that these jars appear before any other Hadoop jars. Otherwise, classes may be loaded from the non-shaded Hadoop jars and cause conflicts.
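
One way to sanity-check the ordering is to ask the JVM where a core Hadoop class was actually loaded from, for example in spark-shell. The snippet below is only an illustrative sketch (it is not part of this change and merely assumes a running spark-shell with Hadoop on the classpath):

```scala
// Sketch: report which jar serves the Hadoop Configuration class. With the shaded
// clients first on the classpath, this should point at a hadoop-client-api jar
// rather than an unshaded hadoop-common jar.
val codeSource = Option(classOf[org.apache.hadoop.conf.Configuration].getProtectionDomain.getCodeSource)
val location = codeSource.map(_.getLocation.toString).getOrElse("(no code source available)")
println(s"org.apache.hadoop.conf.Configuration loaded from: $location")
```

If the printed location points at an unshaded Hadoop jar, the classpath ordering described above likely needs to be adjusted.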

How was this patch tested?

Relying on existing tests.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Great! Is hadoop-aws tested manually, @sunchao ?

@SparkQA

SparkQA commented Dec 10, 2020

Test build #132548 has finished for PR 30701 at commit a78ae6e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37152/

@SparkQA

SparkQA commented Dec 10, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37152/

@sunchao
Member Author

sunchao commented Dec 10, 2020

Great! Is hadoop-aws tested manually, @sunchao ?

Yes, I tested it manually and it looks good.

@SparkQA

SparkQA commented Dec 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37186/

@SparkQA

SparkQA commented Dec 10, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37186/

@SparkQA

SparkQA commented Dec 11, 2020

Test build #132598 has started for PR 30701 at commit ace75c8.

@SparkQA

SparkQA commented Dec 11, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37203/

@SparkQA

SparkQA commented Dec 11, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37203/

@SparkQA

SparkQA commented Dec 11, 2020

Test build #132581 has finished for PR 30701 at commit 66da95e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sunchao sunchao force-pushed the test-hadoop-3.2.2 branch from ace75c8 to 17513d3 on January 7, 2021 18:09
@SparkQA

SparkQA commented Jan 7, 2021

Test build #133805 has started for PR 30701 at commit 17513d3.

@pan3793
Member

pan3793 commented Jan 11, 2021

Hadoop 3.2.2-rc5 became 3.2.2 on Jan 9, does this PR have a chance to merge into 3.1?

@dongjoon-hyun
Member

Hadoop 3.2.2-rc5 became 3.2.2 on Jan 9, does this PR have a chance to merge into 3.1?

Sorry, but it's too late for Apache Spark 3.1.1, @pan3793 .

@dongjoon-hyun
Member

BTW, @sunchao . Since Apache Hadoop 3.2.2 is out, shall we convert this from Draft to a normal PR?

@sunchao
Member Author

sunchao commented Jan 11, 2021

@dongjoon-hyun yes will do that soon.

@sunchao sunchao changed the title from "[DO-NOT-MERGE][test-maven] Test compatibility against Hadoop 3.2.2" to "[SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile" on Jan 15, 2021
@sunchao sunchao marked this pull request as ready for review January 15, 2021 18:20
@sunchao
Member Author

sunchao commented Jan 15, 2021

@dongjoon-hyun I've updated the title and description, and updated the PR. Let's see how CI goes.

@dongjoon-hyun
Member

Thank you for updating, @sunchao !

Contributor

@xkrogen xkrogen left a comment

Looking forward to this! Thanks for keeping it going.

Comment on lines -2423 to -2433
<execution>
  <id>enforce-no-duplicate-dependencies</id>
  <goals>
    <goal>enforce</goal>
  </goals>
  <configuration>
    <rules>
      <banDuplicatePomDependencyVersions/>
    </rules>
  </configuration>
</execution>
Contributor

Is there any way to provide specific exclusions for the enforcement? It's a shame to have to completely turn off the rule for this.

Member Author

I'm not sure whether exclusion rules are supported for this. I just tried it on my laptop and compilation for Hadoop 2.7 works, probably because there have been lots of changes since I initially removed this. Let me try it once again with Spark CI.

@@ -112,11 +112,24 @@ private[hive] object IsolatedClientLoader extends Logging {
hadoopVersion: String,
ivyPath: Option[String],
remoteRepos: String): Seq[URL] = {
val hadoopJarNames = if (hadoopVersion.startsWith("3")) {
Contributor

Will this break if hadoopVersion is 3.1.0, 3.2.1, etc. (due to the previous issues with the shaded client JARs)?

Member Author

Do you mean HADOOP-16080? Yes, things could still break in the following cases:

  1. Users build Spark without -Phadoop-cloud AND use a version that doesn't have the fix from HADOOP-16080, such as:
$ bin/spark-shell --packages org.apache.hadoop:hadoop-aws:3.2.0,org.apache.hadoop:hadoop-common:3.2.0

However, I think we should recommend that users stick to the same version used by Spark, i.e., 3.2.2.

  2. Users build Spark with a custom Hadoop version, such as the 3.1.0/3.2.1 you mentioned, via the hadoop.version property, and use it to talk to cloud storage like S3.

To enable these use cases, we may have to introduce another Maven property to switch back to the non-shaded client, and update this place as well.

Contributor

Yeah, I think (2) is my primary concern. We support building against a custom version using -Dhadoop.version, but right now it will break if you use -Phadoop-3 -Dhadoop.version=3.1.0. The build itself I believe you can at least work around by changing the -Dhadoop-client-{runtime,api,minicluster}.artifact properties, but here there's no way to work around it.

Member Author

Got it. Yes we can be more precise here, such as using a version map. I can update this.
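
For illustration, a more precise check could parse the Hadoop version and pick the shaded client jars only for releases known to work, falling back to hadoop-client otherwise. The sketch below is not the actual Spark code (the real change landed later in #31203 with a `majorMinorPatchVersion` helper), and the cutoff it uses (3.2.2+ and 3.3+) is an assumption based on the discussion above:

```scala
// Sketch only -- not the actual Spark implementation of IsolatedClientLoader.
object HadoopVersionSupport {
  // Extract (major, minor, patch) from strings like "3.2.2" or "3.2.2-custom-1".
  private val VersionPattern = """^(\d+)\.(\d+)\.(\d+).*""".r

  def majorMinorPatchVersion(version: String): Option[(Int, Int, Int)] = version match {
    case VersionPattern(major, minor, patch) => Some((major.toInt, minor.toInt, patch.toInt))
    case _ => None
  }

  // Assumption: only use the shaded clients for 3.2.2+ and 3.3+, where the
  // HADOOP-16080 class of problems is believed to be fixed.
  def useShadedClient(hadoopVersion: String): Boolean =
    majorMinorPatchVersion(hadoopVersion).exists {
      case (3, 2, patch) => patch >= 2
      case (3, minor, _) => minor >= 3
      case _             => false
    }

  // Hypothetical replacement for the startsWith("3") check in the diff above.
  def hadoopJarNames(hadoopVersion: String): Seq[String] =
    if (useShadedClient(hadoopVersion)) Seq("hadoop-client-api", "hadoop-client-runtime")
    else Seq("hadoop-client")
}
```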

Comment on lines +127 to +129
// this introduced from lower version of Hive could conflict with jars in Hadoop 3.2+, so
// exclude here in favor of the ones in Hadoop 3.2+
Seq("org.apache.hadoop:hadoop-auth")
Contributor

Doesn't Hive pull in other non-shaded Hadoop dependencies that could cause issues?

Member Author

Actually this code is no longer required after #30284. Currently Spark will always load Hadoop classes from the built-in Hadoop version.

Contributor

Right, I forgot about that one. Thanks for the clarification.

@SparkQA

SparkQA commented Jan 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38708/

@SparkQA

SparkQA commented Jan 15, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38708/

@SparkQA

SparkQA commented Jan 15, 2021

Test build #134125 has finished for PR 30701 at commit cccb021.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @sunchao .

@sunchao
Member Author

sunchao commented Jan 15, 2021

Thanks @dongjoon-hyun for merging! @xkrogen I'll address your comments in a follow-up PR.

@xkrogen
Contributor

xkrogen commented Jan 15, 2021

Thanks @sunchao !

dongjoon-hyun pushed a commit that referenced this pull request Jan 26, 2021
… and add more strict Hadoop version check

### What changes were proposed in this pull request?

1. Add back the Maven enforcer for the duplicate dependencies check.
2. Stricter check on the Hadoop versions which support the shaded client in `IsolatedClientLoader`. To do a proper version check, this adds a util function `majorMinorPatchVersion` to extract the major/minor/patch version from a string.
3. Clean up unnecessary code.

### Why are the changes needed?

The Maven enforcer was removed as part of #30556. This proposes to add it back.

Also, the Hadoop shaded client doesn't work in certain cases (see [these comments](#30701 (comment)) for details). This strictly checks that the current Hadoop version (i.e., 3.2.2 at the moment) has good support for the shaded client, and otherwise falls back to the old unshaded jars.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #31203 from sunchao/SPARK-33212-followup.

Lead-authored-by: Chao Sun <sunchao@apple.com>
Co-authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021
… and add more strict Hadoop version check

HyukjinKwon added a commit that referenced this pull request Feb 4, 2021
…lient dependencies in root pom

### What changes were proposed in this pull request?

This PR is a followup of #30701. It uses properties of `hadoop-client-api.artifact`, `hadoop-client-runtime.artifact` and `hadoop-client-minicluster.artifact` explicitly to set the dependencies and versions.

Otherwise, it is logically incorrect. For example, if you build with Hadoop 2, this dependency becomes `hadoop-client-api:2.7.4` internally, which does not exist in Hadoop 2 (https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-api).

### Why are the changes needed?

- To fix the logical incorrectness.
- It fixes a potential issue: this actually caused an issue when `generate-sources` plugin is used together with Hadoop 2 by default, which attempts to pull 2.7.4 of `hadoop-client-api`, `hadoop-client-runtime` and `hadoop-client-minicluster` for whatever reason.

### Does this PR introduce _any_ user-facing change?

No for users and devs. It's more of a cleanup.

### How was this patch tested?

Manually checked the dependencies are correctly placed.

Closes #31467 from HyukjinKwon/SPARK-33212.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
a0x8o added a commit to a0x8o/spark that referenced this pull request Feb 4, 2021
…lient dependencies in root pom

pan3793 pushed a commit to pan3793/spark that referenced this pull request Aug 30, 2021
…ts for Hadoop 3.x profile

pan3793 pushed a commit to pan3793/spark that referenced this pull request Aug 30, 2021
… and add more strict Hadoop version check

@@ -66,7 +66,13 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
Contributor

@LuciferYang LuciferYang Sep 16, 2021

@sunchao @dongjoon-hyun I found there are some bad cases with using variables here:

  • if we run mvn dependency:tree -pl resource-managers/yarn -Phadoop-2.7 -Pyarn -am, the dependency tree is correct
  • if we run mvn dependency:tree -pl resource-managers/yarn -Phadoop-2.7 -Pyarn, the dependency tree is wrong: Hadoop 3.3.1 dependencies appear in it even though we have added the -Phadoop-2.7 profile


So if we execute

mvn clean install -DskipTests -pl resource-managers/yarn -am -Phadoop-2.7 -Pyarn
mvn test -pl resource-managers/yarn -Phadoop-2.7 -Pyarn -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite

The test case YarnClusterSuite will fail as follows:

Discovery starting.
Discovery completed in 277 milliseconds.
Run starting. Expected test count is: 27
YarnClusterSuite:
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/com/google/inject/servlet/ServletModule
  at java.lang.ClassLoader.defineClass1(Native Method)
  at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
  at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
  at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
  at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
  ...
  Cause: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.com.google.inject.servlet.ServletModule
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
  at java.lang.ClassLoader.defineClass1(Native Method)
  at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
  at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
  at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
  at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
  ...

Contributor

We have to add -am when testing the yarn module separately to ensure that the dependencies are correct:

mvn test -pl resource-managers/yarn -Phadoop-2.7 -am -Pyarn -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -Dtest=none
Run completed in 5 minutes, 7 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.

Member Author

Interesting. Let me take a look. Thanks for reporting it @LuciferYang !

Contributor

This behavior looks expected to me... If you specify -pl resource-managers/yarn without the -am flag, then you're explicitly telling Maven not to rebuild the core module. If the core module has previously been built with Hadoop 3.3.1, then you will just use it as-is with the old dependencies. IMO, this is just an example of how -pl without -am needs to be used with care.

Member

@dongjoon-hyun dongjoon-hyun Sep 16, 2021

Hi, @LuciferYang . I agree with @xkrogen .

When you build without -am, the other modules come from the published SNAPSHOT poms and jars, which are built with the default (Hadoop 3) configuration.

Member Author

Sorry, I forgot about this. @LuciferYang, just to recap, do you still see a bug here, or is the behavior expected?

Contributor

@sunchao Yes, this problem still exists; at present only the behavior of branch-3.1 is as expected.

Contributor

I tested these commands on 3.2-rc4 (3.2-rc5 can't build with hadoop-2.7 right now), and the problem still exists.

Member Author

@LuciferYang could you check with the fix in #34100? I just tested it with the commands you pasted above:

mvn clean install -DskipTests -pl resource-managers/yarn -am -Phadoop-2.7 -Pyarn
mvn test -pl resource-managers/yarn -Phadoop-2.7 -Pyarn -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite

and the tests all passed for me.

Contributor

@sunchao Yes, the behavior is expected now ~ thx ~

dongjoon-hyun added a commit that referenced this pull request Mar 25, 2024
### What changes were proposed in this pull request?

This PR aims to remove unused `commons-beanutils` dependency from `pom.xml` and `LICENSE-binary`.

### Why are the changes needed?

#30701 removed `commons-beanutils` from `hadoop-3` profile at Apache Spark 3.2.0.
- #30701

#40788 removed `hadoop-2` profile from Apache Spark 3.5.0
- #40788

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #45705 from dongjoon-hyun/SPARK-47548.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
sweisdb pushed a commit to sweisdb/spark that referenced this pull request Apr 1, 2024
### What changes were proposed in this pull request?

This PR aims to remove unused `commons-beanutils` dependency from `pom.xml` and `LICENSE-binary`.

### Why are the changes needed?

apache#30701 removed `commons-beanutils` from `hadoop-3` profile at Apache Spark 3.2.0.
- apache#30701

apache#40788 removed `hadoop-2` profile from Apache Spark 3.5.0
- apache#40788

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#45705 from dongjoon-hyun/SPARK-47548.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
6 participants