### What changes were proposed in this pull request?
This PR proposes to exclude `org.apache.hadoop:hadoop-yarn-server-resourcemanager:jar:tests` from `hadoop-yarn-server-tests` when the Hadoop 2 profile is used.
For some reason, after the SBT 1.3 upgrade in SPARK-21708, SBT started pulling the dependencies of `hadoop-yarn-server-tests` with the `tests` classifier:
```
org/apache/hadoop/hadoop-common/2.7.4/hadoop-common-2.7.4-tests.jar
org/apache/hadoop/hadoop-yarn-common/2.7.4/hadoop-yarn-common-2.7.4-tests.jar
org/apache/hadoop/hadoop-yarn-server-resourcemanager/2.7.4/hadoop-yarn-server-resourcemanager-2.7.4-tests.jar
```
These were not pulled before the upgrade.
Specifically, `hadoop-yarn-server-resourcemanager-2.7.4-tests.jar` causes the problem (SPARK-33104):
1. When the test case creates the Hadoop configuration here,
https://github.com/apache/spark/blob/cc06266ade5a4eb35089501a3b32736624208d4c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L122
2. The jars above take precedence in the class path over the custom `core-site.xml` specified in the test:
https://github.com/apache/spark/blob/e93b8f02cd706bedc47c9b55a73f632fe9e61ec3/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1375
3. Later, Hadoop's `Configuration` picks up the `core-site.xml` in the jar instead:
Before this fix:
```
jar:file:/.../https/maven-central.storage-download.googleapis.com/maven2/org/apache/hadoop/
hadoop-yarn-server-resourcemanager/2.7.4/hadoop-yarn-server-resourcemanager-2.7.4-tests.jar!/core-site.xml
```
After this fix:
```
file:/.../spark/resource-managers/yarn/target/org.apache.spark.deploy.yarn.YarnClusterSuite/
org.apache.spark.deploy.yarn.YarnClusterSuite-localDir-nm-0_0/
usercache/.../filecache/10/__spark_conf__.zip/__hadoop_conf__/core-site.xml
```
4. The `core-site.xml` in the jar of course does not contain the configuration set here:
https://github.com/apache/spark/blob/2cfd215dc4fb1ff6865644fec8284ba93dcddd5c/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala#L133-L141
and the specific test fails.
This PR takes a somewhat hacky approach: `hadoop-yarn-server-resourcemanager` is excluded from `hadoop-yarn-server-tests` (the one with the `tests` classifier) and then added back as a proper dependency when the Hadoop 2 profile is used. This way, SBT no longer pulls `hadoop-yarn-server-resourcemanager` with the `tests` classifier.
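As a rough illustration of the exclude-then-re-add pattern described above, a Maven fragment could look like the sketch below. This is not the actual diff of this PR. The placement of the two declarations, the `test` scope, and the assumption that the version is managed elsewhere (for example in a parent `dependencyManagement` section) are all illustrative guesses.

```xml
<!-- Sketch only: placement, scope, and version management are assumptions. -->

<!-- Step 1: exclude the resource manager from the 'tests'-classified artifact. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-server-tests</artifactId>
  <classifier>tests</classifier>
  <scope>test</scope>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- Step 2: add it back as a regular dependency (no 'tests' classifier),
     inside the Hadoop 2 profile only. -->
<profile>
  <id>hadoop-2.7</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
      <scope>test</scope>
    </dependency>
  </dependencies>
</profile>
```

Because a Maven exclusion matches on `groupId` and `artifactId` only, the first step removes the artifact regardless of classifier. That is why the second step has to declare it again explicitly without the `tests` classifier.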
### Why are the changes needed?
To make the build pass. This is a blocker.
### Does this PR introduce _any_ user-facing change?
No, test-only.
### How was this patch tested?
Manually tested and debugged:
```bash
build/sbt clean "yarn/testOnly *.YarnClusterSuite -- -z SparkHadoopUtil" -Pyarn -Phadoop-2.7 -Phive -Phive-2.3
```
Closes #30133 from HyukjinKwon/SPARK-33104.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>