Commit b6f46ca

sunchao authored and dongjoon-hyun committed
[SPARK-33212][BUILD] Upgrade to Hadoop 3.2.2 and move to shaded clients for Hadoop 3.x profile
### What changes were proposed in this pull request?

This:
1. switches Spark to use the shaded Hadoop clients, namely hadoop-client-api and hadoop-client-runtime, for Hadoop 3.x.
2. upgrades the built-in version for Hadoop 3.x to Hadoop 3.2.2.

Note that for Hadoop 2.7, we'll still use the same modules such as hadoop-client.

To keep hadoop-3.2 as the default Hadoop profile, this defines the following Maven properties:

```
hadoop-client-api.artifact
hadoop-client-runtime.artifact
hadoop-client-minicluster.artifact
```

which default to:

```
hadoop-client-api
hadoop-client-runtime
hadoop-client-minicluster
```

but all switch to `hadoop-client` when the Hadoop profile is hadoop-2.7. A side effect of this is that we'll import the same dependency multiple times. Because of this, I have to disable the Maven enforcer rule `banDuplicatePomDependencyVersions`.

Besides the above, there are the following changes:
- explicitly added a few dependencies which were imported via transitive dependencies from Hadoop jars, but are removed from the shaded client jars.
- removed the use of `ProxyUriUtils.getPath` from `ApplicationMaster`, which is a server-side/private API.
- modified `IsolatedClientLoader` to exclude `hadoop-auth` jars when the Hadoop version is 3.x. This change should only matter when we're not sharing Hadoop classes with Spark (which is _mostly_ used in tests).

### Why are the changes needed?

Hadoop 3.2.2 is released with new features and bug fixes, so it's good for the Spark community to adopt it. However, Hadoop versions starting from 3.2.1 have upgraded to Guava 27+. To resolve the Guava conflicts, this PR switches to the shaded client jars provided by Hadoop. This also has the benefit of not pulling in other third-party dependencies from the Hadoop side, which avoids more potential conflicts in the future.

### Does this PR introduce _any_ user-facing change?

When people use Spark with the `hadoop-provided` option, they should make sure the class path contains the `hadoop-client-api` and `hadoop-client-runtime` jars. In addition, they may need to make sure these jars appear before other Hadoop jars in the class path order. Otherwise, classes may be loaded from the other non-shaded Hadoop jars and cause potential conflicts.

### How was this patch tested?

Relying on existing tests.

Closes #30701 from sunchao/test-hadoop-3.2.2.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
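The property indirection described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual root `pom.xml` change: the three property names, their defaults, the `hadoop-2.7` profile id, and the `hadoop-client` fallback come from the commit message, while the surrounding layout and the exact placement of the properties are assumptions.

```xml
<!-- Sketch only: a POM fragment, not a complete or verified root pom.xml. -->
<project>
  <properties>
    <!-- Defaults, used by the default hadoop-3.2 profile: the shaded client jars. -->
    <hadoop-client-api.artifact>hadoop-client-api</hadoop-client-api.artifact>
    <hadoop-client-runtime.artifact>hadoop-client-runtime</hadoop-client-runtime.artifact>
    <hadoop-client-minicluster.artifact>hadoop-client-minicluster</hadoop-client-minicluster.artifact>
  </properties>

  <profiles>
    <profile>
      <id>hadoop-2.7</id>
      <properties>
        <!-- Under Hadoop 2.7 every property falls back to the unshaded hadoop-client. -->
        <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
        <hadoop-client-runtime.artifact>hadoop-client</hadoop-client-runtime.artifact>
        <hadoop-client-minicluster.artifact>hadoop-client</hadoop-client-minicluster.artifact>
      </properties>
    </profile>
  </profiles>
</project>
```

With a layout like this, a module that declares both `${hadoop-client-api.artifact}` and `${hadoop-client-runtime.artifact}` resolves to two identical `hadoop-client` dependencies when `hadoop-2.7` is active, which is the duplication that the `banDuplicatePomDependencyVersions` enforcer rule would otherwise reject.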
1 parent a235c3b commit b6f46ca

18 files changed: +191 -109 lines changed

common/network-yarn/pom.xml

Lines changed: 7 additions & 1 deletion
@@ -65,7 +65,13 @@
     <!-- Provided dependencies -->
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
     </dependency>
     <dependency>
       <groupId>org.slf4j</groupId>

core/pom.xml

Lines changed: 15 additions & 1 deletion
@@ -66,7 +66,13 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
@@ -177,6 +183,14 @@
       <groupId>org.apache.commons</groupId>
       <artifactId>commons-text</artifactId>
     </dependency>
+    <dependency>
+      <groupId>commons-io</groupId>
+      <artifactId>commons-io</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>commons-collections</groupId>
+      <artifactId>commons-collections</artifactId>
+    </dependency>
     <dependency>
       <groupId>com.google.code.findbugs</groupId>
       <artifactId>jsr305</artifactId>

dev/deps/spark-deps-hadoop-2.7-hive-2.3

Lines changed: 1 addition & 2 deletions
@@ -128,7 +128,7 @@ javassist/3.25.0-GA//javassist-3.25.0-GA.jar
 javax.inject/1//javax.inject-1.jar
 javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
 javolution/5.5.1//javolution-5.5.1.jar
-jaxb-api/2.2.2//jaxb-api-2.2.2.jar
+jaxb-api/2.2.11//jaxb-api-2.2.11.jar
 jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
 jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar
 jdo-api/3.0.1//jdo-api-3.0.1.jar
@@ -227,7 +227,6 @@ spire-macros_2.12/0.17.0-M1//spire-macros_2.12-0.17.0-M1.jar
 spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar
 spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar
 spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar
-stax-api/1.0-2//stax-api-1.0-2.jar
 stax-api/1.0.1//stax-api-1.0.1.jar
 stream/2.9.6//stream-2.9.6.jar
 super-csv/2.2.0//super-csv-2.2.0.jar

dev/deps/spark-deps-hadoop-3.2-hive-2.3

Lines changed: 2 additions & 51 deletions
@@ -3,15 +3,13 @@ JLargeArrays/1.5//JLargeArrays-1.5.jar
 JTransforms/3.1//JTransforms-3.1.jar
 RoaringBitmap/0.9.0//RoaringBitmap-0.9.0.jar
 ST4/4.0.4//ST4-4.0.4.jar
-accessors-smart/1.2//accessors-smart-1.2.jar
 activation/1.1.1//activation-1.1.1.jar
 aircompressor/0.16//aircompressor-0.16.jar
 algebra_2.12/2.0.0-M2//algebra_2.12-2.0.0-M2.jar
 annotations/17.0.0//annotations-17.0.0.jar
 antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
 antlr4-runtime/4.8-1//antlr4-runtime-4.8-1.jar
 aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
-aopalliance/1.0//aopalliance-1.0.jar
 arpack_combined_all/0.1//arpack_combined_all-0.1.jar
 arrow-format/2.0.0//arrow-format-2.0.0.jar
 arrow-memory-core/2.0.0//arrow-memory-core-2.0.0.jar
@@ -28,15 +26,12 @@ breeze_2.12/1.0//breeze_2.12-1.0.jar
 cats-kernel_2.12/2.0.0-M4//cats-kernel_2.12-2.0.0-M4.jar
 chill-java/0.9.5//chill-java-0.9.5.jar
 chill_2.12/0.9.5//chill_2.12-0.9.5.jar
-commons-beanutils/1.9.4//commons-beanutils-1.9.4.jar
 commons-cli/1.2//commons-cli-1.2.jar
 commons-codec/1.15//commons-codec-1.15.jar
 commons-collections/3.2.2//commons-collections-3.2.2.jar
 commons-compiler/3.0.16//commons-compiler-3.0.16.jar
 commons-compress/1.20//commons-compress-1.20.jar
-commons-configuration2/2.1.1//commons-configuration2-2.1.1.jar
 commons-crypto/1.1.0//commons-crypto-1.1.0.jar
-commons-daemon/1.0.13//commons-daemon-1.0.13.jar
 commons-dbcp/1.4//commons-dbcp-1.4.jar
 commons-httpclient/3.1//commons-httpclient-3.1.jar
 commons-io/2.5//commons-io-2.5.jar
@@ -56,30 +51,13 @@ datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar
 datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar
 datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
 derby/10.14.2.0//derby-10.14.2.0.jar
-dnsjava/2.1.7//dnsjava-2.1.7.jar
 dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
-ehcache/3.3.1//ehcache-3.3.1.jar
 flatbuffers-java/1.9.0//flatbuffers-java-1.9.0.jar
 generex/1.0.2//generex-1.0.2.jar
-geronimo-jcache_1.0_spec/1.0-alpha-1//geronimo-jcache_1.0_spec-1.0-alpha-1.jar
 gson/2.2.4//gson-2.2.4.jar
 guava/14.0.1//guava-14.0.1.jar
-guice-servlet/4.0//guice-servlet-4.0.jar
-guice/4.0//guice-4.0.jar
-hadoop-annotations/3.2.0//hadoop-annotations-3.2.0.jar
-hadoop-auth/3.2.0//hadoop-auth-3.2.0.jar
-hadoop-client/3.2.0//hadoop-client-3.2.0.jar
-hadoop-common/3.2.0//hadoop-common-3.2.0.jar
-hadoop-hdfs-client/3.2.0//hadoop-hdfs-client-3.2.0.jar
-hadoop-mapreduce-client-common/3.2.0//hadoop-mapreduce-client-common-3.2.0.jar
-hadoop-mapreduce-client-core/3.2.0//hadoop-mapreduce-client-core-3.2.0.jar
-hadoop-mapreduce-client-jobclient/3.2.0//hadoop-mapreduce-client-jobclient-3.2.0.jar
-hadoop-yarn-api/3.2.0//hadoop-yarn-api-3.2.0.jar
-hadoop-yarn-client/3.2.0//hadoop-yarn-client-3.2.0.jar
-hadoop-yarn-common/3.2.0//hadoop-yarn-common-3.2.0.jar
-hadoop-yarn-registry/3.2.0//hadoop-yarn-registry-3.2.0.jar
-hadoop-yarn-server-common/3.2.0//hadoop-yarn-server-common-3.2.0.jar
-hadoop-yarn-server-web-proxy/3.2.0//hadoop-yarn-server-web-proxy-3.2.0.jar
+hadoop-client-api/3.2.2//hadoop-client-api-3.2.2.jar
+hadoop-client-runtime/3.2.2//hadoop-client-runtime-3.2.2.jar
 hive-beeline/2.3.7//hive-beeline-2.3.7.jar
 hive-cli/2.3.7//hive-cli-2.3.7.jar
 hive-common/2.3.7//hive-common-2.3.7.jar
@@ -109,8 +87,6 @@ jackson-core/2.11.4//jackson-core-2.11.4.jar
 jackson-databind/2.11.4//jackson-databind-2.11.4.jar
 jackson-dataformat-yaml/2.11.4//jackson-dataformat-yaml-2.11.4.jar
 jackson-datatype-jsr310/2.11.2//jackson-datatype-jsr310-2.11.2.jar
-jackson-jaxrs-base/2.9.5//jackson-jaxrs-base-2.9.5.jar
-jackson-jaxrs-json-provider/2.9.5//jackson-jaxrs-json-provider-2.9.5.jar
 jackson-mapper-asl/1.9.13//jackson-mapper-asl-1.9.13.jar
 jackson-module-jaxb-annotations/2.11.4//jackson-module-jaxb-annotations-2.11.4.jar
 jackson-module-paranamer/2.11.4//jackson-module-paranamer-2.11.4.jar
@@ -124,13 +100,10 @@ jakarta.ws.rs-api/2.1.6//jakarta.ws.rs-api-2.1.6.jar
 jakarta.xml.bind-api/2.3.2//jakarta.xml.bind-api-2.3.2.jar
 janino/3.0.16//janino-3.0.16.jar
 javassist/3.25.0-GA//javassist-3.25.0-GA.jar
-javax.inject/1//javax.inject-1.jar
 javax.jdo/3.2.0-m3//javax.jdo-3.2.0-m3.jar
-javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar
 javolution/5.5.1//javolution-5.5.1.jar
 jaxb-api/2.2.11//jaxb-api-2.2.11.jar
 jaxb-runtime/2.3.2//jaxb-runtime-2.3.2.jar
-jcip-annotations/1.0-1//jcip-annotations-1.0-1.jar
 jcl-over-slf4j/1.7.30//jcl-over-slf4j-1.7.30.jar
 jdo-api/3.0.1//jdo-api-3.0.1.jar
 jersey-client/2.30//jersey-client-2.30.jar
@@ -144,30 +117,14 @@ jline/2.14.6//jline-2.14.6.jar
 joda-time/2.10.5//joda-time-2.10.5.jar
 jodd-core/3.5.2//jodd-core-3.5.2.jar
 jpam/1.1//jpam-1.1.jar
-json-smart/2.3//json-smart-2.3.jar
 json/1.8//json-1.8.jar
 json4s-ast_2.12/3.7.0-M5//json4s-ast_2.12-3.7.0-M5.jar
 json4s-core_2.12/3.7.0-M5//json4s-core_2.12-3.7.0-M5.jar
 json4s-jackson_2.12/3.7.0-M5//json4s-jackson_2.12-3.7.0-M5.jar
 json4s-scalap_2.12/3.7.0-M5//json4s-scalap_2.12-3.7.0-M5.jar
-jsp-api/2.1//jsp-api-2.1.jar
 jsr305/3.0.0//jsr305-3.0.0.jar
 jta/1.1//jta-1.1.jar
 jul-to-slf4j/1.7.30//jul-to-slf4j-1.7.30.jar
-kerb-admin/1.0.1//kerb-admin-1.0.1.jar
-kerb-client/1.0.1//kerb-client-1.0.1.jar
-kerb-common/1.0.1//kerb-common-1.0.1.jar
-kerb-core/1.0.1//kerb-core-1.0.1.jar
-kerb-crypto/1.0.1//kerb-crypto-1.0.1.jar
-kerb-identity/1.0.1//kerb-identity-1.0.1.jar
-kerb-server/1.0.1//kerb-server-1.0.1.jar
-kerb-simplekdc/1.0.1//kerb-simplekdc-1.0.1.jar
-kerb-util/1.0.1//kerb-util-1.0.1.jar
-kerby-asn1/1.0.1//kerby-asn1-1.0.1.jar
-kerby-config/1.0.1//kerby-config-1.0.1.jar
-kerby-pkix/1.0.1//kerby-pkix-1.0.1.jar
-kerby-util/1.0.1//kerby-util-1.0.1.jar
-kerby-xdr/1.0.1//kerby-xdr-1.0.1.jar
 kryo-shaded/4.0.2//kryo-shaded-4.0.2.jar
 kubernetes-client/4.12.0//kubernetes-client-4.12.0.jar
 kubernetes-model-admissionregistration/4.12.0//kubernetes-model-admissionregistration-4.12.0.jar
@@ -205,9 +162,7 @@ metrics-json/4.1.1//metrics-json-4.1.1.jar
 metrics-jvm/4.1.1//metrics-jvm-4.1.1.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.51.Final//netty-all-4.1.51.Final.jar
-nimbus-jose-jwt/4.41.1//nimbus-jose-jwt-4.41.1.jar
 objenesis/2.6//objenesis-2.6.jar
-okhttp/2.7.5//okhttp-2.7.5.jar
 okhttp/3.12.12//okhttp-3.12.12.jar
 okio/1.14.0//okio-1.14.0.jar
 opencsv/2.3//opencsv-2.3.jar
@@ -226,7 +181,6 @@ parquet-jackson/1.10.1//parquet-jackson-1.10.1.jar
 protobuf-java/2.5.0//protobuf-java-2.5.0.jar
 py4j/0.10.9.1//py4j-0.10.9.1.jar
 pyrolite/4.30//pyrolite-4.30.jar
-re2j/1.1//re2j-1.1.jar
 scala-collection-compat_2.12/2.1.1//scala-collection-compat_2.12-2.1.1.jar
 scala-compiler/2.12.10//scala-compiler-2.12.10.jar
 scala-library/2.12.10//scala-library-2.12.10.jar
@@ -244,15 +198,12 @@ spire-platform_2.12/0.17.0-M1//spire-platform_2.12-0.17.0-M1.jar
 spire-util_2.12/0.17.0-M1//spire-util_2.12-0.17.0-M1.jar
 spire_2.12/0.17.0-M1//spire_2.12-0.17.0-M1.jar
 stax-api/1.0.1//stax-api-1.0.1.jar
-stax2-api/3.1.4//stax2-api-3.1.4.jar
 stream/2.9.6//stream-2.9.6.jar
 super-csv/2.2.0//super-csv-2.2.0.jar
 threeten-extra/1.5.0//threeten-extra-1.5.0.jar
-token-provider/1.0.1//token-provider-1.0.1.jar
 transaction-api/1.1//transaction-api-1.1.jar
 univocity-parsers/2.9.0//univocity-parsers-2.9.0.jar
 velocity/1.5//velocity-1.5.jar
-woodstox-core/5.0.3//woodstox-core-5.0.3.jar
 xbean-asm7-shaded/4.15//xbean-asm7-shaded-4.15.jar
 xz/1.5//xz-1.5.jar
 zjsonpatch/0.3.0//zjsonpatch-0.3.0.jar

external/kafka-0-10-assembly/pom.xml

Lines changed: 7 additions & 1 deletion
@@ -71,9 +71,15 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
       <scope>provided</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
     <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro-mapred</artifactId>

external/kafka-0-10-sql/pom.xml

Lines changed: 4 additions & 0 deletions
@@ -79,6 +79,10 @@
       <artifactId>kafka-clients</artifactId>
       <version>${kafka.version}</version>
     </dependency>
+    <dependency>
+      <groupId>com.google.code.findbugs</groupId>
+      <artifactId>jsr305</artifactId>
+    </dependency>
     <dependency>
       <groupId>org.apache.commons</groupId>
       <artifactId>commons-pool2</artifactId>

external/kafka-0-10-token-provider/pom.xml

Lines changed: 5 additions & 0 deletions
@@ -58,6 +58,11 @@
       <artifactId>mockito-core</artifactId>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <scope>${hadoop.deps.scope}</scope>
+    </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-tags_${scala.binary.version}</artifactId>

external/kinesis-asl-assembly/pom.xml

Lines changed: 7 additions & 1 deletion
@@ -91,9 +91,15 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
       <scope>provided</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
     <dependency>
       <groupId>org.apache.avro</groupId>
       <artifactId>avro-ipc</artifactId>

hadoop-cloud/pom.xml

Lines changed: 6 additions & 1 deletion
@@ -58,10 +58,15 @@
     </dependency>
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
       <version>${hadoop.version}</version>
       <scope>provided</scope>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
     <!--
       the AWS module pulls in jackson; its transitive dependencies can create
       intra-jackson-module version problems.

launcher/pom.xml

Lines changed: 8 additions & 1 deletion
@@ -81,7 +81,14 @@
     <!-- Not needed by the test code, but referenced by SparkSubmit which is used by the tests. -->
     <dependency>
       <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
+      <artifactId>${hadoop-client-api.artifact}</artifactId>
+      <version>${hadoop.version}</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>${hadoop-client-runtime.artifact}</artifactId>
+      <version>${hadoop.version}</version>
       <scope>test</scope>
     </dependency>
   </dependencies>
