Commit 026c9ba

HADOOP-19696. hadoop binary distribution to move cloud connectors to hadoop common/lib (#7980)
This moves all the cloud connector libraries to common/lib.

There are specific build options to control which libraries to include.
The hadoop-* JARs of the modules are included, but dependencies are only
included when the build-time options specify it.

Available package profiles:

  hadoop-aliyun-package
  hadoop-aws-package
  hadoop-azure-datalake-package
  hadoop-cos-package
  hadoop-huaweicloud-package

This means that by default the AWS bundle.jar is no longer included in the
distribution: to add it, users must drop their chosen version of the SDK
into share/hadoop/common/lib.

Anyone building their own release now has a choice of which connectors to
bundle. The ASF ones will stay fairly lean to reduce the CVE attack surface
as well as keep package size under control.

Contributed by Steve Loughran
1 parent 7ecf2c4 commit 026c9ba
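
As the commit message notes, the AWS SDK bundle.jar is no longer shipped by default and has to be dropped into share/hadoop/common/lib by the user. A minimal sketch of that step, assuming the jar has already been downloaded separately; the function name `install_sdk_bundle` and the example paths are hypothetical and not part of the commit.

```shell
# Hypothetical helper (not part of the commit): copy a user-supplied SDK jar
# into the common/lib directory of an unpacked Hadoop distribution.
install_sdk_bundle() {
  local hadoop_home="$1"   # root of the unpacked distribution
  local sdk_jar="$2"       # SDK bundle jar obtained separately
  local lib_dir="${hadoop_home}/share/hadoop/common/lib"

  if [ ! -d "${lib_dir}" ]; then
    echo "not a Hadoop distribution: ${lib_dir} is missing" >&2
    return 1
  fi
  if [ ! -f "${sdk_jar}" ]; then
    echo "SDK jar not found: ${sdk_jar}" >&2
    return 1
  fi
  # -p preserves timestamps and mode, matching the cp usage in dist-layout-stitching
  cp -p "${sdk_jar}" "${lib_dir}/"
}

# Example call (both paths are placeholders):
# install_sdk_bundle /opt/hadoop /tmp/bundle.jar
```

The function refuses to copy when the target directory does not exist, so a mistyped distribution root fails loudly instead of creating a stray directory.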

File tree

14 files changed, +458 -60 lines changed

BUILDING.txt

Lines changed: 43 additions & 0 deletions
@@ -385,6 +385,49 @@ Create a local staging version of the website (in /tmp/hadoop-site)
 
 Note that the site needs to be built in a second pass after other artifacts.
 
+----------------------------------------------------------------------------------
+Including Cloud Connector Dependencies in Distributions:
+
+Hadoop distributions include the hadoop modules needed to work with data and services
+on cloud infrastructure.
+
+However, dependencies are omitted for all cloud connectors except hadoop-azure
+(abfs:// and wasb://) and possibly hadoop-gcp (gs://) and hadoop-tos (tos://).
+For the latter two modules, it depends on shading options.
+
+For hadoop-aws the AWS SDK bundle.jar is omitted, but everything else is included.
+
+Excluding the extra binaries:
+* Keeps release artifact size below the limit of the ASF distribution network.
+* Reduces download and size overhead in docker usage.
+* Reduces the CVE attack surface and audit-related complaints about those same CVEs.
+* Reduces the risk of classpath conflict.
+
+To produce a build with the specific desired dependencies, the build must be executed
+with the relevant ${module}-package profile alongside the -Pdist profile.
+
+For example, for a build with the hadoop-aws and hadoop-azure-datalake dependencies,
+run:
+
+  mvn package -Pdist -DskipTests -Dhadoop-aws-package -Dhadoop-azure-datalake-package
+
+Available package profiles:
+  hadoop-aws-package
+  hadoop-azure-datalake-package
+  hadoop-cos-package
+  hadoop-huaweicloud-package
+
+To build a complete distribution with all cloud dependencies included:
+
+  mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true \
+    -Dhadoop-aliyun-package \
+    -Dhadoop-aws-package \
+    -Dhadoop-azure-datalake-package \
+    -Dhadoop-cos-package \
+    -Dhadoop-huaweicloud-package
+
+The resulting tar file will be too large to be distributable through ASF infrastructure.
+
 ----------------------------------------------------------------------------------
 Installing Hadoop
 
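
The build invocation described in the BUILDING.txt change follows a regular pattern: one `-D<module>-package` flag per connector whose dependencies should be bundled. A minimal sketch of a wrapper that assembles such a command line is below. The function name `cloud_dist_cmd` is hypothetical and not part of the commit, and it only prints the command rather than running Maven.

```shell
# Hypothetical helper (not part of the commit): print the mvn command for a
# distribution that bundles the dependencies of the named connector modules.
# Module names come from the profile list in BUILDING.txt, e.g. hadoop-aws.
cloud_dist_cmd() {
  local cmd="mvn package -Pdist -DskipTests -Dtar"
  local module
  for module in "$@"; do
    # each connector contributes one -D<module>-package flag
    cmd="${cmd} -D${module}-package"
  done
  printf '%s\n' "${cmd}"
}

cloud_dist_cmd hadoop-aws hadoop-azure-datalake
# prints: mvn package -Pdist -DskipTests -Dtar -Dhadoop-aws-package -Dhadoop-azure-datalake-package
```

Called with no arguments it prints the plain `-Pdist` build, which corresponds to the lean default distribution.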
LICENSE-binary

Lines changed: 20 additions & 2 deletions
@@ -203,18 +203,23 @@
 
 --------------------------------------------------------------------------------
 This project bundles some components that are also licensed under the Apache
-License Version 2.0:
+License Version 2.0.
+Note: some of the listed artifacts may not be included in a given build of the binary
+distribution; it depends on the build options. This list intends
+to be inclusive of all which may be included:
 
 
 hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/nvd3-1.8.5.* (css and js files)
 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/checker/AbstractFuture.java
 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/checker/TimeoutFuture.java
 
 ch.qos.reload4j:reload4j:1.2.22
+com.aliyun:aliyun-java-core:0.2.11-beta
 com.aliyun:aliyun-java-sdk-core:4.5.10
 com.aliyun:aliyun-java-sdk-kms:2.11.0
 com.aliyun:aliyun-java-sdk-ram:3.1.0
 com.aliyun:aliyun-java-sdk-sts:3.0.0
+com.aliyun:java-trace-api:0.2.11-beta
 com.aliyun.oss:aliyun-sdk-oss:3.13.2
 com.cedarsoftware:java-util:1.9.0
 com.cedarsoftware:json-io:2.5.1
@@ -241,6 +246,8 @@ com.google.guava:guava:33.4.8-jre
 com.google.guava:listenablefuture:9999.0-empty-to-avoid-conflict-with-guava
 com.microsoft.azure:azure-storage:7.0.0
 com.nimbusds:nimbus-jose-jwt:10.4
+com.squareup.okhttp3:okhttp:jar:3.14.2
+com.squareup.okio:okio:jar:1.17.2
 com.zaxxer:HikariCP:4.0.3
 commons-beanutils:commons-beanutils:1.9.4
 commons-cli:commons-cli:1.9.0
@@ -289,6 +296,9 @@ io.netty:netty-transport-native-kqueue:4.1.127.Final
 io.netty:netty-resolver-dns-native-macos:4.1.127.Final
 io.opencensus:opencensus-api:0.12.3
 io.opencensus:opencensus-contrib-grpc-metrics:0.12.3
+io.opentracing:opentracing-api:0.33.0.jar
+io.opentracing:opentracing-noop:0.33.0.jar
+io.opentracing:opentracing-util:0.33.0.jar
 io.reactivex:rxjava:1.3.8
 io.reactivex:rxjava-string:1.1.1
 io.reactivex:rxnetty:0.4.20
@@ -316,6 +326,8 @@ org.apache.htrace:htrace-core:3.1.0-incubating
 org.apache.htrace:htrace-core4:4.1.0-incubating
 org.apache.httpcomponents:httpclient:4.5.13
 org.apache.httpcomponents:httpcore:4.4.13
+org.apache.httpcomponents.client5:httpclient5:5.5
+org.apache.httpcomponents.core5:httpcore5:5.5
 org.apache.kafka:kafka-clients:3.9.0
 org.apache.kerby:kerb-admin:2.0.3
 org.apache.kerby:kerb-client:2.0.3
@@ -432,6 +444,7 @@ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanage
 bootstrap v3.3.6
 broccoli-asset-rev v2.4.2
 broccoli-funnel v1.0.1
+cos_api-bundle-5.6.19.jar
 datatables v1.11.5
 em-helpers v0.5.13
 em-table v0.1.6
@@ -477,7 +490,7 @@ com.microsoft.azure:azure-cosmosdb:2.4.5
 com.microsoft.azure:azure-cosmosdb-commons:2.4.5
 com.microsoft.azure:azure-cosmosdb-direct:2.4.5
 com.microsoft.azure:azure-cosmosdb-gateway:2.4.5
-com.microsoft.azure:azure-data-lake-store-sdk:2.3.3
+com.microsoft.azure:azure-data-lake-store-sdk:2.3.9
 com.microsoft.azure:azure-keyvault-core:1.0.0
 com.microsoft.sqlserver:mssql-jdbc:6.2.1.jre7
 org.bouncycastle:bcpkix-jdk18on:1.82
@@ -536,3 +549,8 @@ Public Domain
 -------------
 
 aopalliance:aopalliance:1.0
+
+Dom4J license
+-------------
+
+org.dom4j:dom4j:2.1.4.jar

dev-support/bin/dist-layout-stitching

Lines changed: 4 additions & 0 deletions
@@ -130,6 +130,10 @@ run cp -p "${ROOT}/README.txt" .
 run copy "${ROOT}/hadoop-common-project/hadoop-common/target/hadoop-common-${VERSION}" .
 run copy "${ROOT}/hadoop-common-project/hadoop-nfs/target/hadoop-nfs-${VERSION}" .
 run copy "${ROOT}/hadoop-common-project/hadoop-registry/target/hadoop-registry-${VERSION}" .
+
+# cloud connectors go into common
+run copy "${ROOT}/hadoop-cloud-storage-project/hadoop-cloud-storage-dist/target/hadoop-cloud-storage-dist-${VERSION}" .
+
 run copy "${ROOT}/hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-${VERSION}" .
 run copy "${ROOT}/hadoop-hdfs-project/hadoop-hdfs-nfs/target/hadoop-hdfs-nfs-${VERSION}" .
 run copy "${ROOT}/hadoop-hdfs-project/hadoop-hdfs-client/target/hadoop-hdfs-client-${VERSION}" .
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+
+      https://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.3"
+          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.3 https://maven.apache.org/xsd/assembly-1.1.3.xsd">
+  <id>hadoop-cloud-storage</id>
+  <formats>
+    <format>dir</format>
+  </formats>
+  <includeBaseDirectory>false</includeBaseDirectory>
+
+  <!--
+  This is executed in directory hadoop-cloud-storage-project/hadoop-cloud-storage-dist
+  All paths must be relative to that.
+  -->
+  <fileSets>
+    <fileSet>
+      <directory>../../hadoop-tools/hadoop-aws/src/main/bin</directory>
+      <outputDirectory>/bin</outputDirectory>
+      <fileMode>0755</fileMode>
+    </fileSet>
+    <fileSet>
+      <directory>../../hadoop-tools/hadoop-aws/src/main/shellprofile.d</directory>
+      <includes>
+        <include>*</include>
+      </includes>
+      <outputDirectory>/libexec/shellprofile.d</outputDirectory>
+      <fileMode>0755</fileMode>
+    </fileSet>
+  </fileSets>
+
+  <dependencySets>
+    <dependencySet>
+      <outputDirectory>/share/hadoop/common/lib</outputDirectory>
+      <unpack>false</unpack>
+      <scope>runtime</scope>
+      <useProjectArtifact>false</useProjectArtifact>
+      <!-- Stop some needless artifact propagation -->
+      <excludes>
+        <exclude>org.apache.hadoop:hadoop-annotations</exclude>
+        <exclude>org.apache.hadoop.thirdparty:hadoop-shaded-guava</exclude>
+      </excludes>
+    </dependencySet>
+  </dependencySets>
+</assembly>

hadoop-assemblies/src/main/resources/assemblies/hadoop-src.xml

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@
         <exclude>**/file:/**</exclude>
         <exclude>**/SecurityAuth.audit*</exclude>
         <exclude>patchprocess/**</exclude>
+        <exclude>**/auth-keys.xml</exclude>
       </excludes>
     </fileSet>
   </fileSets>
