
Conversation

@hasnain-db (Contributor) commented Oct 9, 2023

What changes were proposed in this pull request?

Exclude a duplicate Apache LICENSE file from the packaged jar to fix the build for network-yarn, and shade the new native libraries so they can be used.

Why are the changes needed?

Building a distribution with the following command currently fails after #43164:

./dev/make-distribution.sh --tgz  -Phive -Phive-thriftserver -Pyarn

Failures look like:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.1.0:run (unpack) on project spark-network-yarn_2.13: An Ant BuildException has occured: Error while expanding /Users/hasnain.lakhani/spark/common/network-yarn/target/scala-2.13/spark-4.0.0-SNAPSHOT-yarn-shuffle.jar
[ERROR] java.nio.file.FileSystemException: /Users/hasnain.lakhani/spark/common/network-yarn/target/exploded/META-INF/license/LICENSE.aix-netbsd.txt: Not a directory
[ERROR] around Ant part ...<unzip src="/Users/hasnain.lakhani/spark/common/network-yarn/target/scala-2.13/spark-4.0.0-SNAPSHOT-yarn-shuffle.jar" dest="/Users/hasnain.lakhani/spark/common/network-yarn/target/exploded/" />... @ 5:198 in /Users/hasnain.lakhani/spark/common/network-yarn/target/antrun/build-main.xml
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.1.0:run (unpack) on project spark-network-yarn_2.13: An Ant BuildException has occured: Error while expanding /Users/hasnain.lakhani/spark/common/network-yarn/target/scala-2.13/spark-4.0.0-SNAPSHOT-yarn-shuffle.jar
java.nio.file.FileSystemException: /Users/hasnain.lakhani/spark/common/network-yarn/target/exploded/META-INF/license/LICENSE.aix-netbsd.txt: Not a directory
around Ant part ...<unzip src="/Users/hasnain.lakhani/spark/common/network-yarn/target/scala-2.13/spark-4.0.0-SNAPSHOT-yarn-shuffle.jar" dest="/Users/hasnain.lakhani/spark/common/network-yarn/target/exploded/" />... @ 5:198 in /Users/hasnain.lakhani/spark/common/network-yarn/target/antrun/build-main.xml

After much trial and error (I added a debugging section for the next person who runs into this), I realized the problem is that inside the jar, the META-INF folder looks like this:

META-INF/LICENSE
META-INF/LICENSE.txt
META-INF/license/
META-INF/license/LICENSE.aix-netbsd.txt
META-INF/license/LICENSE.boringssl.txt
META-INF/license/LICENSE.mvn-wrapper.txt
META-INF/license/LICENSE.tomcat-native.txt

Since the filesystem on macOS is case-insensitive, this causes issues: unzipping fails because META-INF/LICENSE is extracted as a file first, the META-INF/license/ directory then cannot be created, and extracting the files under it fails.

This PR fixes it by removing the META-INF/LICENSE file, as suggested, by adding it to the shade plugin's exclusion list, so the license/ folder can be extracted. Its contents are identical to LICENSE.txt (which is also just the Apache 2.0 license), so nothing is lost. We also shade and copy over the new native libraries in the same way as the existing ones.
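
The relevant part of the shade plugin configuration ends up looking roughly like this (a sketch only; the surrounding filter structure and the artifact pattern are assumed here rather than copied from the diff):

<filters>
  <filter>
    <!-- Hypothetical sketch of the relevant filter; the artifact pattern is an assumption. -->
    <artifact>*:*</artifact>
    <excludes>
      <exclude>META-INF/*.SF</exclude>
      <exclude>META-INF/*.DSA</exclude>
      <exclude>META-INF/*.RSA</exclude>
      <!-- Excluding META-INF/LICENSE avoids the clash with META-INF/license/
           on case-insensitive filesystems when the jar is unpacked. -->
      <exclude>META-INF/LICENSE</exclude>
    </excludes>
  </filter>
</filters>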

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI

I reran

./dev/make-distribution.sh --tgz  -Phive -Phive-thriftserver -Pyarn

and it succeeded.

Was this patch authored or co-authored using generative AI tooling?

No

Appendix: debugging

I was able to inspect the contents of the jar using jar tf $jar and print out individual files using unzip -q -c $jar $path.

To make iteration faster, I went to /Users/hasnain.lakhani/spark/common/network-yarn/target/antrun, edited the Ant file by hand, and then ran ant -f build-main.xml

github-actions bot added the BUILD label Oct 9, 2023
@hasnain-db (Contributor, Author)

cc: @wangyum @LuciferYang please let me know if this fix works for you

@LuciferYang (Contributor) left a comment

Another question: are we sure that network-yarn needs to shade the netty-tcnative-boringssl-static related jars? If it is indeed necessary, should we rename libnetty_tcnative_* in the same way as libnetty_transport_*?

also cc @pan3793

<move file="${project.build.directory}/exploded/META-INF/native/libnetty_transport_native_epoll_x86_64.so"
      tofile="${project.build.directory}/exploded/META-INF/native/lib${spark.shade.native.packageName}_netty_transport_native_epoll_x86_64.so" />
<move file="${project.build.directory}/exploded/META-INF/native/libnetty_transport_native_kqueue_x86_64.jnilib"
      tofile="${project.build.directory}/exploded/META-INF/native/lib${spark.shade.native.packageName}_netty_transport_native_kqueue_x86_64.jnilib" />
<move file="${project.build.directory}/exploded/META-INF/native/libnetty_transport_native_epoll_aarch_64.so"
      tofile="${project.build.directory}/exploded/META-INF/native/lib${spark.shade.native.packageName}_netty_transport_native_epoll_aarch_64.so" />
<move file="${project.build.directory}/exploded/META-INF/native/libnetty_transport_native_kqueue_aarch_64.jnilib"
      tofile="${project.build.directory}/exploded/META-INF/native/lib${spark.shade.native.packageName}_netty_transport_native_kqueue_aarch_64.jnilib" />
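
If it is indeed necessary, a rename rule for the tcnative library following the same pattern might look roughly like this (the concrete file name below is an assumption, not taken from the build):

<!-- Hypothetical sketch: rename the tcnative native library the same way the
     transport libraries are renamed above; the exact file name is assumed. -->
<move file="${project.build.directory}/exploded/META-INF/native/libnetty_tcnative_linux_x86_64.so"
      tofile="${project.build.directory}/exploded/META-INF/native/lib${spark.shade.native.packageName}_netty_tcnative_linux_x86_64.so" />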

@hasnain-db (Contributor, Author)

Another question is, are we sure that network-yarn needs to shade the netty-tcnative-boringssl-static related jar packages? If indeed necessary, should we rename libnetty_tcnative_* in a similar way to libnetty_transport*?

I am not sure if it needs to shade them. I'll admit I'm not the most familiar with this aspect of things, so I'm happy to try it without. I will say that even with the shading it does seem to work fine (tests in #42685 pass), so aside from this issue I am not sure what the harm is in keeping it shaded for consistency.

@LuciferYang (Contributor) commented Oct 9, 2023

This depends on whether the YarnShuffleService needs the tcnative-boringssl related functions. If so, it indeed needs to be shaded, and libnetty_tcnative_* should likely be renamed similarly; if not, these jars can be directly excluded from the shade plugin.
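
For illustration, excluding them directly would look roughly like this (a sketch of the shade plugin's artifactSet excludes; the exact coordinates are assumptions):

<!-- Hypothetical sketch: keep the tcnative jars out of the shaded network-yarn jar.
     The coordinates below are assumptions. -->
<artifactSet>
  <excludes>
    <exclude>io.netty:netty-tcnative-boringssl-static</exclude>
    <exclude>io.netty:netty-tcnative-classes</exclude>
  </excludes>
</artifactSet>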

@hasnain-db (Contributor, Author)

This depends on whether the YarnShuffleService needs the tcnative-boringssl related functions. If so, it indeed needs to be shaded, and libnetty_tcnative_* should likely be renamed similarly; if not, these jars can be directly excluded from the shade.

If it's not provided, the functionality will still work, it'll just be slower (falling back to the JDK implementation). For now, for simplicity's sake (especially since I don't have a great setup for testing YARN beyond unit tests), I think it makes sense to exclude it from the shading, as that seems simpler. Will try that out.

@pan3793 (Member) commented Oct 9, 2023

@hasnain-db I suggest following @LuciferYang's advice and applying the existing rule to rename the newly introduced native library (the prefix must exactly match the relocated package name); it's actually the standard way of creating artifacts that include shaded Netty classes and native libraries.

For details, you could check Netty's code for loading native libraries.
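
For illustration, the link between the relocation and the native library name is roughly as follows (the relocated package and resulting file name below are assumptions based on Spark's usual shade prefix, not quoted from the build):

<!-- Hypothetical sketch: if Netty classes are relocated like this, Netty's
     NativeLibraryLoader derives the native library prefix from the relocated
     package (dots replaced by underscores), so it will look for, e.g.,
     liborg_sparkproject_netty_transport_native_epoll_x86_64.so, which is why
     the renamed file prefix must match the relocation exactly. -->
<relocation>
  <pattern>io.netty</pattern>
  <shadedPattern>org.sparkproject.io.netty</shadedPattern>
</relocation>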

@hasnain-db (Contributor, Author)

I've updated to rename the libraries + use the exclusion, PTAL

hasnain-db requested a review from LuciferYang October 9, 2023 06:40
@cfmcgrady (Contributor)

We may want to consider adding these renaming rules to the SBT build as well; the sbt-assembled jar currently contains the unrenamed libraries:

$ ./build/sbt -Pyarn network-yarn/assembly
$ jar tf common/network-yarn/target/scala-2.13/spark-network-yarn-4.0.0-SNAPSHOT-hadoop3.3.6.jar | grep libnetty
META-INF/native/libnetty_transport_native_epoll_aarch_64.so
META-INF/native/libnetty_transport_native_epoll_x86_64.so
META-INF/native/libnetty_transport_native_kqueue_aarch_64.jnilib
META-INF/native/libnetty_transport_native_kqueue_x86_64.jnilib

@LuciferYang (Contributor)

libnetty_transport_native_kqueue_x86_64

Yes, this needs to be fixed. However, inconsistencies between the sbt assembly and the Maven shaded jar already existed before adding boringssl, and there are likely more. The main goal of this PR is to fix make-distribution.sh, so I personally prefer to address the differences between the sbt assembly and the Maven shaded jar in a series of separate PRs.

@LuciferYang previously approved these changes Oct 9, 2023
@LuciferYang (Contributor) left a comment

+1, LGTM if tests pass, but I hope to get approval from @srowen before merging.

<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
<exclude>META-INF/LICENSE</exclude>
@LuciferYang (Contributor) commented Oct 9, 2023

[WARNING] jackson-annotations-2.15.2.jar, jackson-core-2.15.2.jar, jackson-databind-2.15.2.jar, jackson-module-scala_2.13-2.15.2.jar, log4j-1.2-api-2.20.0.jar, log4j-api-2.20.0.jar, log4j-core-2.20.0.jar, log4j-slf4j2-impl-2.20.0.jar, spark-common-utils_2.13-4.0.0-SNAPSHOT.jar, spark-network-common_2.13-4.0.0-SNAPSHOT.jar, spark-network-shuffle_2.13-4.0.0-SNAPSHOT.jar, spark-network-yarn_2.13-4.0.0-SNAPSHOT.jar define 1 overlapping resource: 
[WARNING]   - META-INF/LICENSE

From the compile log, META-INF/LICENSE might come from multiple jars.

Since this exclude is indiscriminate, all META-INF/LICENSE files would be excluded in this case. If one of them is a non-Apache LICENSE, it would also be excluded. Would this be a problem, @srowen?

Of course, even without the exclude, there would still be issues currently because multiple files with the same name would overwrite each other, and it seems difficult to determine which META-INF/LICENSE would be retained in the end.

@srowen (Member)

It looks like the licenses of all of those are already included in the LICENSE file (or no action is required), so it seems OK here. While it's nice to retain the license buried in the artifact, we should be addressing all licenses in the top-level LICENSE and license/ dir rather than relying on this anyway.

@LuciferYang (Contributor)

Understood, thank you for your explanation @srowen

@hasnain-db (Contributor, Author)

@LuciferYang the docker integration tests are consistently failing with

[info] *** 2 SUITES ABORTED ***
[error] Error during tests:
[error] 	org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite
[error] 	org.apache.spark.sql.jdbc.OracleIntegrationSuite

I don't know if they are flaky but it seems to be unrelated to this PR

@LuciferYang (Contributor)

@LuciferYang the docker integration tests are consistently failing with

[info] *** 2 SUITES ABORTED ***
[error] Error during tests:
[error] 	org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite
[error] 	org.apache.spark.sql.jdbc.OracleIntegrationSuite

I don't know if they are flaky but it seems to be unrelated to this PR

#43123 (comment)

Friendly ping @dongjoon-hyun: it seems like this is a flaky case? I noticed that you have also encountered similar failures.

@hasnain-db Can you manually re-trigger "Run Docker integration tests"?

@dongjoon-hyun (Member)

Yes, re-triggering will help. I saw the same situation before, after upgrading to the recent Oracle image, but haven't checked the root cause yet, @LuciferYang.

@LuciferYang (Contributor)

Thank you for your confirmation @dongjoon-hyun ~

@hasnain-db (Contributor, Author)

Just retriggered. I've done it 3-4 times already though, so I'm not sure this will help. Will post back when it's done.

@hasnain-db (Contributor, Author)

failed again, retrying once more

@hasnain-db (Contributor, Author)

@LuciferYang it failed again :/ is it possible to merge this with that test failing?

@LuciferYang (Contributor) left a comment

+1, LGTM

@LuciferYang (Contributor)

Merged into master for Spark 4.0, thanks @hasnain-db @srowen @dongjoon-hyun @wangyum @cfmcgrady

@LuciferYang (Contributor) commented Oct 11, 2023

@LuciferYang it failed again :/ is it possible to merge this with that test failing?

Sorry for the late reply, I was asleep yesterday. I've checked, and the failure should be unrelated to this PR. Thanks ~

@hasnain-db (Contributor, Author)

np - thanks for merging!
