-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-45464][CORE] Fix network-yarn distribution build #43289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc: @wangyum @LuciferYang please let me know if this fix works for you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question is, are we sure that network-yarn needs to shade the netty-tcnative-boringssl-static related jars? If indeed necessary, should we rename libnetty_tcnative_* in a similar way to libnetty_transport*?
also cc @pan3793
spark/common/network-yarn/pom.xml
Lines 178 to 185 in ffc6994
| <move file="${project.build.directory}/exploded/META-INF/native/libnetty_transport_native_epoll_x86_64.so" | |
| tofile="${project.build.directory}/exploded/META-INF/native/lib${spark.shade.native.packageName}_netty_transport_native_epoll_x86_64.so" /> | |
| <move file="${project.build.directory}/exploded/META-INF/native/libnetty_transport_native_kqueue_x86_64.jnilib" | |
| tofile="${project.build.directory}/exploded/META-INF/native/lib${spark.shade.native.packageName}_netty_transport_native_kqueue_x86_64.jnilib" /> | |
| <move file="${project.build.directory}/exploded/META-INF/native/libnetty_transport_native_epoll_aarch_64.so" | |
| tofile="${project.build.directory}/exploded/META-INF/native/lib${spark.shade.native.packageName}_netty_transport_native_epoll_aarch_64.so" /> | |
| <move file="${project.build.directory}/exploded/META-INF/native/libnetty_transport_native_kqueue_aarch_64.jnilib" | |
| tofile="${project.build.directory}/exploded/META-INF/native/lib${spark.shade.native.packageName}_netty_transport_native_kqueue_aarch_64.jnilib" /> |
I am not sure if it needs to shade them. I'll admit I'm not the most familiar with this aspect of things, happy to try it without. I will say that even with the shading it does seem to work fine (tests in #42685 pass) so aside from this issue I am not sure about the harm of keeping it shaded for consistency |
|
This depends on whether the |
If it's not provided, the functionality will still work, it'll just be slower (falling back to the JDK implementation). I think for now for simplicity's sake (especially since I don't have a great setup for testing yarn beyond unit tests) I think it makes sense to exclude from the shading, that seems simpler. Will try that out |
|
@hasnain-db I suggest following @LuciferYang's suggestion to follow the existing rule to rename (the prefix must exactly match the relocated package name) the newly introduced native library, it's actually a standard way of creating artifacts including shaded Netty classes and native libraries. For details, you could check netty's code on loading native libs. |
|
I've updated to rename the libraries + use the exclusion, PTAL |
|
we may consider adding these renaming rules to the SBT build. |
Yes, this needs to be fixed. However, inconsistencies between |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM if test pass, but I hope to get approve from @srowen before merging.
| <exclude>META-INF/*.SF</exclude> | ||
| <exclude>META-INF/*.DSA</exclude> | ||
| <exclude>META-INF/*.RSA</exclude> | ||
| <exclude>META-INF/LICENSE</exclude> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[WARNING] jackson-annotations-2.15.2.jar, jackson-core-2.15.2.jar, jackson-databind-2.15.2.jar, jackson-module-scala_2.13-2.15.2.jar, log4j-1.2-api-2.20.0.jar, log4j-api-2.20.0.jar, log4j-core-2.20.0.jar, log4j-slf4j2-impl-2.20.0.jar, spark-common-utils_2.13-4.0.0-SNAPSHOT.jar, spark-network-common_2.13-4.0.0-SNAPSHOT.jar, spark-network-shuffle_2.13-4.0.0-SNAPSHOT.jar, spark-network-yarn_2.13-4.0.0-SNAPSHOT.jar define 1 overlapping resource:
[WARNING] - META-INF/LICENSE
From the compile log, META-INF/LICENSE might come from multiple jars.
Since this exclude is indiscriminate, in this case, all META-INF/LICENSE would be excluded. If it includes a non-Apache LICENSE, it would also be excluded. Would this be a problem @srowen ?
Of course, even without the exclude, there would still be issues currently because multiple files with the same name would overwrite each other, and it seems difficult to determine which META-INF/LICENSE would be retained in the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we include the license if all of those already in the LICENSE file (or no action is required) so it seems OK here. While it's nice to retain the license buried in the artifact, we should be addressing all licenses in the top level LICENSE and license/ dir, rather than rely on this anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, thank you for your explanation @srowen
|
@LuciferYang the docker integration tests are consistently failing with I don't know if they are flaky but it seems to be unrelated to this PR |
friendly ping @dongjoon-hyun, it seems like this is an unstable case? I noticed that you have also encountered similar failures. @hasnain-db Can you manually re-trigger |
|
Yes, |
|
Thank you for your confirmation @dongjoon-hyun ~ |
|
just retriggered. I've done it 3-4 times already though so I'm not sure this will help. will post back when it's done |
|
failed again, retrying once more |
|
@LuciferYang it failed again :/ is it possible to merge this with that test failing? |
LuciferYang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM
|
Merged into master for Spark 4.0, thanks @hasnain-db @srowen @dongjoon-hyun @wangyum @cfmcgrady |
Sorry for the late reply, I was asleep yesterday. I've checked, the failure should be unrelated to this pr. Thanks ~ |
|
np - thanks for merging! |
What changes were proposed in this pull request?
Exclude an apache license in the packaged jar to fix the build for network-yarn. Also shade the native libraries so they can be used.
Why are the changes needed?
Making a distribution build using the following command currently fails after #43164:
Failures look like
After much trial and error (I added a debugging section for the next person who runs into this) I realized the problem is that inside the jar, the META-inf folder looks like this:
Since the filesystem on macosx is case insensitive, this causes issues. Unzipping this fails, as we unzip the file first, then do not create the directory, and fail to unzip the files after that.
This PR proposes to fix it by removing the
LICENSEfile, as suggested, by adding it to the exclusion list, so the folder gets copied. The contents are the same asLICENSE.txt(which is also just the apache2 license) so this should be fine. We also shade and copy over the new native libraries similar to the existing ones.Does this PR introduce any user-facing change?
No
How was this patch tested?
CI
I reran
and it succeeded
Was this patch authored or co-authored using generative AI tooling?
No
Appendix: debugging
I was able to debug the contents of the jar using
jar tf $jarand print out the files usingunzip -q -c $jar $path.To make iterations go faster I went to
/Users/hasnain.lakhani/spark/common/network-yarn/target/antrunand edited the ant file by hand then ranant -f build-main.xml