Skip to content

Conversation

@hasnain-db
Copy link
Contributor

What changes were proposed in this pull request?

This PR adds a new dependency on this library so that we can use it for SSL functionality.

We also include the network-common test helper library in a few places, which will be needed for reusing some test configuration files in the future.

Why are the changes needed?

These dependency changes are needed so we can use this library to provide SSL functionality, and for test helpers.

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI. This is used in unit tests and functionality as part of #42685, from which this is being split out

Was this patch authored or co-authored using generative AI tooling?

No

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the network-yarn module have a requirement to use these native codes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe it needs a direct dependency here if you look at the code changes to resource-managers/yarn in #42685 - we do add tests to the yarn code that do pass.

We do change ShuffleTransportContext which is in network-common, and that's what is used by yarn here. network-yarn depends on network-shuffle which depends on network-common so I think that's how the dependency gets pulled in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These need to be test scope right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen the bouncycastle and network-common-helper dependencies are only needed in the test scope, but the netty-tcnative-boringssl-static dependencies need to be available in the production build. We need this dependency in order to provide SSL support.

pom.xml Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the license of this code? if it's the same as netty, I think we're all good. Otherwise need to check it and possibly update LICENSE and licenses/

Copy link
Contributor Author

@hasnain-db hasnain-db Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen netty-tcnative-boringssl-static seems to be apache2 licensed: https://github.com/netty/netty-tcnative/blob/main/LICENSE.txt

which is the same as netty as far as I can tell: https://github.com/netty/netty/blob/4.1/LICENSE.txt

@hasnain-db hasnain-db changed the title [SPARK-44937][CORE] Add netty-tcnative-boringssl-static dependency [SPARK-45376][CORE] Add netty-tcnative-boringssl-static dependency Sep 28, 2023
@hasnain-db
Copy link
Contributor Author

tests are flaky, attempting a rebase

@hasnain-db hasnain-db force-pushed the spark-tls-deps branch 2 times, most recently from ce77fb7 to cc80198 Compare September 29, 2023 21:13
@hasnain-db
Copy link
Contributor Author

@mridulm @JoshRosen this should be good to review now (had to rerun tests a bunch of times due to flakiness)

Copy link
Contributor

@mridulm mridulm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me, but I would wait for @srowen to approve before merging.

@mridulm mridulm closed this in 4f09b6c Oct 3, 2023
@mridulm
Copy link
Contributor

mridulm commented Oct 4, 2023

Merged to master.
Thanks for adding this @hasnain-db !
Thanks for the review @srowen, @LuciferYang :-)

@LuciferYang
Copy link
Contributor

LuciferYang commented Oct 8, 2023

@hasnain-db Report a issue.

After this pr, the following error occurs when executing the ./dev/make-distribution.sh --tgz -Phive -Phive-thriftserver -Pyarn command for packaging:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.1.0:run (unpack) on project spark-network-yarn_2.13: An Ant BuildException has occured: Error while expanding /Users/yangjie01/SourceCode/git/spark-mine-13/common/network-yarn/target/scala-2.13/spark-4.0.0-SNAPSHOT-yarn-shuffle.jar
[ERROR] java.nio.file.FileSystemException: /Users/yangjie01/SourceCode/git/spark-mine-13/common/network-yarn/target/exploded/META-INF/license/LICENSE.aix-netbsd.txt: Not a directory
[ERROR] around Ant part ...<unzip src="/Users/yangjie01/SourceCode/git/spark-mine-13/common/network-yarn/target/scala-2.13/spark-4.0.0-SNAPSHOT-yarn-shuffle.jar" dest="/Users/yangjie01/SourceCode/git/spark-mine-13/common/network-yarn/target/exploded/" />... @ 5:232 in /Users/yangjie01/SourceCode/git/spark-mine-13/common/network-yarn/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :spark-network-yarn_2.13

And it could be built successfully before this pr.

My testing environment is

  • MacOs: 13.5.2
  • Java: openjdk version "17.0.8" 2023-07-18 LTS
  • Maven: Apache Maven 3.9.4 (dfbb324ad4a7c8fb0bf182e6d91b0ae20e3d2dd9)

Could you please look into this issue? Thanks ~ @hasnain-db

also cc @wangyum

@hasnain-db
Copy link
Contributor Author

I'm able to reproduce this, will take a look, thanks! might be a few days, I hope that is OK - happy to revert otherwise

@hasnain-db
Copy link
Contributor Author

@LuciferYang I filed https://issues.apache.org/jira/browse/SPARK-45464 to track this. I have a fix locally (just pushed a commit), should submit the PR tonight once CI signal comes back. I'll tag you on the PR.

cc @wangyum

LuciferYang pushed a commit that referenced this pull request Oct 11, 2023
### What changes were proposed in this pull request?

Exclude an apache license in the packaged jar to fix the build for network-yarn. Also shade the native libraries so they can be used.

### Why are the changes needed?

Making a distribution build using the following command currently fails after #43164:

```
./dev/make-distribution.sh --tgz  -Phive -Phive-thriftserver -Pyarn
```

Failures look like

```
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.1.0:run (unpack) on project spark-network-yarn_2.13: An Ant BuildException has occured: Error while expanding /Users/hasnain.lakhani/spark/common/network-yarn/target/scala-2.13/spark-4.0.0-SNAPSHOT-yarn-shuffle.jar
[ERROR] java.nio.file.FileSystemException: /Users/hasnain.lakhani/spark/common/network-yarn/target/exploded/META-INF/license/LICENSE.aix-netbsd.txt: Not a directory
[ERROR] around Ant part ...<unzip src="/Users/hasnain.lakhani/spark/common/network-yarn/target/scala-2.13/spark-4.0.0-SNAPSHOT-yarn-shuffle.jar" dest="/Users/hasnain.lakhani/spark/common/network-yarn/target/exploded/" />...  5:198 in /Users/hasnain.lakhani/spark/common/network-yarn/target/antrun/build-main.xml
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:3.1.0:run (unpack) on project spark-network-yarn_2.13: An Ant BuildException has occured: Error while expanding /Users/hasnain.lakhani/spark/common/network-yarn/target/scala-2.13/spark-4.0.0-SNAPSHOT-yarn-shuffle.jar
java.nio.file.FileSystemException: /Users/hasnain.lakhani/spark/common/network-yarn/target/exploded/META-INF/license/LICENSE.aix-netbsd.txt: Not a directory
around Ant part ...<unzip src="/Users/hasnain.lakhani/spark/common/network-yarn/target/scala-2.13/spark-4.0.0-SNAPSHOT-yarn-shuffle.jar" dest="/Users/hasnain.lakhani/spark/common/network-yarn/target/exploded/" />...  5:198 in /Users/hasnain.lakhani/spark/common/network-yarn/target/antrun/build-main.xml
```

After much trial and error (I added a debugging section for the next person who runs into this) I realized the problem is that inside the jar, the META-inf folder looks like this:

```
META-INF/LICENSE
META-INF/LICENSE.txt
META-INF/license/
META-INF/license/LICENSE.aix-netbsd.txt
META-INF/license/LICENSE.boringssl.txt
META-INF/license/LICENSE.mvn-wrapper.txt
META-INF/license/LICENSE.tomcat-native.txt
```

Since the filesystem on macosx is case insensitive, this causes issues. Unzipping this fails, as we unzip the file first, then do not create the directory, and fail to unzip the files after that.

This PR proposes to fix it by removing the `LICENSE` file, as suggested, by adding it to the exclusion list, so the folder gets copied. The contents are the same as `LICENSE.txt` (which is also just the apache2 license) so this should be fine. We also shade and copy over the new native libraries similar to the existing ones.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

CI

I reran

```
./dev/make-distribution.sh --tgz  -Phive -Phive-thriftserver -Pyarn
```

and it succeeded

### Was this patch authored or co-authored using generative AI tooling?

No

### Appendix: debugging

I was able to debug the contents of the jar using `jar tf $jar` and print out the files using `unzip -q -c $jar $path`.

To make iterations go faster I went to `/Users/hasnain.lakhani/spark/common/network-yarn/target/antrun` and edited the ant file by hand then ran `ant -f build-main.xml`

Closes #43289 from hasnain-db/tls-fix-build.

Authored-by: Hasnain Lakhani <hasnain.lakhani@databricks.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants