[SPARK-49935][BUILD] Exclude spark-connect-shims from assembly module

### What changes were proposed in this pull request?
This PR excludes `spark-connect-shims` from the `assembly` module so that it is not included in the distribution built by `dev/make-distribution.sh`.

### Why are the changes needed?
`spark-connect-shims` exists only to resolve compilation issues and should not be included in the `jars` directory of the distribution; otherwise, it may break REPL-related functionality.

For example:

1. `spark-shell` fails to start:

```
bin/spark-shell --master local
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
{"ts":"2024-10-11T11:54:03.437Z","level":"WARN","msg":"Your hostname, MacBook-Pro.local, resolves to a loopback address: 127.0.0.1; using 172.22.200.181 instead (on interface en0)","context":{"host":"MacBook-Pro.local","host_port":"127.0.0.1","host_port2":"172.22.200.181","network_if":"en0"},"logger":"Utils"}
{"ts":"2024-10-11T11:54:03.439Z","level":"WARN","msg":"Set SPARK_LOCAL_IP if you need to bind to another address","logger":"Utils"}
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.13.15 (OpenJDK 64-Bit Server VM, Java 17.0.12)
Type in expressions to have them evaluated.
Type :help for more information.
             if (_sc.getConf.getBoolean("spark.ui.reverseProxy", false)) {
                     ^
On line 9: error: value getConf is not a member of org.apache.spark.SparkContext
               val proxyUrl = _sc.getConf.get("spark.ui.reverseProxyUrl", null)
                                  ^
On line 10: error: value getConf is not a member of org.apache.spark.SparkContext
                   s"Spark Context Web UI is available at ${proxyUrl}/proxy/${_sc.applicationId}")
                                                                                  ^
On line 13: error: value applicationId is not a member of org.apache.spark.SparkContext
               _sc.uiWebUrl.foreach {
                   ^
On line 18: error: value uiWebUrl is not a member of org.apache.spark.SparkContext
               s"(master = ${_sc.master}, app id = ${_sc.applicationId}).")
                                 ^
On line 23: error: value master is not a member of org.apache.spark.SparkContext
               s"(master = ${_sc.master}, app id = ${_sc.applicationId}).")
                                                         ^
On line 23: error: value applicationId is not a member of org.apache.spark.SparkContext
                               ^
       error: object SparkContext is not a member of package org.apache.spark
       note: class SparkContext exists, but it has no companion object.
                    ^
       error: object implicits is not a member of package spark
              ^
       error: object sql is not a member of package spark
```

2. SparkR tests on Windows may also fail due to `spark-connect-shims` being in the classpath.

https://github.com/apache/spark/actions/runs/11259624487/job/31309026637

```
══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test_basic.R:25:3'): create DataFrame from list or data.frame ───────
Error in `handleErrors(returnStatus, conn)`: java.lang.NoSuchMethodError: 'void org.apache.spark.SparkContext.<init>(org.apache.spark.SparkConf)'

  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:3050)

  at org.apache.spark.api.r.RRDD$.createSparkContext(RRDD.scala:141)

  at org.apache.spark.api.r.RRDD.createSparkContext(RRDD.scala)
```
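
Both failures above are consistent with a second definition of `org.apache.spark.SparkContext` being picked up from the classpath. A minimal sketch for checking that, assuming `unzip` is installed: the helper name `jars_with_class` is hypothetical and not part of Spark, and the class path passed to it is only an example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): print every jar in a directory
# that contains the given class file entry.
# Assumption: Info-ZIP `unzip` is available on PATH.
jars_with_class() {
  local jars_dir="$1"      # e.g. dist/jars
  local class_entry="$2"   # e.g. org/apache/spark/SparkContext.class
  local jar
  for jar in "$jars_dir"/*.jar; do
    [ -e "$jar" ] || continue   # glob did not match: no jars in this directory
    # `unzip -Z1` lists archive entries one per line; match the entry exactly.
    if unzip -Z1 "$jar" 2>/dev/null | grep -qxF "$class_entry"; then
      echo "$jar"
    fi
  done
}
```

If `jars_with_class dist/jars org/apache/spark/SparkContext.class` prints more than one jar, the class is defined twice and which copy wins depends on classpath order.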

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- All Maven tests passed on GA: https://github.com/LuciferYang/spark/runs/31405720205

<img width="1034" alt="image" src="https://github.com/user-attachments/assets/d9d76152-ddd2-4c5e-997e-300c33e5c1e6">

- SparkR tests on Windows passed on GA: https://github.com/LuciferYang/spark/actions/runs/11291559675/job/31434704406

<img width="959" alt="image" src="https://github.com/user-attachments/assets/cb86a424-8411-493f-b5c6-458f97b3c8e2">

- Manual check:

```
dev/make-distribution.sh --tgz -Phive
```

`spark-connect-shims` is present in neither the `jars` directory nor the `jars/connect-repl` directory, and both spark-shell and connect-shell work normally.

**Spark shell**

```
bin/spark-shell --master local
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
{"ts":"2024-10-12T02:56:47.637Z","level":"WARN","msg":"Your hostname, MacBook-Pro.local, resolves to a loopback address: 127.0.0.1; using 172.22.200.218 instead (on interface en0)","context":{"host":"MacBook-Pro.local","host_port":"127.0.0.1","host_port2":"172.22.200.218","network_if":"en0"},"logger":"Utils"}
{"ts":"2024-10-12T02:56:47.639Z","level":"WARN","msg":"Set SPARK_LOCAL_IP if you need to bind to another address","logger":"Utils"}
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.13.15 (OpenJDK 64-Bit Server VM, Java 17.0.12)
Type in expressions to have them evaluated.
Type :help for more information.
24/10/12 10:56:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://172.22.200.218:4040
Spark context available as 'sc' (master = local, app id = local-1728701810131).
Spark session available as 'spark'.

scala> spark.range(10).show()
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
```

**Connect shell**

```
bin/spark-shell --remote local
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
{"ts":"2024-10-12T02:58:17.326Z","level":"WARN","msg":"Your hostname, MacBook-Pro.local, resolves to a loopback address: 127.0.0.1; using 172.22.200.218 instead (on interface en0)","context":{"host":"MacBook-Pro.local","host_port":"127.0.0.1","host_port2":"172.22.200.218","network_if":"en0"},"logger":"Utils"}
{"ts":"2024-10-12T02:58:17.328Z","level":"WARN","msg":"Set SPARK_LOCAL_IP if you need to bind to another address","logger":"Utils"}
24/10/12 10:58:19 INFO BaseAllocator: Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true.
24/10/12 10:58:19 INFO DefaultAllocationManagerOption: allocation manager type not specified, using netty as the default type
24/10/12 10:58:19 INFO CheckAllocator: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Type in expressions to have them evaluated.
Spark connect server version 4.0.0-SNAPSHOT.
Spark session available as 'spark'.

scala> spark.range(10).show
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
```
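
The first part of the manual check (no `spark-connect-shims` jar in `jars` or `jars/connect-repl`) could be scripted roughly as follows. This is a sketch under assumptions: `check_no_shims` is a hypothetical helper, and the file name pattern `spark-connect-shims_*.jar` is inferred from the artifact ID rather than taken from the build output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): succeed only if no
# spark-connect-shims jar exists anywhere under <dist_dir>/jars,
# which also covers the jars/connect-repl subdirectory.
check_no_shims() {
  local dist_dir="$1"
  local found
  # Assumption: the jar is named after its artifact ID,
  # i.e. spark-connect-shims_<scala-version>-<spark-version>.jar
  found=$(find "$dist_dir/jars" -name 'spark-connect-shims_*.jar' 2>/dev/null)
  if [ -n "$found" ]; then
    echo "unexpected spark-connect-shims jar(s):"
    echo "$found"
    return 1
  fi
  echo "no spark-connect-shims jar under $dist_dir/jars"
}
```

Assuming the distribution was written to `dist/`, running `check_no_shims dist` after `dev/make-distribution.sh` should report that no such jar is present.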

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#48421 from LuciferYang/fix-distribution.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
LuciferYang authored and HyukjinKwon committed Oct 12, 2024
1 parent cf657e5 commit 1fb3d57
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions assembly/pom.xml
@@ -117,6 +117,12 @@
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-connect-client-jvm_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.spark</groupId>
+          <artifactId>spark-connect-shims_${scala.binary.version}</artifactId>
+        </exclusion>
+      </exclusions>
       <scope>provided</scope>
     </dependency>
