[SPARK-49935][BUILD] Exclude `spark-connect-shims` from `assembly` module

### What changes were proposed in this pull request?
This PR excludes `spark-connect-shims` from the `assembly` module so that it is not included in the distribution when executing `dev/make-distribution.sh`.

### Why are the changes needed?
`spark-connect-shims` is only used to resolve compilation issues and should not be included in the `jars` directory of the distribution; otherwise, it may disrupt REPL-related functionality. For example:

1. spark-shell will fail to start

```
bin/spark-shell --master local
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
{"ts":"2024-10-11T11:54:03.437Z","level":"WARN","msg":"Your hostname, MacBook-Pro.local, resolves to a loopback address: 127.0.0.1; using 172.22.200.181 instead (on interface en0)","context":{"host":"MacBook-Pro.local","host_port":"127.0.0.1","host_port2":"172.22.200.181","network_if":"en0"},"logger":"Utils"}
{"ts":"2024-10-11T11:54:03.439Z","level":"WARN","msg":"Set SPARK_LOCAL_IP if you need to bind to another address","logger":"Utils"}
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.13.15 (OpenJDK 64-Bit Server VM, Java 17.0.12)
Type in expressions to have them evaluated.
Type :help for more information.
  if (_sc.getConf.getBoolean("spark.ui.reverseProxy", false)) {
  ^
On line 9: error: value getConf is not a member of org.apache.spark.SparkContext
  val proxyUrl = _sc.getConf.get("spark.ui.reverseProxyUrl", null)
  ^
On line 10: error: value getConf is not a member of org.apache.spark.SparkContext
  s"Spark Context Web UI is available at ${proxyUrl}/proxy/${_sc.applicationId}")
  ^
On line 13: error: value applicationId is not a member of org.apache.spark.SparkContext
  _sc.uiWebUrl.foreach {
  ^
On line 18: error: value uiWebUrl is not a member of org.apache.spark.SparkContext
  s"(master = ${_sc.master}, app id = ${_sc.applicationId}).")
  ^
On line 23: error: value master is not a member of org.apache.spark.SparkContext
  s"(master = ${_sc.master}, app id = ${_sc.applicationId}).")
  ^
On line 23: error: value applicationId is not a member of org.apache.spark.SparkContext
  ^
error: object SparkContext is not a member of package org.apache.spark
note: class SparkContext exists, but it has no companion object.
  ^
error: object implicits is not a member of package spark
  ^
error: object sql is not a member of package spark
```

2. SparkR tests on Windows may also fail due to `spark-connect-shims` being in the classpath.

https://github.com/apache/spark/actions/runs/11259624487/job/31309026637

```
══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test_basic.R:25:3'): create DataFrame from list or data.frame ───────
Error in `handleErrors(returnStatus, conn)`: java.lang.NoSuchMethodError: 'void org.apache.spark.SparkContext.<init>(org.apache.spark.SparkConf)'
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:3050)
	at org.apache.spark.api.r.RRDD$.createSparkContext(RRDD.scala:141)
	at org.apache.spark.api.r.RRDD.createSparkContext(RRDD.scala)
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- All Maven tests passed on GA: https://github.com/LuciferYang/spark/runs/31405720205

<img width="1034" alt="image" src="https://github.com/user-attachments/assets/d9d76152-ddd2-4c5e-997e-300c33e5c1e6">

- SparkR on Windows test passed on GA: https://github.com/LuciferYang/spark/actions/runs/11291559675/job/31434704406

<img width="959" alt="image" src="https://github.com/user-attachments/assets/cb86a424-8411-493f-b5c6-458f97b3c8e2">

- Manual check:

```
dev/make-distribution.sh --tgz -Phive
```

`spark-connect-shims` is in neither the `jars` directory nor the `jars/connect-repl` directory, and both spark-shell and connect-shell can be used normally.

**Spark shell**

```
bin/spark-shell --master local
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
{"ts":"2024-10-12T02:56:47.637Z","level":"WARN","msg":"Your hostname, MacBook-Pro.local, resolves to a loopback address: 127.0.0.1; using 172.22.200.218 instead (on interface en0)","context":{"host":"MacBook-Pro.local","host_port":"127.0.0.1","host_port2":"172.22.200.218","network_if":"en0"},"logger":"Utils"}
{"ts":"2024-10-12T02:56:47.639Z","level":"WARN","msg":"Set SPARK_LOCAL_IP if you need to bind to another address","logger":"Utils"}
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Using Scala version 2.13.15 (OpenJDK 64-Bit Server VM, Java 17.0.12)
Type in expressions to have them evaluated.
Type :help for more information.
24/10/12 10:56:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://172.22.200.218:4040
Spark context available as 'sc' (master = local, app id = local-1728701810131).
Spark session available as 'spark'.

scala> spark.range(10).show()
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
```

**Connect shell**

```
bin/spark-shell --remote local
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
{"ts":"2024-10-12T02:58:17.326Z","level":"WARN","msg":"Your hostname, MacBook-Pro.local, resolves to a loopback address: 127.0.0.1; using 172.22.200.218 instead (on interface en0)","context":{"host":"MacBook-Pro.local","host_port":"127.0.0.1","host_port2":"172.22.200.218","network_if":"en0"},"logger":"Utils"}
{"ts":"2024-10-12T02:58:17.328Z","level":"WARN","msg":"Set SPARK_LOCAL_IP if you need to bind to another address","logger":"Utils"}
24/10/12 10:58:19 INFO BaseAllocator: Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true.
24/10/12 10:58:19 INFO DefaultAllocationManagerOption: allocation manager type not specified, using netty as the default type
24/10/12 10:58:19 INFO CheckAllocator: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-SNAPSHOT
      /_/

Type in expressions to have them evaluated.
Spark connect server version 4.0.0-SNAPSHOT.
Spark session available as 'spark'.

scala> spark.range(10).show
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#48421 from LuciferYang/fix-distribution.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
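
The manual check described in this message (that `spark-connect-shims` appears in neither `jars` nor `jars/connect-repl` of the built distribution) could be scripted. The following is only a minimal sketch and not part of this patch: the helper name `find_shims_jars` and the file-name pattern `spark-connect-shims*.jar` are assumptions made for illustration, and the distribution root is whatever directory the caller passes in.

```python
from pathlib import Path

# Directories named in the manual check, relative to the distribution root.
CHECKED_DIRS = ("jars", "jars/connect-repl")


def find_shims_jars(dist_root, pattern="spark-connect-shims*.jar"):
    """Return the sorted paths of jars matching `pattern` in the checked directories.

    `pattern` is an assumed naming convention for the shims artifact;
    adjust it if the jar is named differently in your build.
    """
    root = Path(dist_root)
    hits = []
    for rel in CHECKED_DIRS:
        directory = root / rel
        # A missing directory simply contributes no matches.
        if directory.is_dir():
            hits.extend(directory.glob(pattern))
    return sorted(hits)
```

An empty list from `find_shims_jars(<distribution root>)` would correspond to the expected state after this change; any returned path points at a jar that should not have been packaged.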