
@pull pull bot commented Dec 3, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

asl3 and others added 3 commits December 3, 2025 14:26
### What changes were proposed in this pull request?

Remove whitespace to restore docs build

### Why are the changes needed?

Fix docs build in CI

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Ran `make html` locally from the `python/docs` directory.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #53298 from asl3/docbuild.

Authored-by: Amanda Liu <amanda.liu@databricks.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
…ataframe from ndarray

### What changes were proposed in this pull request?
Avoid an unnecessary pandas conversion when creating a DataFrame from an ndarray.

### Why are the changes needed?
Before:
ndarray -> pandas DataFrame -> Arrow data

After:
ndarray -> Arrow data

This also makes the behavior consistent with Connect mode:
https://github.com/apache/spark/blob/40ba971b7319d74670ba86cc1f280a8a0f7a1dbb/python/pyspark/sql/connect/session.py#L675-L706

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53280 from zhengruifeng/test_np_arrow.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
…operties` from its own classloader

### What changes were proposed in this pull request?

Change `SparkBuildInfo` to load `spark-version-info.properties` with its own classloader instead of the thread context classloader.

### Why are the changes needed?

I hit an issue while integrating the Connect JDBC driver with JetBrains DataGrip.
```
2025-11-25 18:48:09,475 [  55114]   WARN - #c.i.d.d.BaseDatabaseErrorHandler$MissingDriverClassErrorInfo - Exception org.apache.spark.SparkException: Could not find spark-version-info.properties [in thread "RMI TCP Connection(3)-127.0.0.1"]
java.lang.ExceptionInInitializerError: Exception org.apache.spark.SparkException: Could not find spark-version-info.properties [in thread "RMI TCP Connection(3)-127.0.0.1"]
	at org.apache.spark.SparkBuildInfo$.<clinit>(SparkBuildInfo.scala:35)
	at org.apache.spark.sql.connect.client.SparkConnectClient$.org$apache$spark$sql$connect$client$SparkConnectClient$$genUserAgent(SparkConnectClient.scala:978)
	at org.apache.spark.sql.connect.client.SparkConnectClient$Configuration$.apply$default$8(SparkConnectClient.scala:999)
	at org.apache.spark.sql.connect.client.SparkConnectClient$Builder.<init>(SparkConnectClient.scala:683)
	at org.apache.spark.sql.connect.client.SparkConnectClient$.builder(SparkConnectClient.scala:676)
	at org.apache.spark.sql.connect.client.jdbc.SparkConnectConnection.<init>(SparkConnectConnection.scala:31)
	at org.apache.spark.sql.connect.client.jdbc.NonRegisteringSparkConnectDriver.connect(NonRegisteringSparkConnectDriver.scala:36)
	at com.intellij.database.remote.jdbc.helpers.JdbcHelperImpl.connect(JdbcHelperImpl.java:786)
	at com.intellij.database.remote.jdbc.impl.RemoteDriverImpl.connect(RemoteDriverImpl.java:47)
```

After adding some debug messages, I found it was caused by using the wrong classloader: the thread context classloader is not the one that loaded `SparkBuildInfo`.

```
c.i.e.r.RemoteProcessSupport - ContextClassLoader: com.intellij.database.remote.jdbc.impl.JdbcClassLoader$1559cc356
c.i.e.r.RemoteProcessSupport - SparkBuildInfo ClassLoader: com.intellij.database.remote.jdbc.impl.JdbcClassLoader$JdbcClassLoaderImpl62e93ea8
```

A similar issue affecting the Hive JDBC driver and Spark's isolated classloader (see SPARK-32256) was fixed by [HADOOP-14067](https://issues.apache.org/jira/browse/HADOOP-14067).

### Does this PR introduce _any_ user-facing change?

This fixes corner cases where an application uses multiple classloaders with the Spark libraries.

### How was this patch tested?

Passed GHA to ensure the change breaks nothing; also manually verified the Connect JDBC driver and JetBrains DataGrip integration.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53279 from pan3793/SPARK-54565.

Lead-authored-by: Cheng Pan <chengpan@apache.org>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
@pull pull bot locked and limited conversation to collaborators Dec 3, 2025
@pull pull bot added the ⤵️ pull label Dec 3, 2025
@pull pull bot merged commit 095c2c3 into huangxiaopingRD:master Dec 3, 2025

3 participants