vinodkc and others added 4 commits December 3, 2025 07:34
…erver output stream to files

### What changes were proposed in this pull request?

Currently, the Spark Connect test server's stdout and stderr are discarded when `SPARK_DEBUG_SC_JVM_CLIENT=false`, making it difficult to debug test failures.

This PR enables log4j logging for the test Spark Connect server in all test modes (both debug and non-debug) by always configuring `log4j2.properties`.

### Why are the changes needed?

When `SPARK_DEBUG_SC_JVM_CLIENT=false`, `SparkConnectJdbcDataTypeSuite` randomly hangs because the child server process blocks on `write()` calls when its stdout/stderr pipe buffers fill up. Without anything consuming the output, the buffers reach capacity and the process blocks indefinitely.

Instead of using `Redirect.DISCARD`, redirect the logs into log4j-managed files.
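
The deadlock-avoidance idea can be sketched outside of Spark. Below is a minimal Python sketch, not the actual Scala implementation; `run_with_log` is a hypothetical helper. Redirecting the child's streams to a file means no pipe buffer can ever fill up, no matter how much the server logs.

```python
import subprocess

def run_with_log(cmd, log_path):
    """Run `cmd`, appending its stdout and stderr to `log_path`.

    Redirecting to a file (rather than a pipe that nobody reads, or
    discarding the output entirely) guarantees the child never blocks
    on a full pipe buffer, and the output survives for debugging.
    """
    with open(log_path, "ab") as log:
        return subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
```

With an unread `PIPE`, a child writing more than the pipe buffer (typically 64 KiB) could hang; with a file it completes, which mirrors the switch away from discarding the test server's output.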

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tested and confirmed that the log file is created when running either:

1) `SPARK_DEBUG_SC_JVM_CLIENT=false  build/sbt "connect-client-jdbc/testOnly org.apache.spark.sql.connect.client.jdbc.SparkConnectJdbcDataTypeSuite"`

OR

2) `SPARK_DEBUG_SC_JVM_CLIENT=true  build/sbt "connect-client-jdbc/testOnly org.apache.spark.sql.connect.client.jdbc.SparkConnectJdbcDataTypeSuite"`

In both cases the log output is written to `./target/unit-tests.log`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53275 from vinodkc/br_redirect_stdout_stderr_to_file.

Authored-by: vinodkc <vinod.kc.in@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…nt/wait cost

### What changes were proposed in this pull request?
When `ShuffleBlockFetcherIterator` fetches data, two shuffle costs are not accounted for:

1. Network resource congestion and wait time between `fetchUpToMaxBytes` and `fetchAllHostLocalBlocks`;
2. Connection-establishment congestion: when `fetchUpToMaxBytes` and `fetchAllHostLocalBlocks` send requests, creating the client may itself be congested.
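
The general pattern for capturing these previously uncounted phases is to time the blocking call sites and fold the elapsed time into the existing fetch-wait metric. A minimal sketch of that pattern follows (Python for brevity; `FetchWaitMetrics` and `timed` are hypothetical names, not Spark's API):

```python
import time

class FetchWaitMetrics:
    """Accumulates time spent blocked in fetch-related phases."""

    def __init__(self):
        self.fetch_wait_ns = 0

    def timed(self, fn, *args, **kwargs):
        # Wrap a blocking phase (e.g. establishing a connection or
        # waiting on network I/O) so its latency is counted toward
        # the fetch wait time, even if the wrapped call raises.
        start = time.monotonic_ns()
        try:
            return fn(*args, **kwargs)
        finally:
            self.fetch_wait_ns += time.monotonic_ns() - start
```

Wrapping both the request-sending path and the client-creation path with such a timer is what makes the reported wait time reflect all of the blocking, not just the part between the two fetch calls.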

### Why are the changes needed?
Makes the shuffle fetch wait time and request time metrics more accurate.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a `Thread.sleep(3000)` latency to the open-block request; the shuffle read metrics then look like the screenshot below.

<img width="1724" height="829" alt="Screenshot 2025-11-27 17 38 26" src="https://github.com/user-attachments/assets/99f3822d-d5a7-4f4a-abfc-cc272e61667c" />

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53245 from AngersZhuuuu/SPARK-54536.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
In this PR I propose to make `QueryPlanningTracker` a field of `HybridAnalyzer`.

### Why are the changes needed?
To simplify the code and to support further single-pass analyzer development.
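
The shape of the refactor, shown as a hypothetical Python sketch (the real classes are Scala): instead of threading the tracker through every `apply` call, it becomes constructor state of the analyzer.

```python
class QueryPlanningTracker:
    """Hypothetical stand-in for Spark's planning tracker."""

    def __init__(self):
        self.phases = []

    def record(self, phase):
        self.phases.append(phase)

class HybridAnalyzer:
    # Before: apply(self, plan, tracker) -- the tracker was passed into
    # every call. After: it is a field, set once at construction, so
    # call sites no longer need to thread it through.
    def __init__(self, tracker):
        self.tracker = tracker

    def apply(self, plan):
        self.tracker.record("resolution")
        return plan
```

Moving a parameter that every call supplies anyway into a constructor field shortens the call chain, which matters when the analyzer is invoked from many places.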

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #53277 from mihailoale-db/analyzertracker.

Authored-by: mihailoale-db <mihailo.aleksic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…for Single ColFamily

### What changes were proposed in this pull request?

This PR introduces a new state partition reader, `StatePartitionReaderAllColumnFamilies`, to support offline repartitioning.
`StatePartitionReaderAllColumnFamilies` is invoked when the user sets the option `readAllColumnFamilies` to `true`.

We already have the StateDataSource reader, which allows customers to read the rows in an operator's state store using the DataFrame API, just as they would read a normal table. However, it currently only supports reading one column family in the state store at a time.

This change allows reading all the state rows in all the column families, so that we can repartition them at once: read the entire state store, repartition the rows, and then save the repartitioned state rows to the cloud. It also has a performance benefit, since we don't have to read each column family separately. The state is read as of the last committed batch version.

Since each column family can have a different schema, the returned DataFrame treats the key and value rows as opaque bytes:
- partition_key (string)
- key_bytes (binary)
- value_bytes (binary)
- column_family_name (string)
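
With that schema, rows from column families with different key/value schemas can be shuffled together, since the partitioner only needs the opaque bytes. A sketch of that idea in plain Python (`rows_by_partition` is an illustrative helper, not the PR's API):

```python
import zlib
from collections import defaultdict

def rows_by_partition(rows, num_partitions):
    """Group state rows into target partitions by hashing the raw key bytes.

    Each row is a dict with the four columns described above; because the
    key is opaque bytes, a single pass handles every column family at once.
    """
    parts = defaultdict(list)
    for row in rows:
        # crc32 is a stable hash, so the partition assignment is
        # deterministic across runs.
        parts[zlib.crc32(row["key_bytes"]) % num_partitions].append(row)
    return parts
```

Deserialization back into each family's real schema only has to happen after the shuffle, which is what makes the single-pass read possible.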

### Why are the changes needed?

See above

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

See the unit tests. They not only verify the schema but also validate that the data is serialized to bytes correctly, by comparing it against a normally queried DataFrame.

### Was this patch authored or co-authored using generative AI tooling?

Yes. haiku, sonnet.

Closes #53104 from zifeif2/repartition-reader-single-cf.

Lead-authored-by: zifeif2 <zifeifeng11@gmail.com>
Co-authored-by: Ubuntu <zifei.feng@your.hostname.com>
Signed-off-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
@pull pull bot merged commit df63cb7 into huangxiaopingRD:master Dec 3, 2025