[SPARK-45475][SQL] Uses DataFrame.foreachPartition instead of RDD.foreachPartition in JdbcUtils #43304

HyukjinKwon · 2023-10-10T03:22:49Z

What changes were proposed in this pull request?

This PR is a follow-up to #39976 that addresses the review comment at #39976 (comment).

Why are the changes needed?

To properly assign the SQL execution ID so that df.observe works with the JDBC write path in JdbcUtils.

Does this PR introduce any user-facing change?

Yes. df.observe will work with JDBC connectors.
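A minimal sketch of what this looks like from the user side, assuming an in-memory H2 database and its driver on the classpath (the URL, table name, and metric name below are hypothetical and not taken from this PR). `Observation.get` blocks until the metrics arrive, so on a build without this change the call may not return.

```scala
import org.apache.spark.sql.{Observation, SparkSession}
import org.apache.spark.sql.functions.{count, lit}

object ObserveJdbcWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("observe-jdbc-sketch")
      .getOrCreate()

    // Named observation that collects the metrics declared in observe().
    val observation = Observation("jdbc_write")

    val observed = spark.range(100).toDF("id")
      .observe(observation, count(lit(1)).as("rows"))

    // Assumption: H2 is available; any JDBC target would do for illustration.
    observed.write
      .format("jdbc")
      .option("url", "jdbc:h2:mem:observe_demo;DB_CLOSE_DELAY=-1")
      .option("driver", "org.h2.Driver")
      .option("dbtable", "observed_ids")
      .mode("append")
      .save()

    // Blocks until the write action has reported the observed metrics.
    println(observation.get)

    spark.stop()
  }
}
```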

How was this patch tested?

Manually tested.

Was this patch authored or co-authored using generative AI tooling?

Unit test was added.

jim0607

QQ: is there an E2E test for this change?

dongjoon-hyun

+1, LGTM.

HyukjinKwon · 2023-10-10T06:07:15Z

QQ: is there an E2E test for this change?

Added a simple test

HyukjinKwon · 2023-10-10T08:16:05Z

Merged to master and branch-3.5.

…eachPartition in JdbcUtils

This PR is kind of a followup for #39976 that addresses #39976 (comment) comment.

In order to properly assign the SQL execution ID so `df.observe` works with this.

Yes. `df.observe` will work with JDBC connectors.

Manually tested.

Unit test was added.

Closes #43304 from HyukjinKwon/foreachbatch.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 39cc4ab)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

beliefer

Late LGTM.

tgravescs · 2023-10-10T13:32:18Z

@HyukjinKwon did this break the 3.5 build? I'm seeing an error building:

sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala:902: type mismatch;
08:23:21   found   : Object
08:23:21   required: Iterator[org.apache.spark.sql.Row]
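For context, `Dataset.foreachPartition` has two overloads, one taking a Scala `Iterator[T] => Unit` and one taking a Java `ForeachPartitionFunction[T]`. An error like the one above is consistent with the compiler failing to infer the lambda's parameter type when it has to choose between them; that reading is an assumption, not something confirmed in this thread. A sketch of one common workaround, annotating the parameter type explicitly (`handlePartition` is a hypothetical stand-in, not the helper used in JdbcUtils):

```scala
import org.apache.spark.sql.{Row, SparkSession}

object ForeachPartitionTypedSketch {
  // Hypothetical per-partition handler; it only drains the iterator.
  def handlePartition(rows: Iterator[Row]): Unit = rows.foreach(_ => ())

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("foreach-partition-typed-sketch")
      .getOrCreate()

    val df = spark.range(10).toDF("id")

    // The explicit Iterator[Row] annotation selects the Scala-function overload.
    df.foreachPartition { (rows: Iterator[Row]) => handlePartition(rows) }

    spark.stop()
  }
}
```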

HyukjinKwon · 2023-10-10T16:00:21Z

Oops, please feel free to revert directly. I'll open a PR tomorrow, my time.

HyukjinKwon · 2023-10-10T16:03:51Z

I reverted it. I will open a PR tomorrow.

…D.foreachPartition in JdbcUtils

This PR cherry-picks #43304 to branch-3.5

---

### What changes were proposed in this pull request?

This PR is kind of a followup for #39976 that addresses #39976 (comment) comment.

### Why are the changes needed?

In order to properly assign the SQL execution ID so `df.observe` works with this.

### Does this PR introduce _any_ user-facing change?

Yes. `df.observe` will work with JDBC connectors.

### How was this patch tested?

Manually tested.

### Was this patch authored or co-authored using generative AI tooling?

Unit test was added.

Closes #43322 from HyukjinKwon/SPARK-45475-3.5.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

Uses DataFrame.foreachBatch instead of RDD.foreachbatch in JdbcUtils

e83be3b

github-actions bot added the SQL label Oct 10, 2023

HyukjinKwon mentioned this pull request Oct 10, 2023

[SPARK-42034] QueryExecutionListener and Observation API do not work with foreach / reduce / foreachPartition action. #39976

Closed

jim0607 reviewed Oct 10, 2023

yaooqinn approved these changes Oct 10, 2023

dongjoon-hyun approved these changes Oct 10, 2023

MaxGekk approved these changes Oct 10, 2023

Add a simple test

702194f

import order

abbee3e

HyukjinKwon closed this in 39cc4ab Oct 10, 2023

beliefer reviewed Oct 10, 2023

HyukjinKwon mentioned this pull request Oct 11, 2023

[SPARK-45475][SQL][3.5] Uses DataFrame.foreachPartition instead of RDD.foreachPartition in JdbcUtils #43322

Closed

HyukjinKwon deleted the foreachbatch branch January 15, 2024 00:51
