[ARROW] Use `KyuubiArrowConveters#toBatchIterator` instead of `ArrowConveters#toBatchIterator` #4754

cfmcgrady · 2023-04-23T06:15:16Z

Why are the changes needed?

To adapt to Spark 3.4.

The signature of ArrowConverters#toBatchIterator changed in apache/spark#38618 (since Spark 3.4).

Before Spark 3.4:

private[sql] def toBatchIterator(
    rowIter: Iterator[InternalRow],
    schema: StructType,
    maxRecordsPerBatch: Int,
    timeZoneId: String,
    context: TaskContext): Iterator[Array[Byte]]

Spark 3.4:

private[sql] def toBatchIterator(
    rowIter: Iterator[InternalRow],
    schema: StructType,
    maxRecordsPerBatch: Long,
    timeZoneId: String,
    context: TaskContext): ArrowBatchIterator

The return type changed from Iterator[Array[Byte]] to ArrowBatchIterator, and the type of maxRecordsPerBatch changed from Int to Long.
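One plausible reading of the motivation: a change to a parameter type or return type alters the JVM method descriptor, so engine code compiled against the pre-3.4 signature cannot link against the 3.4 one. Owning the conversion behind a project-level signature avoids depending on Spark's `private[sql]` method at all. The sketch below only illustrates that idea. `StableBatchIteratorSketch`, `Row`, the text encoding, and the "non-positive limit means no limit" rule are assumptions made for this example, not Spark's or Kyuubi's actual code, and real code would write Arrow record batches instead of UTF-8 text.

```scala
import java.nio.charset.StandardCharsets

// Stand-in sketch: none of these types are Spark's or Kyuubi's real classes.
object StableBatchIteratorSketch {
  // Hypothetical stand-in for Spark's InternalRow.
  final case class Row(values: Seq[Any])

  // A project-owned converter with a fixed signature. Callers depend only on
  // this signature, so an upstream change to a parameter or return type
  // cannot break them at link time.
  def toBatchIterator(
      rowIter: Iterator[Row],
      maxRecordsPerBatch: Long): Iterator[Array[Byte]] =
    new Iterator[Array[Byte]] {
      override def hasNext: Boolean = rowIter.hasNext

      override def next(): Array[Byte] = {
        if (!rowIter.hasNext) throw new NoSuchElementException("no more batches")
        val sb = new StringBuilder
        var count = 0L
        // Assumption for this sketch: a non-positive limit means "no limit".
        while (rowIter.hasNext && (maxRecordsPerBatch <= 0 || count < maxRecordsPerBatch)) {
          sb.append(rowIter.next().values.mkString(",")).append('\n')
          count += 1
        }
        // Placeholder encoding; real code would serialize an Arrow record batch.
        sb.toString.getBytes(StandardCharsets.UTF_8)
      }
    }
}
```

Callers then treat the result as a plain `Iterator[Array[Byte]]`, regardless of which Spark version is on the classpath.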

How was this patch tested?

- [ ] Add some test cases that check the changes thoroughly, including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] Run tests locally before making a pull request


pan3793

LGTM (pending CI)


codecov-commenter · 2023-04-23T08:10:52Z

Codecov Report

Merging #4754 (a3c58d0) into master (06dd7d3) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##             master    #4754   +/-   ##
=========================================
  Coverage     58.06%   58.07%           
  Complexity       13       13           
=========================================
  Files           581      581           
  Lines         32325    32338   +13     
  Branches       4308     4311    +3     
=========================================
+ Hits          18771    18780    +9     
+ Misses        11752    11751    -1     
- Partials       1802     1807    +5

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...g/apache/spark/sql/kyuubi/SparkDatasetHelper.scala | `81.35% <100.00%> (-0.55%)` | ⬇️ |

... and 8 files with indirect coverage changes


…tead of `ArrowConveters#toBatchIterator`

### _Why are the changes needed?_

to adapt Spark 3.4

the signature of function `ArrowConveters#toBatchIterator` is changed in apache/spark#38618 (since Spark 3.4)

Before Spark 3.4:

```
private[sql] def toBatchIterator(
    rowIter: Iterator[InternalRow],
    schema: StructType,
    maxRecordsPerBatch: Int,
    timeZoneId: String,
    context: TaskContext): Iterator[Array[Byte]]
```

Spark 3.4

```
private[sql] def toBatchIterator(
    rowIter: Iterator[InternalRow],
    schema: StructType,
    maxRecordsPerBatch: Long,
    timeZoneId: String,
    context: TaskContext): ArrowBatchIterator
```

the return type is changed from `Iterator[Array[Byte]]` to `ArrowBatchIterator`

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4754 from cfmcgrady/arrow-spark34.

Closes #4754

a3c58d0 [Fu Chen] fix ci
32704c5 [Fu Chen] Revert "fix ci"
e32311a [Fu Chen] fix ci
a76af62 [Cheng Pan] Update externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala
453b6a6 [Cheng Pan] Update externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala
74a9f7a [Cheng Pan] Update externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala
4ce5844 [Fu Chen] adapt Spark 3.4

Lead-authored-by: Fu Chen <cfmcgrady@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit d0a7ca4)
Signed-off-by: Cheng Pan <chengpan@apache.org>

pan3793 · 2023-04-23T09:40:19Z

Thanks, merged to master/1.7

adapt Spark 3.4

4ce5844

github-actions bot added the module:spark label Apr 23, 2023

pan3793 reviewed Apr 23, 2023

.../kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala (outdated, resolved)

Update externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala

74a9f7a

pan3793 approved these changes Apr 23, 2023

pan3793 assigned cfmcgrady Apr 23, 2023

pan3793 added this to the v1.7.1 milestone Apr 23, 2023

pan3793 reviewed Apr 23, 2023

.../kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala (resolved)

.../kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala (resolved)

pan3793 and others added 3 commits April 23, 2023 14:27

Update externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala

453b6a6

Update externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala

a76af62

fix ci

e32311a

cfmcgrady added 2 commits April 23, 2023 16:24

Revert "fix ci"

32704c5

This reverts commit e32311a.

fix ci

a3c58d0

pan3793 closed this in d0a7ca4 Apr 23, 2023

cfmcgrady deleted the arrow-spark34 branch April 23, 2023 09:41
