Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARROW] Use KyuubiArrowConveters#toBatchIterator instead of ArrowConveters#toBatchIterator #4754

Closed
wants to merge 7 commits into from

Conversation

cfmcgrady
Copy link
Contributor

Why are the changes needed?

to adapt Spark 3.4

the signature of function ArrowConveters#toBatchIterator is changed in apache/spark#38618 (since Spark 3.4)

Before Spark 3.4:

private[sql] def toBatchIterator(
    rowIter: Iterator[InternalRow],
    schema: StructType,
    maxRecordsPerBatch: Int,
    timeZoneId: String,
    context: TaskContext): Iterator[Array[Byte]]

Spark 3.4

private[sql] def toBatchIterator(
    rowIter: Iterator[InternalRow],
    schema: StructType,
    maxRecordsPerBatch: Long,
    timeZoneId: String,
    context: TaskContext): ArrowBatchIterator

the return type is changed from Iterator[Array[Byte]] to ArrowBatchIterator

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before make a pull request

Copy link
Member

@pan3793 pan3793 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (pending CI)

@pan3793 pan3793 added this to the v1.7.1 milestone Apr 23, 2023
@codecov-commenter
Copy link

codecov-commenter commented Apr 23, 2023

Codecov Report

Merging #4754 (a3c58d0) into master (06dd7d3) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##             master    #4754   +/-   ##
=========================================
  Coverage     58.06%   58.07%           
  Complexity       13       13           
=========================================
  Files           581      581           
  Lines         32325    32338   +13     
  Branches       4308     4311    +3     
=========================================
+ Hits          18771    18780    +9     
+ Misses        11752    11751    -1     
- Partials       1802     1807    +5     
Impacted Files Coverage Δ
...g/apache/spark/sql/kyuubi/SparkDatasetHelper.scala 81.35% <100.00%> (-0.55%) ⬇️

... and 8 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@pan3793 pan3793 closed this in d0a7ca4 Apr 23, 2023
pan3793 added a commit that referenced this pull request Apr 23, 2023
…tead of `ArrowConveters#toBatchIterator`

### _Why are the changes needed?_

to adapt Spark 3.4

the signature of function `ArrowConveters#toBatchIterator` is changed in apache/spark#38618 (since Spark 3.4)

Before Spark 3.4:

```
private[sql] def toBatchIterator(
    rowIter: Iterator[InternalRow],
    schema: StructType,
    maxRecordsPerBatch: Int,
    timeZoneId: String,
    context: TaskContext): Iterator[Array[Byte]]
```

Spark 3.4

```
private[sql] def toBatchIterator(
    rowIter: Iterator[InternalRow],
    schema: StructType,
    maxRecordsPerBatch: Long,
    timeZoneId: String,
    context: TaskContext): ArrowBatchIterator
```

the return type is changed from `Iterator[Array[Byte]]` to `ArrowBatchIterator`

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request

Closes #4754 from cfmcgrady/arrow-spark34.

Closes #4754

a3c58d0 [Fu Chen] fix ci
32704c5 [Fu Chen] Revert "fix ci"
e32311a [Fu Chen] fix ci
a76af62 [Cheng Pan] Update externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala
453b6a6 [Cheng Pan] Update externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala
74a9f7a [Cheng Pan] Update externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/kyuubi/SparkDatasetHelper.scala
4ce5844 [Fu Chen] adapt Spark 3.4

Lead-authored-by: Fu Chen <cfmcgrady@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
(cherry picked from commit d0a7ca4)
Signed-off-by: Cheng Pan <chengpan@apache.org>
@pan3793
Copy link
Member

pan3793 commented Apr 23, 2023

Thanks, merged to master/1.7

@cfmcgrady cfmcgrady deleted the arrow-spark34 branch April 23, 2023 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants