[KYUUBI #6661] Improve perf for column-based TRowSet generation #6662

Closed

@hh-cn hh-cn commented Sep 3, 2024

🔍 Description

Issue References 🔗

This pull request fixes #6661

Describe Your Solution 🔧

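TColumnGenerator.getColumnToList should not access a non-IndexedSeq with Seq.apply(i): on a linear sequence such as List, each indexed access walks the sequence from the head, so an index-based loop degrades performance as the row count grows. Converting it to a foreach loop avoids this. See https://issues.apache.org/jira/browse/SPARK-47085 for more details.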
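
For illustration, here is a minimal sketch of the two looping styles. It is not the actual Kyuubi code: `ColumnAccessSketch`, `Row`, `columnByIndex`, and `columnByForeach` are hypothetical names, and a row is assumed to be a plain `Seq[Any]` rather than a Spark `Row`.

```scala
import scala.collection.mutable.ArrayBuffer

object ColumnAccessSketch {
  // Hypothetical stand-in for a result row: a sequence of column values.
  type Row = Seq[Any]

  // Index-based loop. When `rows` is a List, rows(i) walks i nodes from the
  // head, so extracting one column costs O(n^2) over n rows.
  def columnByIndex(rows: Seq[Row], ordinal: Int): Seq[Any] = {
    val n = rows.length
    val out = new ArrayBuffer[Any](n)
    var i = 0
    while (i < n) {
      out += rows(i)(ordinal)
      i += 1
    }
    out.toSeq
  }

  // Traversal-based loop. foreach visits each row exactly once, so the cost
  // is O(n) whether `rows` is a List, a Vector, or an array-backed Seq.
  def columnByForeach(rows: Seq[Row], ordinal: Int): Seq[Any] = {
    val out = new ArrayBuffer[Any](rows.length)
    rows.foreach(row => out += row(ordinal))
    out.toSeq
  }
}
```

Both methods return the same values; only the access pattern differs, which is why the change is behavior-preserving for any `Seq` implementation.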

Types of changes 🔖

  • [x] Bugfix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

Related Unit Tests


Checklist 📝

Be nice. Be informative.

@pan3793
Member

pan3793 commented Sep 3, 2024

Do you have any statistics to measure the performance improvements? And have you compared the patched version with Spark Thrift Server?

@pan3793 pan3793 changed the title [KYUUBI #6661] improve column-based TRowSet generation [KYUUBI #6661] Improve perf for column-based TRowSet generation Sep 3, 2024
@pan3793
Member

pan3793 commented Sep 3, 2024

I see the Spark ticket you attached on the issue, and understand your change now.

The code change LGTM. Please fill in the PR description carefully; it's very important for helping future readers understand each patch.

@codecov-commenter

codecov-commenter commented Sep 3, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (9533c5a) to head (4597e88).
Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...apache/kyuubi/engine/result/TColumnGenerator.scala | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #6662   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         683     683           
  Lines       42205   42204    -1     
  Branches     5756    5755    -1     
======================================
+ Misses      42205   42204    -1     


@bowenliang123
Contributor

bowenliang123 commented Sep 3, 2024

BTW, are there any benchmark results, run under the same conditions, that compare the two looping approaches and support the proposed change?

@hh-cn
Author

hh-cn commented Sep 4, 2024

No benchmark yet

@hh-cn
Author

hh-cn commented Sep 4, 2024

@pan3793 the description has been updated; my apologies for missing that. Can this be merged and closed now?

@bowenliang123 bowenliang123 added this to the v1.9.3 milestone Sep 4, 2024
bowenliang123 pushed a commit that referenced this pull request Sep 4, 2024
# 🔍 Description
## Issue References 🔗

This pull request fixes #6661

## Describe Your Solution 🔧

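TColumnGenerator.getColumnToList should not access a non-IndexedSeq with Seq.apply(i): on a linear sequence such as List, each indexed access walks the sequence from the head, so an index-based loop degrades performance as the row count grows. Converting it to a foreach loop avoids this. See https://issues.apache.org/jira/browse/SPARK-47085 for more details.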

## Types of changes 🔖

- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6662 from hh-cn/KYUUBI-6661.

Closes #6661

4597e88 [hang.huang] improve column-based TRowSet generation

Authored-by: hang.huang <hang.huang@advancegroup.com>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
(cherry picked from commit 14e07ea)
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
@bowenliang123
Contributor

Thanks, merged to master (1.10.0) and branch-1.9 (1.9.3).

@hh-cn hh-cn deleted the KYUUBI-6661 branch September 5, 2024 02:47
Development

Successfully merging this pull request may close these issues.

[Improvement] Improve performance on converting spark rows to column-based thrift row set