[KYUUBI #6661] Improve perf for column-based TRowSet generation #6662
Do you have any statistics to measure the performance improvements? And have you compared the patched version with Spark Thrift Server? |
I see the Spark ticket you attached to the issue and understand your change now. The code change LGTM. Please fill in the PR description carefully; it is very important for future readers trying to understand each patch. |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff            @@
##           master    #6662    +/-   ##
=========================================
  Coverage    0.00%    0.00%
=========================================
  Files         683      683
  Lines       42205    42204     -1
  Branches     5756     5755     -1
=========================================
+ Misses      42205    42204     -1
View full report in Codecov by Sentry. |
BTW, are there any benchmark results, taken under the same conditions, that compare the looping approaches and support the proposed changes? |
No benchmark yet |
@pan3793 the description has been updated; my apologies for missing that. Can this be merged and closed now? |
# 🔍 Description
## Issue References 🔗

This pull request fixes #6661

## Describe Your Solution 🔧

TColumnGenerator.getColumnToList should not access a non-IndexedSeq with Seq.apply(i), because that degrades performance. Converting it to a foreach loop avoids the problem. See https://issues.apache.org/jira/browse/SPARK-47085 for more details.

## Types of changes 🔖

- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklist 📝

- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6662 from hh-cn/KYUUBI-6661.

Closes #6661

4597e88 [hang.huang] improve column-based TRowSet generation

Authored-by: hang.huang <hang.huang@advancegroup.com>
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
(cherry picked from commit 14e07ea)
Signed-off-by: Bowen Liang <liangbowen@gf.com.cn>
Thanks, merged to master (1.10.0) and branch-1.9 (1.9.3). |
🔍 Description
Issue References 🔗
This pull request fixes #6661
Describe Your Solution 🔧
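TColumnGenerator.getColumnToList should not access a non-IndexedSeq with Seq.apply(i): on a linear sequence such as List, each apply(i) walks i elements from the head, so an index-based loop over all rows becomes quadratic and degrades performance. Converting it to a foreach loop traverses the sequence once. See https://issues.apache.org/jira/browse/SPARK-47085 for more details.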
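To illustrate the difference, here is a minimal sketch rather than the actual Kyuubi code. The object `ColumnToListSketch`, the `Row` alias, and the methods `byIndex` and `byForeach` are hypothetical names chosen for this example; the real `TColumnGenerator` works on engine-specific row types.

```scala
import java.util.{ArrayList => JArrayList, List => JList}

object ColumnToListSketch {
  // Hypothetical stand-in for a row: an indexed sequence of column values.
  type Row = IndexedSeq[Any]

  // Index-based loop. When `rows` is a List, rows(i) walks i elements from
  // the head, so collecting one column from n rows costs O(n^2) overall.
  def byIndex(rows: Seq[Row], ordinal: Int): JList[Any] = {
    val n = rows.length
    val out = new JArrayList[Any](n)
    var i = 0
    while (i < n) {
      out.add(rows(i)(ordinal))
      i += 1
    }
    out
  }

  // foreach-based traversal. Each row is visited once, so the cost is O(n)
  // whether `rows` is a List, a Vector, or any other Seq implementation.
  def byForeach(rows: Seq[Row], ordinal: Int): JList[Any] = {
    val out = new JArrayList[Any](rows.length)
    rows.foreach(row => out.add(row(ordinal)))
    out
  }
}
```

Both methods return the same values; only the traversal cost differs. With an IndexedSeq such as Vector the two are comparable, which is why the slowdown only shows up when the caller happens to pass a linear Seq.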
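Types of changes 🔖
- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)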
Test Plan 🧪
Behavior Without This Pull Request ⚰️
Behavior With This Pull Request 🎉
Related Unit Tests
Checklist 📝
Be nice. Be informative.