-
Notifications
You must be signed in to change notification settings - Fork 253
feat: Support ANSI mode sum expr #2600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: Support ANSI mode sum expr #2600
Conversation
|
Draft PR to support sum function - WIP |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2600 +/- ##
============================================
+ Coverage 56.12% 59.06% +2.93%
- Complexity 976 1476 +500
============================================
Files 119 165 +46
Lines 11743 15060 +3317
Branches 2251 2504 +253
============================================
+ Hits 6591 8895 +2304
- Misses 4012 4900 +888
- Partials 1140 1265 +125 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
970d693 to
c0715aa
Compare
|
@andygrove , @comphead I believe this should be ready for review (pending CI) |
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
Outdated
Show resolved
Hide resolved
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
Outdated
Show resolved
Hide resolved
Thats actually interesting why would window fall back, we planning to fallback on windows in #2726 but the test would still preserve windows, so I think we need to check a fallback reason |
6dcf47b to
13b7d68
Compare
This is no longer a failing test after I rebased with main branch |
|
@andygrove @comphead . All the tests passed and the PR is rest for review |
andygrove
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderfender This LGTM. I am going to pull the PR locally and do some testing/benchmarking today. Could you rebease/upmerge?
|
Thank you @andygrove . Let me rebase Ave fix the merge conflicts |
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
Outdated
Show resolved
Hide resolved
|
Here are microbenchmark results: Legacy modeANSI mode |
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
Outdated
Show resolved
Hide resolved
|
@andygrove , Please take a look whenever you get a chance . I made changes per review and fixed merge conflicts |
Which issue does this PR close?
Closes #531 .
(Partially closes 531) . The ANSI changes to support
AVGwill be tracked in a new PRRationale for this change
DataFusion's default SUM doesn't match Spark's overflow semantics. This implementation ensures we get the exact same behavior as Spark for integer overflow in all 3 eval modes (ANSI , Legacy and Try mode)
What changes are included in this PR?
This PR adds native Rust implementation for SUM aggregation on integer types (byte, short, int, long)
Native changes (Rust):
(Inspired from
SumDecimaland spark's SUM impl)SumIntegeraggregate function that handles SUM for all integer types (in coherence with Spark)( Implemented code in similar fashion of spark leveraging
Option<i64>to represent NULL and numeric values for sum , and an additional parameter calledhas_all_nullswhich is leveraged in Try mode to distinguish if NULL sum is caused by all NULL inputs or the fact that the sum overflowed. (Spark does this withshouldTrackIsEmptyand assigning NULL to running sum which is a long datatype) )Scala side changes :
CometSumto add ANSI support (ANSI and Try)eval_modeinstead offail_on_errorto supportLegacy, ANSI and Tryeval modesjava.lang.Longto avoid Scala's auto-boxing feature which auto casts objects to primitive types there by casting nulls to 0s) handling in both simple and GROUP BY agg .How are these changes tested?