[SPARK-32934][SQL] Improve the performance for NTH_VALUE and refactor the OffsetWindowFunction #29800
Test build #130322 has finished for PR 29800 at commit
Test build #130324 has finished for PR 29800 at commit
Test build #130326 has finished for PR 29800 at commit
Test build #130325 has finished for PR 29800 at commit
 * 1. [[FrameLessOffsetWindowFunction]] returns the value of the input column offset by a number
 * of rows according to the current row.
 * 2. [[UnboundedOffsetWindowFunctionFrame]] and [[UnboundedPrecedingOffsetWindowFunctionFrame]]
 * returns the value of the input column offset by a number of rows within the partition.
`within the partition` -> `within the frame`
Thanks!
  if (inputIterator.hasNext) inputIterator.next()
  inputIndex += 1
}
if (inputIndex >= 0 && inputIndex < input.length) {
`inputIndex >= 0` seems always true
Yes.
-- !query output
Larry Bott 11798 NULL
Gerard Bondur 11472 Gerard Bondur
Pamela Castillo 11303 Gerard Bondur
Not related to this PR. We should fix the test framework so that the result is always aligned.
The output comes from hiveResultString
Test build #130348 has finished for PR 29800 at commit
thanks, merging to master!
@cloud-fan Thanks for your help!
### What changes were proposed in this pull request?
#29800 provides a performance improvement for `NTH_VALUE`. `FIRST_VALUE` also could use the `UnboundedOffsetWindowFunctionFrame` and `UnboundedPrecedingOffsetWindowFunctionFrame`.

### Why are the changes needed?
Improve the performance for `FIRST_VALUE`.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Jenkins test.

Closes #30178 from beliefer/SPARK-33278.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Spark SQL supports window functions such as `NTH_VALUE`. If the window frame is specified as `UNBOUNDED PRECEDING AND CURRENT ROW` or `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`, we can eliminate some calculations. For example, if we execute the SQL shown below:
For rows whose row number is greater than 1, the output is a fixed value; otherwise it is null. So we only need to calculate the value once and check whether the row number is less than 2.
The `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` case is simpler.
Why are the changes needed?
Improve the performance for `NTH_VALUE`, `FIRST_VALUE` and `LAST_VALUE`.
Does this PR introduce any user-facing change?
'No'.
How was this patch tested?
Jenkins test.
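The idea in the description can be illustrated with a small, Spark-free sketch. This is not the code from this PR: `NthValueSketch` and its three methods are hypothetical names, the partition is modelled as an in-memory `IndexedSeq`, frames are assumed to be row-based, and null handling is ignored. It only shows why the n-th value needs to be looked up once per partition for the two frame shapes mentioned above.

```scala
// Hypothetical, Spark-free model of NTH_VALUE(col, n) over one ordered partition.
// Assumptions: row-based frames, 1-based n, None stands in for SQL NULL.
object NthValueSketch {

  // Naive evaluation for UNBOUNDED PRECEDING AND CURRENT ROW:
  // rebuild the frame [0, i] for every row and pick its n-th element.
  def naiveNthValue[A](partition: IndexedSeq[A], n: Int): IndexedSeq[Option[A]] =
    partition.indices.map { i =>
      val frame = partition.slice(0, i + 1)
      if (n >= 1 && n <= frame.length) Some(frame(n - 1)) else None
    }

  // Optimized evaluation for UNBOUNDED PRECEDING AND CURRENT ROW:
  // look up the n-th row once; rows before position n see NULL,
  // every later row sees the same fixed value.
  def unboundedPrecedingNthValue[A](partition: IndexedSeq[A], n: Int): IndexedSeq[Option[A]] = {
    val fixed: Option[A] = partition.lift(n - 1)
    partition.indices.map(i => if (i + 1 >= n) fixed else None)
  }

  // UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is simpler:
  // every row's frame is the whole partition, so every row gets the same value.
  def unboundedNthValue[A](partition: IndexedSeq[A], n: Int): IndexedSeq[Option[A]] = {
    val fixed: Option[A] = partition.lift(n - 1)
    partition.map(_ => fixed)
  }
}
```

With the three names from the query output quoted in the review as sample data and `n = 2`, `unboundedPrecedingNthValue` yields `None` for the first row and the second name for the remaining rows, and it agrees with `naiveNthValue` while touching the n-th row only once.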