-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-33278][SQL] Improve the performance for FIRST_VALUE #30178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #130393 has finished for PR 30178 at commit
|
retest this please |
Test build #130930 has finished for PR 30178 at commit
|
Test build #130933 has finished for PR 30178 at commit
|
Test build #130934 has finished for PR 30178 at commit
|
object OptimizeWindowFunctions extends Rule[LogicalPlan] { | ||
def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions { | ||
case we @ WindowExpression(AggregateExpression(first: First, _, _, _, _), spec) | ||
if !spec.orderSpec.isEmpty => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nonEmpty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
...st/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeWindowFunctionsSuite.scala
Show resolved
Hide resolved
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test starting |
Kubernetes integration test status failure |
assert(optimized == correctAnswer) | ||
} | ||
|
||
test("can't replace first(col) by nth_value(col, 1) if the window frame type is row") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
row
-> range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
Test build #130959 has finished for PR 30178 at commit
|
Test build #130976 has finished for PR 30178 at commit
|
Test build #130967 has finished for PR 30178 at commit
|
retest this please |
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #130980 has finished for PR 30178 at commit
|
thanks, merging to master! |
@cloud-fan Thanks for your help! |
def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions { | ||
case we @ WindowExpression(AggregateExpression(first: First, _, _, _, _), spec) | ||
if spec.orderSpec.nonEmpty && | ||
spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame].frameType == RowFrame => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we also check if the lower bound is UnboundedPreceding
? otherwise we can't use the offset optimization for nth_value and first
is probably faster than nth_value(1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I created the #30419 to make this check.
… transfer first to nth_value ### What changes were proposed in this pull request? #30178 provided `OptimizeWindowFunctions` used to transfer `first` to `nth_value`. If the window frame is `UNBOUNDED PRECEDING AND CURRENT ROW` or `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`, `nth_value` has better performance than `first`. But the `OptimizeWindowFunctions` need to exclude other window frame. ### Why are the changes needed? Improve `OptimizeWindowFunctions` to avoid transfer `first` to `nth_value` if the specified window frame isn't `UNBOUNDED PRECEDING AND CURRENT ROW` or `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Jenkins test. Closes #30419 from beliefer/SPARK-33278_followup. Lead-authored-by: gengjiaan <gengjiaan@360.cn> Co-authored-by: beliefer <beliefer@163.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
… transfer first to nth_value ### What changes were proposed in this pull request? apache/spark#30178 provided `OptimizeWindowFunctions` used to transfer `first` to `nth_value`. If the window frame is `UNBOUNDED PRECEDING AND CURRENT ROW` or `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`, `nth_value` has better performance than `first`. But the `OptimizeWindowFunctions` need to exclude other window frame. ### Why are the changes needed? Improve `OptimizeWindowFunctions` to avoid transfer `first` to `nth_value` if the specified window frame isn't `UNBOUNDED PRECEDING AND CURRENT ROW` or `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`. ### Does this PR introduce _any_ user-facing change? 'No'. ### How was this patch tested? Jenkins test. Closes #30419 from beliefer/SPARK-33278_followup. Lead-authored-by: gengjiaan <gengjiaan@360.cn> Co-authored-by: beliefer <beliefer@163.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
#29800 provides a performance improvement for
NTH_VALUE
.FIRST_VALUE
also could use theUnboundedOffsetWindowFunctionFrame
andUnboundedPrecedingOffsetWindowFunctionFrame
.Why are the changes needed?
Improve the performance for
FIRST_VALUE
.Does this PR introduce any user-facing change?
'No'.
How was this patch tested?
Jenkins test.