[SPARK-33278][SQL] Improve the performance for FIRST_VALUE #30178

beliefer · 2020-10-29T03:47:04Z

What changes were proposed in this pull request?

#29800 improved the performance of NTH_VALUE.
FIRST_VALUE can also use UnboundedOffsetWindowFunctionFrame and UnboundedPrecedingOffsetWindowFunctionFrame.
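
To illustrate why an offset-based frame can help, here is a minimal sketch in plain Scala collections. It is not Spark's implementation: `FirstValueSketch`, `firstValueNaive`, and `nthValueByOffset` are hypothetical names, the partition is assumed to be already sorted by the ORDER BY key, and null handling is ignored. The point is only that, when the frame starts at UNBOUNDED PRECEDING, the row at a fixed offset is the same for every frame and can be looked up once per partition.

```scala
// Hypothetical, simplified model of one window partition (already sorted).
object FirstValueSketch {
  // Naive evaluation: for each row, rebuild the frame
  // ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW and take its first element.
  def firstValueNaive[A](partition: Vector[A]): Vector[A] =
    partition.indices.map(i => partition.slice(0, i + 1).head).toVector

  // Offset-style evaluation: the frame always starts at the first row of the
  // partition, so the row at offset n is looked up once and reused.
  // Rows whose frame holds fewer than n rows get None (SQL NULL).
  def nthValueByOffset[A](partition: Vector[A], n: Int): Vector[Option[A]] = {
    val fixed = partition.lift(n - 1)
    partition.indices.map(i => if (i + 1 >= n) fixed else None).toVector
  }
}
```

With `n = 1` the second function returns the same value for every row as the first one does, which is the equivalence this PR relies on when it treats `first(col)` as `nth_value(col, 1)`.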

Why are the changes needed?

Improve the performance of FIRST_VALUE.

Does this PR introduce any user-facing change?

'No'.

How was this patch tested?

Jenkins test.

SparkQA · 2020-10-29T04:43:39Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34996/

SparkQA · 2020-10-29T05:05:06Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34996/

SparkQA · 2020-10-29T07:05:01Z

Test build #130393 has finished for PR 30178 at commit 181186c.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

beliefer · 2020-10-29T07:06:51Z

retest this please

SparkQA · 2020-11-11T14:34:04Z

Test build #130930 has finished for PR 30178 at commit e296eb6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-11-11T15:48:57Z

Test build #130933 has finished for PR 30178 at commit f851a4c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class OptimizeWindowFunctionsSuite extends PlanTest

SparkQA · 2020-11-11T16:09:20Z

Test build #130934 has finished for PR 30178 at commit fd7e02e.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-11-12T03:40:59Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

+object OptimizeWindowFunctions extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+    case we @ WindowExpression(AggregateExpression(first: First, _, _, _, _), spec)
+      if !spec.orderSpec.isEmpty =>


...st/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeWindowFunctionsSuite.scala

SparkQA · 2020-11-12T03:42:48Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35565/

SparkQA · 2020-11-12T04:04:05Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35565/

SparkQA · 2020-11-12T05:55:53Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35573/

SparkQA · 2020-11-12T06:28:39Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35573/

cloud-fan · 2020-11-12T07:33:43Z

...st/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeWindowFunctionsSuite.scala

+    assert(optimized == correctAnswer)
+  }
+
+  test("can't replace first(col) by nth_value(col, 1) if the window frame type is row") {


row -> range

SparkQA · 2020-11-12T07:47:17Z

Test build #130959 has finished for PR 30178 at commit 72ceacc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-11-12T08:05:02Z

Test build #130976 has finished for PR 30178 at commit 3a7f4e7.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-11-12T08:05:02Z

Test build #130967 has finished for PR 30178 at commit 68d3388.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

beliefer · 2020-11-12T08:08:50Z

retest this please

SparkQA · 2020-11-12T08:29:26Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35583/

SparkQA · 2020-11-12T08:58:22Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35583/

SparkQA · 2020-11-12T09:34:58Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35586/

SparkQA · 2020-11-12T09:57:38Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35586/

SparkQA · 2020-11-12T12:52:44Z

Test build #130980 has finished for PR 30178 at commit 3a7f4e7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-11-12T14:59:21Z

thanks, merging to master!

beliefer · 2020-11-13T02:54:04Z

@cloud-fan Thanks for your help!

cloud-fan · 2020-11-18T05:36:57Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

+  def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+    case we @ WindowExpression(AggregateExpression(first: First, _, _, _, _), spec)
+      if spec.orderSpec.nonEmpty &&
+        spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame].frameType == RowFrame =>


Shall we also check that the lower bound is UnboundedPreceding? Otherwise we can't use the offset optimization for nth_value, and first is probably faster than nth_value(1).

OK. I created #30419 to add this check.

… transfer first to nth_value

### What changes were proposed in this pull request?
#30178 provided `OptimizeWindowFunctions` used to transfer `first` to `nth_value`.
If the window frame is `UNBOUNDED PRECEDING AND CURRENT ROW` or `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`, `nth_value` has better performance than `first`.
But the `OptimizeWindowFunctions` need to exclude other window frame.

### Why are the changes needed?
Improve `OptimizeWindowFunctions` to avoid transfer `first` to `nth_value` if the specified window frame isn't `UNBOUNDED PRECEDING AND CURRENT ROW` or `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Jenkins test.

Closes #30419 from beliefer/SPARK-33278_followup.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
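
The conditions discussed in this thread (a non-empty order spec, a row frame, and one of the two frames named in the follow-up description) can be sketched as a single guard. This is a self-contained model, not the Catalyst rule: the types below are hypothetical stand-ins that only borrow names such as `RowFrame` and `UnboundedPreceding` from the review comments, and `canRewriteFirstToNthValue` is an illustrative helper rather than a Spark API.

```scala
object FrameGuardSketch {
  // Hypothetical stand-ins for the frame types and boundaries.
  sealed trait FrameType
  case object RowFrame extends FrameType
  case object RangeFrame extends FrameType

  sealed trait Boundary
  case object UnboundedPreceding extends Boundary
  case object CurrentRow extends Boundary
  case object UnboundedFollowing extends Boundary
  final case class Offset(rows: Int) extends Boundary

  final case class Frame(frameType: FrameType, lower: Boundary, upper: Boundary)

  // True only when the window is ordered, the frame is a row frame, and the
  // frame is UNBOUNDED PRECEDING AND CURRENT ROW or
  // UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
  def canRewriteFirstToNthValue(hasOrderSpec: Boolean, frame: Frame): Boolean =
    hasOrderSpec &&
      frame.frameType == RowFrame &&
      frame.lower == UnboundedPreceding &&
      (frame.upper == CurrentRow || frame.upper == UnboundedFollowing)
}
```

Under this model, a frame such as `ROWS BETWEEN 1 PRECEDING AND CURRENT ROW` (here `Frame(RowFrame, Offset(-1), CurrentRow)`) fails the guard, so `first` would be left unchanged.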


beliefer and others added 26 commits June 19, 2020 10:36

Reuse completeNextStageWithFetchFailure

4a6f903

Merge remote-tracking branch 'upstream/master'

96456e2

Merge remote-tracking branch 'upstream/master'

4314005

Merge remote-tracking branch 'upstream/master'

d6af4a7

Merge remote-tracking branch 'upstream/master'

f69094f

Merge remote-tracking branch 'upstream/master'

b86a42d

Merge branch 'master' of github.com:beliefer/spark

2ac5159

Merge remote-tracking branch 'upstream/master'

9021d6c

Merge branch 'master' of github.com:beliefer/spark

74a2ef4

Merge remote-tracking branch 'upstream/master'

9828158

Merge remote-tracking branch 'upstream/master'

9cd1aaf

Merge remote-tracking branch 'upstream/master'

abfcbb9

Merge remote-tracking branch 'upstream/master'

07c6c81

Merge remote-tracking branch 'upstream/master'

580130b

Merge branch 'master' of github.com:beliefer/spark

3712808

Merge remote-tracking branch 'upstream/master'

6107413

Merge remote-tracking branch 'upstream/master'

4b799b4

Merge remote-tracking branch 'upstream/master'

ee0ecbf

Merge remote-tracking branch 'upstream/master'

596bc61

Merge remote-tracking branch 'upstream/master'

0164e2f

Merge remote-tracking branch 'upstream/master'

90b79fc

Merge remote-tracking branch 'upstream/master'

2cef3a9

Merge remote-tracking branch 'upstream/master'

c26b64f

Merge remote-tracking branch 'upstream/master'

2e02cd2

Merge remote-tracking branch 'upstream/master'

a6d0741

Improve the performance for first_value

181186c

Optimize code

72ceacc

cloud-fan reviewed Nov 12, 2020

...st/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeWindowFunctionsSuite.scala

Optimize code

68d3388

cloud-fan reviewed Nov 12, 2020

Optimize code

3a7f4e7

cloud-fan approved these changes Nov 12, 2020

cloud-fan closed this in 2f07c56 Nov 12, 2020

cloud-fan reviewed Nov 18, 2020

beliefer mentioned this pull request Nov 19, 2020

[SPARK-33278][SQL][FOLLOWUP] Improve OptimizeWindowFunctions to avoid transfer first to nth_value. #30419

Closed
