Skip to content

[SPARK-33278][SQL] Improve the performance for FIRST_VALUE #30178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 43 commits into from

Conversation

beliefer
Copy link
Contributor

@beliefer beliefer commented Oct 29, 2020

What changes were proposed in this pull request?

#29800 provides a performance improvement for NTH_VALUE.
FIRST_VALUE also could use the UnboundedOffsetWindowFunctionFrame and UnboundedPrecedingOffsetWindowFunctionFrame.

Why are the changes needed?

Improve the performance for FIRST_VALUE.

Does this PR introduce any user-facing change?

'No'.

How was this patch tested?

Jenkins test.

@SparkQA
Copy link

SparkQA commented Oct 29, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34996/

@SparkQA
Copy link

SparkQA commented Oct 29, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34996/

@SparkQA
Copy link

SparkQA commented Oct 29, 2020

Test build #130393 has finished for PR 30178 at commit 181186c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Nov 11, 2020

Test build #130930 has finished for PR 30178 at commit e296eb6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 11, 2020

Test build #130933 has finished for PR 30178 at commit f851a4c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class OptimizeWindowFunctionsSuite extends PlanTest

@SparkQA
Copy link

SparkQA commented Nov 11, 2020

Test build #130934 has finished for PR 30178 at commit fd7e02e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

object OptimizeWindowFunctions extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
case we @ WindowExpression(AggregateExpression(first: First, _, _, _, _), spec)
if !spec.orderSpec.isEmpty =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nonEmpty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35565/

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35565/

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35573/

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35573/

assert(optimized == correctAnswer)
}

test("can't replace first(col) by nth_value(col, 1) if the window frame type is row") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

row -> range

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Test build #130959 has finished for PR 30178 at commit 72ceacc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Test build #130976 has finished for PR 30178 at commit 3a7f4e7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Test build #130967 has finished for PR 30178 at commit 68d3388.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35583/

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35583/

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35586/

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35586/

@SparkQA
Copy link

SparkQA commented Nov 12, 2020

Test build #130980 has finished for PR 30178 at commit 3a7f4e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 2f07c56 Nov 12, 2020
@beliefer
Copy link
Contributor Author

@cloud-fan Thanks for your help!

def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
case we @ WindowExpression(AggregateExpression(first: First, _, _, _, _), spec)
if spec.orderSpec.nonEmpty &&
spec.frameSpecification.asInstanceOf[SpecifiedWindowFrame].frameType == RowFrame =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we also check if the lower bound is UnboundedPreceding? otherwise we can't use the offset optimization for nth_value and first is probably faster than nth_value(1)

Copy link
Contributor Author

@beliefer beliefer Nov 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I created the #30419 to make this check.

cloud-fan pushed a commit that referenced this pull request Nov 23, 2020
… transfer first to nth_value

### What changes were proposed in this pull request?
#30178 provided `OptimizeWindowFunctions` used to transfer `first` to `nth_value`.
If the window frame is `UNBOUNDED PRECEDING AND CURRENT ROW` or `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`, `nth_value` has better performance than `first`.
But the `OptimizeWindowFunctions` need to exclude other window frame.

### Why are the changes needed?
 Improve `OptimizeWindowFunctions` to avoid transfer `first` to `nth_value` if the specified window frame isn't `UNBOUNDED PRECEDING AND CURRENT ROW` or `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Jenkins test.

Closes #30419 from beliefer/SPARK-33278_followup.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
a0x8o added a commit to a0x8o/spark that referenced this pull request Nov 23, 2020
… transfer first to nth_value

### What changes were proposed in this pull request?
apache/spark#30178 provided `OptimizeWindowFunctions` used to transfer `first` to `nth_value`.
If the window frame is `UNBOUNDED PRECEDING AND CURRENT ROW` or `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`, `nth_value` has better performance than `first`.
But the `OptimizeWindowFunctions` need to exclude other window frame.

### Why are the changes needed?
 Improve `OptimizeWindowFunctions` to avoid transfer `first` to `nth_value` if the specified window frame isn't `UNBOUNDED PRECEDING AND CURRENT ROW` or `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Jenkins test.

Closes #30419 from beliefer/SPARK-33278_followup.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants