[SPARK-37455][SQL] Replace hash with sort aggregate if child is already sorted #34702
Conversation
cc @cloud-fan could you help take a look when you have time? Thanks!
```
@@ -423,6 +423,9 @@ object QueryExecution {
      PlanSubqueries(sparkSession),
      RemoveRedundantProjects,
      EnsureRequirements(),
      // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the
      // sort order of each node is checked to be valid.
```
Is it because the planner is top-down so we don't know the child ordering during planning? Then we have to add a new rule to change the agg algorithm in a post-hoc way.
Yes it is. If we change our planning to bottom-up and propagate each node's output ordering during planning, then we can run this rule during planning. For now, we have to add it after `EnsureRequirements`.
```scala
if (SortOrder.orderingSatisfies(
    partialAgg.child.outputOrdering, sortAgg.requiredChildOrdering.head)) {
  sortAgg.copy(
    aggregateExpressions = sortAgg.aggregateExpressions.map(_.copy(mode = Complete)),
```
Is it always right? I think we also need to check the output partitioning to see if we can eliminate the partial agg. An example is `df.sortWithinPartitions`: it does not cluster the data, it just sorts it within each partition.
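For illustration, a toy spark-shell example of that concern (my own sketch, not from the PR):

```scala
// sortWithinPartitions sorts rows inside each partition but does not cluster
// them: after repartition(2), the same key k can appear in both partitions,
// so a partition-local (complete) aggregate would emit duplicate groups
// unless a shuffle happens first.
val df = spark.range(100)
  .selectExpr("id % 10 AS k", "id AS v")
  .repartition(2)
  .sortWithinPartitions("k")
```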
@cloud-fan - I don't think we need to check output partitioning, as we are matching a pair of final and partial hash aggregates with no shuffle in between:

```
HashAggregate(final)
         |                          SortAggregate(complete)
HashAggregate(partial)    =>                 |
         |                                 child
       child
```

So `child` must already have the proper output partitioning for `SortAggregate`; otherwise it could not satisfy the original `HashAggregate(final)`'s required distribution.
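To make the shape concrete, here is a hedged sketch of this pair-matching branch (`isPartialAgg` is the helper discussed later in this thread, and `toSortAggregate` is the conversion helper added in `HashAggregateExec.scala` per the file list in the description; the exact code may differ):

```scala
import org.apache.spark.sql.catalyst.expressions.SortOrder
import org.apache.spark.sql.catalyst.expressions.aggregate.Complete
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.aggregate.HashAggregateExec

// Pair case: a final hash aggregate directly on top of its partial
// counterpart, with no exchange in between. If the partial aggregate's child
// already satisfies the required sort ordering, collapse both into a single
// complete-mode sort aggregate over that child.
def replacePair(finalAgg: HashAggregateExec): SparkPlan = {
  val sortAgg = finalAgg.toSortAggregate
  finalAgg.child match {
    case partialAgg: HashAggregateExec
        if isPartialAgg(partialAgg, finalAgg) &&
          SortOrder.orderingSatisfies(
            partialAgg.child.outputOrdering, sortAgg.requiredChildOrdering.head) =>
      sortAgg.copy(
        aggregateExpressions = sortAgg.aggregateExpressions.map(_.copy(mode = Complete)),
        child = partialAgg.child)
    case _ => finalAgg
  }
}
```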
ah ok, if there is a shuffle in the middle, we can't optimize? This looks quite limited, as having a shuffle in the middle is very common.
> if there is a shuffle in the middle, we can't optimize?

We can, and the rule here also pattern-matches a single `HashAggregate` (see the code below). I added a unit test case in `ReplaceHashWithSortAggSuite.scala` to demonstrate replacing a partial aggregate - "replace partial hash aggregate with sort aggregate". But I think it would be rare to be able to replace a final aggregate (though this rule covers that too), as a final aggregate is almost always immediately after a shuffle, so there is no sort ordering before it. Spark's native shuffle does not guarantee any sort order; for Cosco (a remote shuffle service we run in-house) we support sorted shuffle, so the final aggregate can become replaceable as well.
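An illustrative reproduction of the partial-aggregate replacement (my sketch of the scenario the unit test describes; the commented plan shapes are expectations, not verified output):

```scala
// Enable the rule, then aggregate over a child that is sorted within
// partitions on the grouping key. The partial aggregate below the exchange
// should become a SortAggregate, while the final aggregate above the shuffle
// stays a HashAggregate, since Spark's native shuffle drops sort order.
spark.conf.set("spark.sql.execution.replaceHashWithSortAgg", "true")
val df = spark.range(100).selectExpr("id % 10 AS k", "id AS v")
df.sortWithinPartitions("k")
  .groupBy("k")
  .sum("v")
  .explain()
```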
```
@@ -694,7 +694,8 @@ class SQLMetricsSuite extends SharedSparkSession with SQLMetricsTestUtils
   }

   test("SPARK-25497: LIMIT within whole stage codegen should not consume all the inputs") {
-    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true") {
+    withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "true",
+      SQLConf.REPLACE_HASH_WITH_SORT_AGG_ENABLED.key -> "false") {
```
This unit test exercises multiple limit operators and hash aggregate operators in one single stage. The rule is disabled here because sort aggregate does not support code-gen yet, and replacing the hash aggregate would break the test's logic.
This PR is ready for review again, thanks @cloud-fan.
```scala
/**
 * Check if `partialAgg` is the partial aggregate of `finalAgg`.
 */
private def isPartialAgg(partialAgg: HashAggregateExec, finalAgg: HashAggregateExec): Boolean = {
```
This looks like reverse engineering the `AggUtils`. Could we just link the partial and final agg when they are constructed?
@tanelk - yeah, I agree this is mostly reverse engineering and we can do a better job here. I tried linking the partial and final agg in `AggUtils` and checking whether the linked physical plans are the same. That does not quite work because we do top-down planning, so the linked partial agg is not the same as the planned partial agg (the linked one still contains a `PlanLater` operator). I found a more elegant way: check that the linked logical plans of both aggs are the same. Updated.
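A hedged sketch of that logical-link check (the mode checks are my assumption of what else the real `isPartialAgg` verifies):

```scala
import org.apache.spark.sql.catalyst.expressions.aggregate.{Final, Partial}
import org.apache.spark.sql.execution.aggregate.HashAggregateExec

// Both physical aggregates should have been planned from the same logical
// Aggregate node, so comparing their logical links avoids reverse-engineering
// the expression rewriting AggUtils does when splitting partial/final stages.
def isPartialAgg(partialAgg: HashAggregateExec, finalAgg: HashAggregateExec): Boolean = {
  partialAgg.aggregateExpressions.forall(_.mode == Partial) &&
    finalAgg.aggregateExpressions.forall(_.mode == Final) &&
    finalAgg.logicalLink.isDefined &&
    finalAgg.logicalLink == partialAgg.logicalLink
}
```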
cc @cloud-fan for review, thanks.
```
 *
 *  HashAggregate(t1.i, SUM, final)
 *           |                           SortAggregate(t1.i, SUM, complete)
 *  HashAggregate(t1.i, SUM, partial)  =>            |
 *           |                                     child
 *         child
```
This seems like an orthogonal optimization: we can merge adjacent partial and final aggregates (no shuffle between them) into one complete aggregate.
Yeah, I think we can add a rule later to optimize that. I vaguely remember someone proposing this in OSS before, but it seems the impact was not high.
The code change LGTM. Since there are quite a few TPCDS queries that get plan changes, can we run a TPCDS benchmark to verify the performance improvement?
@cloud-fan - sure. Today I ran the TPCDS benchmark (sf=1) on one AWS machine. Then I tried sf=5, but the benchmark hit task failures with "no space left on device", so it cannot be conducted on a single machine. Do you recommend disabling this rule by default? After adding sort aggregate code-gen, we can do more large-scale testing and then enable it. WDYT?
Discussed offline with @cloud-fan - we decided to disable the rule by default for now. After adding sort aggregate code-gen, a large-scale TPCDS benchmark can be done later.
thanks, merging to master!

Thank you @cloud-fan and @tanelk for review!
```scala
 */
private def replaceHashAgg(plan: SparkPlan): SparkPlan = {
  plan.transformDown {
    case hashAgg: HashAggregateExec if hashAgg.groupingExpressions.nonEmpty =>
```
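The excerpt cuts off here; for the single-aggregate case discussed above, the body presumably continues along these lines (an assumed continuation, not verbatim code):

```scala
// Assumed continuation: after the partial/final pair case (sketched earlier
// in this thread), handle a lone hash aggregate. If the child's output
// ordering already satisfies the would-be sort aggregate's requirement,
// switch the algorithm while keeping the aggregate mode unchanged.
val sortAgg = hashAgg.toSortAggregate
if (SortOrder.orderingSatisfies(
    hashAgg.child.outputOrdering, sortAgg.requiredChildOrdering.head)) {
  sortAgg
} else {
  hashAgg
}
```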
BTW, shall we handle `ObjectHashAggregateExec` as well?
@cloud-fan - yeah, I agree; I don't see why we couldn't. Created https://issues.apache.org/jira/browse/SPARK-37557 for the follow-up. Will do it shortly, thanks.
[SPARK-37557][SQL] Replace object hash aggregate with sort aggregate if child is already sorted

What changes were proposed in this pull request?
This is a follow-up of #34702 (comment), where we can replace object hash aggregate with sort aggregate as well. This PR handles object hash aggregate.

Why are the changes needed?
Increase coverage of the rule by handling object hash aggregate as well.

Does this PR introduce any user-facing change?
No.

How was this patch tested?
Modified unit test in `ReplaceHashWithSortAggSuite.scala` to cover object hash aggregate (by using aggregate expression `COLLECT_LIST`).

Closes #34824 from c21/agg-rule-followup.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
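An illustrative query for that follow-up (my sketch based on the test description above; `collect_list` plans as an `ObjectHashAggregateExec`):

```scala
import org.apache.spark.sql.functions.collect_list

// With the rule enabled and the child sorted on k within partitions, the
// follow-up lets the object hash aggregate below the exchange be replaced
// by a SortAggregate as well.
spark.conf.set("spark.sql.execution.replaceHashWithSortAgg", "true")
val df = spark.range(100).selectExpr("id % 10 AS k", "id AS v")
df.sortWithinPartitions("k")
  .groupBy("k")
  .agg(collect_list("v"))
  .explain()
```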
What changes were proposed in this pull request?
In the query plan, if the child of a hash aggregate is already sorted on the group-by columns, we can replace the hash aggregate with a sort aggregate for better performance, as sort aggregate does not have the hashing overhead of hash aggregate. This PR adds a physical plan rule `ReplaceHashWithSortAgg` for this, which can be disabled via the config `spark.sql.execution.replaceHashWithSortAgg` (a usage sketch follows the file list below).

In addition, to help review (this PR changes several TPCDS plan files), the files below contain the real code changes:
SQLConf.scala
QueryExecution.scala
ReplaceHashWithSortAgg.scala
AdaptiveSparkPlanExec.scala
HashAggregateExec.scala
ReplaceHashWithSortAggSuite.scala
SQLMetricsSuite.scala
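A hedged usage sketch in spark-shell style (the plan shapes in the comments are my reading of the rule, not verified output):

```scala
// Force a sort-merge join so the join output is hash-clustered and sorted
// on the join key, then aggregate on that same key: the partial/final hash
// aggregate pair above the join has no exchange in between, so the rule can
// collapse it into one complete-mode SortAggregate.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
spark.conf.set("spark.sql.execution.replaceHashWithSortAgg", "true")

val t1 = spark.range(100).selectExpr("id % 10 AS k", "id AS v1")
val t2 = spark.range(100).selectExpr("id % 10 AS k", "id AS v2")
t1.join(t2, "k").groupBy("k").sum("v1").explain()
```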
Why are the changes needed?
To get better query performance by leveraging sort ordering in query plan.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added unit test in `ReplaceHashWithSortAggSuite.scala`.