[SPARK-12978][SQL] Skip unnecessary final group-by when input data already clustered with group-by keys #10896

Closed
maropu wants to merge 21 commits into apache:master from maropu:SkipGroupbySpike

Conversation

maropu (Member) commented Jan 25, 2016

This ticket targets an optimization that skips an unnecessary final group-by operation, as shown below:

Without opt.:

== Physical Plan ==
TungstenAggregate(key=[col0#159], functions=[(sum(col1#160),mode=Final,isDistinct=false),(avg(col2#161),mode=Final,isDistinct=false)], output=[col0#159,sum(col1)#177,avg(col2)#178])
+- TungstenAggregate(key=[col0#159], functions=[(sum(col1#160),mode=Partial,isDistinct=false),(avg(col2#161),mode=Partial,isDistinct=false)], output=[col0#159,sum#200,sum#201,count#202L])
   +- TungstenExchange hashpartitioning(col0#159,200), None
      +- InMemoryColumnarTableScan [col0#159,col1#160,col2#161], InMemoryRelation [col0#159,col1#160,col2#161], true, 10000, StorageLevel(true, true, false, true, 1), ConvertToUnsafe, None

With opt.:

== Physical Plan ==
TungstenAggregate(key=[col0#159], functions=[(sum(col1#160),mode=Complete,isDistinct=false),(avg(col2#161),mode=Final,isDistinct=false)], output=[col0#159,sum(col1)#177,avg(col2)#178])
+- TungstenExchange hashpartitioning(col0#159,200), None
  +- InMemoryColumnarTableScan [col0#159,col1#160,col2#161], InMemoryRelation [col0#159,col1#160,col2#161], true, 10000, StorageLevel(true, true, false, true, 1), ConvertToUnsafe, None
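For reference, the query shape that produces these plans (it also appears later in this thread); `df` is assumed to be a DataFrame with a grouping column col0 and numeric columns col1 and col2:

// The input is repartitioned on the grouping key, so the rows arriving at the
// aggregate are already clustered by col0; with the optimization the planner
// emits a single Complete-mode aggregate instead of a Partial/Final pair.
df.repartition($"col0")
  .groupBy($"col0")
  .agg(Map("col1" -> "sum", "col2" -> "avg"))
  .explain(true)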

SparkQA commented Jan 25, 2016

Test build #49985 has finished for PR 10896 at commit 5ab19c1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 25, 2016

Test build #49988 has finished for PR 10896 at commit 1b7e3d8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu (Member, Author) commented Feb 2, 2016

@marmbrus Could you review this and give me suggestions?

marmbrus (Contributor) commented Feb 2, 2016

@yhuai would be better to review this, but neither of those plans looks great to me. Why are we not doing a partial aggregation before the shuffle? It seems like that will ship a lot of data around for no reason.

yhuai (Contributor) commented Feb 3, 2016

I guess that exchange is added because there is a distribute by?

maropu (Member, Author) commented Feb 3, 2016

Yes, it is. The input query is:

import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}
import sqlContext.implicits._  // for the $"col" syntax

// `rdd` is an RDD[Row] built elsewhere that matches the schema below.
val fields = Seq(StringType, DoubleType, DoubleType)
  .zipWithIndex.map { case (dataType, index) =>
    StructField(s"col$index", dataType, true)
  }

val df = sqlContext.createDataFrame(rdd, StructType(fields))
val df2 = df.repartition($"col0").cache
val df3 = df2.groupBy($"col0").agg(Map("col1" -> "sum", "col2" -> "avg"))
df3.explain(true)

marmbrus (Contributor) commented Feb 3, 2016

Okay, but that code doesn't actually produce an exchange, right? Since it's captured by the cache?

== Optimized Logical Plan ==
Aggregate [col0#38918], [col0#38918,(sum(cast(col1#38919 as bigint)),mode=Complete,isDistinct=false) AS sum(col1)#38936L,(avg(cast(col2#38920 as bigint)),mode=Complete,isDistinct=false) AS avg(col2)#38937]
+- InMemoryRelation [col0#38918,col1#38919,col2#38920], true, 10000, StorageLevel(true, true, false, true, 1), Exchange hashpartitioning(col0#38918,200), None, None

== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[col0#38918], functions=[(sum(cast(col1#38919 as bigint)),mode=Final,isDistinct=false),(avg(cast(col2#38920 as bigint)),mode=Final,isDistinct=false)], output=[col0#38918,sum(col1)#38936L,avg(col2)#38937])
:     +- TungstenAggregate(key=[col0#38918], functions=[(sum(cast(col1#38919 as bigint)),mode=Partial,isDistinct=false),(avg(cast(col2#38920 as bigint)),mode=Partial,isDistinct=false)], output=[col0#38918,sum#38959L,sum#38960,count#38961L])
:        +- INPUT
+- InMemoryColumnarTableScan [col0#38918,col1#38919,col2#38920], InMemoryRelation [col0#38918,col1#38919,col2#38920], true, 10000, StorageLevel(true, true, false, true, 1), Exchange hashpartitioning(col0#38918,200), None, None

Either way, I'll let @yhuai sign off on this.

@@ -86,20 +86,40 @@ object Utils {
aggregateExpressions: Seq[AggregateExpression],
aggregateFunctionToAttribute: Map[(AggregateFunction, Boolean), Attribute],
resultExpressions: Seq[NamedExpression],
skipUnnecessaryAggregate: Boolean,
Contributor (review comment)

I think it would be clearer if you called this partialAggregation. It's not unnecessary; it's an optimization in most cases.

maropu (Member, Author) commented Feb 4, 2016

Ah, yes... the code produces no exchange because of the cache.

maropu (Member, Author) commented Feb 4, 2016

As @marmbrus said, we also need to push the partial aggregation down below an exchange.
The current Catalyst transforms

df.repartition($"col0").groupBy($"col0").agg(Map("col1"->"sum", "col2"->"avg")).explain(true)

into

== Physical Plan ==
TungstenAggregate(key=[col0#159], functions=[(sum(col1#160),mode=Final,isDistinct=false),(avg(col2#161),mode=Final,isDistinct=false)], output=[col0#159,sum(col1)#177,avg(col2)#178])
+- TungstenAggregate(key=[col0#159], functions=[(sum(col1#160),mode=Partial,isDistinct=false),(avg(col2#161),mode=Partial,isDistinct=false)], output=[col0#159,sum#200,sum#201,count#202L])
   +- TungstenExchange hashpartitioning(col0#159,200), None
      +- InMemoryColumnarTableScan [col0#159,col1#160,col2#161], InMemoryRelation [col0#159,col1#160,col2#161], true, 10000, StorageLevel(true, true, false, true, 1), ConvertToUnsafe, None

maropu (Member, Author) commented Feb 5, 2016

@yhuai ping

SparkQA commented Feb 5, 2016

Test build #50812 has finished for PR 10896 at commit 140da25.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu (Member, Author) commented Feb 9, 2016

@yhuai ping

yhuai (Contributor) commented Feb 10, 2016

I probably will not be able to take a close look at this PR until later this month. I have a question regarding the approach of this PR. Right now, we always plan partial aggregation operators first (in SparkStrategies) and then add Exchange operators (in EnsureRequirements). Another approach would be to not add partial aggregation operators in SparkStrategies; then, after we figure out where we need exchange operators, we add the partial aggregation operators. This approach probably needs more code changes, but I feel it is a cleaner approach.

@maropu @marmbrus what do you think?

maropu (Member, Author) commented Feb 15, 2016

@yhuai The second approach sounds good to me, though I'm not exactly sure how it would remove the unnecessary final aggregation covered in this PR. IMO these kinds of partial-aggregation optimizations are similar to Filter optimizations such as push-downs and pruning, so it would be better to gather these optimizations in the same file. Even in the above example, we should push the partial aggregation down below the TungstenExchange generated by DataFrame#repartition.

maropu force-pushed the SkipGroupbySpike branch from 140da25 to 9d77c90 on April 25, 2016 06:06
SparkQA commented Apr 25, 2016

Test build #56883 has finished for PR 10896 at commit 9d77c90.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 25, 2016

Test build #56884 has finished for PR 10896 at commit dcc51a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rxin (Contributor) commented May 20, 2016

cc @hvanhovell can you review this?

maropu (Member, Author) commented May 25, 2016

@hvanhovell ping

hvanhovell (Contributor)

@maropu I'll take a look today. Is the description up-to-date?

maropu (Member, Author) commented May 25, 2016

@hvanhovell yeah, it is up to date.

@@ -81,20 +81,38 @@ object Utils {
groupingExpressions: Seq[NamedExpression],
aggregateExpressions: Seq[AggregateExpression],
resultExpressions: Seq[NamedExpression],
partialAggregation: Boolean,
Contributor (review comment)

Partial aggregation IMO implies that we add a partial aggregation step. What do you think?

hvanhovell (Contributor) commented May 30, 2016

@maropu IIUC this is still the old approach instead of the approach @yhuai suggests. Do you feel up to seeing whether his approach works? We could also do this in a follow-up.

I have left some minor comments, but all in all this looks pretty good.

@@ -257,10 +257,19 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
planLater(child))
}
} else if (functionsWithDistinct.isEmpty) {
// Check if the child operator satisfies the group-by distribution requirements
Contributor (review comment)

Why not move this code block into aggregate.Utils.planAggregateWithoutDistinct?

maropu (Member, Author) commented May 31, 2016

Thanks for your comments! I'll check them in a few days.

maropu force-pushed the SkipGroupbySpike branch 2 times, most recently from 2b1bea6 to 36553bc on June 7, 2016 08:58

def unapply(plan: SparkPlan): Option[Distribution] = plan match {
case agg: AggregateExec
if agg.aggregateExpressions.map(_.aggregateFunction).forall(_.supportsPartial) =>
Contributor (review comment)

Put this in a function. This can be found a few times in the code.
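A helper along these lines might look as follows (a sketch only: `AggregateExec` and `supportsPartial` are the names used in the diff above, while the helper's name and placement are illustrative):

// True when every aggregate function in the operator supports partial (map-side)
// aggregation, i.e. the aggregate can be split into partial and final steps.
private def supportsPartialAggregate(agg: AggregateExec): Boolean =
  agg.aggregateExpressions.map(_.aggregateFunction).forall(_.supportsPartial)

The guard in the extractor above would then become `if supportsPartialAggregate(agg)`.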

Member Author (review reply)

yea, okay.

maropu (Member, Author) commented Aug 24, 2016

okay, done

SparkQA commented Aug 24, 2016

Test build #64322 has finished for PR 10896 at commit 8a81e23.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 24, 2016

Test build #64324 has finished for PR 10896 at commit d5e0ed3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 25, 2016

Test build #64388 has finished for PR 10896 at commit ac68145.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

maropu (Member, Author) commented Aug 25, 2016

@hvanhovell could you also give me comments on #13852?

hvanhovell (Contributor)

LGTM - merging to master. Thanks!

asfgit closed this in 2b0cc4e on Aug 25, 2016
cloud-fan (Contributor) commented Aug 30, 2016

After this PR, we create the partial aggregate operator in EnsureRequirements, which makes the aggregation code harder to understand and also messes up EnsureRequirements.

I have a simpler idea: add a new rule that runs after EnsureRequirements. In this rule, we can combine adjacent partial and final aggregates into one.

cc @maropu @hvanhovell
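A minimal sketch of such a rule, assuming Spark 2.0's HashAggregateExec and a placement after EnsureRequirements; the rule name and the matching conditions here are illustrative only (this is not the code from #14876), and a real implementation would also need to cover SortAggregateExec and other corner cases:

import org.apache.spark.sql.catalyst.expressions.aggregate.{Complete, Final, Partial}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.aggregate.HashAggregateExec

object CombineAdjacentAggregates extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan transformUp {
    case finalAgg: HashAggregateExec if isFinalOverPartial(finalAgg) =>
      val partialAgg = finalAgg.child.asInstanceOf[HashAggregateExec]
      // No exchange separates the two operators, so the input is already clustered
      // on the grouping keys; collapse the pair into one Complete-mode aggregate
      // that reads the partial aggregate's input directly.
      finalAgg.copy(
        groupingExpressions = partialAgg.groupingExpressions,
        aggregateExpressions = finalAgg.aggregateExpressions.map(_.copy(mode = Complete)),
        initialInputBufferOffset = 0,
        child = partialAgg.child)
  }

  // A Final-mode aggregate sitting directly on top of its Partial-mode counterpart
  // with the same grouping keys (i.e. EnsureRequirements did not insert a shuffle).
  private def isFinalOverPartial(agg: HashAggregateExec): Boolean = agg.child match {
    case partial: HashAggregateExec =>
      agg.aggregateExpressions.forall(_.mode == Final) &&
        partial.aggregateExpressions.forall(_.mode == Partial) &&
        agg.groupingExpressions.map(_.toAttribute) == partial.groupingExpressions.map(_.toAttribute)
    case _ => false
  }
}

This matches the plans in the PR description: when the child already satisfies the required distribution, no exchange ends up between the two aggregates, so the pair can be collapsed.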

* - StateStoreRestore (now there is 1 tuple from this batch + optionally one from the previous)
* - PartialMerge (now there is at most 1 tuple per group)
* - StateStoreSave (saves the tuple for the next batch)
* - Complete (output the current result of the aggregation)
*
* If the first aggregation needs a shuffle to satisfy its distribution, a map-side partial
* aggregation and a shuffle are added in `EnsureRequirements`.
*/
def planStreamingAggregation(
Contributor (review comment)

Have we tested the streaming aggregation with this optimization?

Contributor (review reply)

Yes, it is a bit risky to touch this part.

liancheng (Contributor)

+1 for @cloud-fan's proposal. Instead of creating a performant plan using tricky code, it's clearer to create a naive but correct physical plan first and then optimize it.

hvanhovell (Contributor)

You could also argue the other way around: planning a partial aggregate is also a premature optimization, and the planning of such an Aggregate could also be considered tricky code. BTW, the solution implemented in this PR was initially proposed by @yhuai.

I do think things could be simplified even more; both pruning an unneeded partial aggregate and planning one in a new rule have merit.

maropu (Member, Author) commented Aug 30, 2016

@cloud-fan @liancheng yea, adding a new rule after EnsureRequirements sounds good to me. One question: creating a partial aggregation in the planner and then removing it in the new rule seems kind of wasteful, so would it be a bad idea to create the partial aggregation in the new rule instead? Physical plans are always correct with or without partial aggregations, and adding them is just one of the optimizations.

cloud-fan (Contributor)

I agree that partial aggregation is also a kind of optimization, and it's tricky to put it in the planner. I think it makes sense to clean this up after we have had sufficient discussion and come to a consensus, but not as part of this one optimization.

For this particular optimization, I think it's much simpler to add an extra rule to merge the partial and final aggregates than to spread the aggregation logic into EnsureRequirements.

cc @yhuai too

maropu (Member, Author) commented Aug 30, 2016

Sorry for my bad explanation. Yes, I agree that we should remove the aggregation logic from EnsureRequirements for simpler code. What I meant is: how about moving the aggregation logic (creating partial aggregations) into the extra rule after EnsureRequirements?

yhuai (Contributor) commented Aug 30, 2016

@maropu Thank you for working on this. Sorry that I did not get time to look at it after you updated the PR. I looked at it today. I think this optimization deserves a feature flag since it determines whether we can generate a valid physical plan. We can enable it by default, but we will have the flexibility to disable it if there is an issue.

After looking at the code, I am not sure it is a good approach to put the logic of adding partial aggregate operators in EnsureRequirements. Originally, I thought we could have an individual rule to add partial aggregate operators and then either extract the logic in EnsureRequirements as a utility function or run EnsureRequirements again.

Also, due to the complexity of the logic for planning aggregations, it seems hard to track the planner logic after this change.

So, it seems it would be good to try your original proposal by adding a rule to remove unnecessary operators (like @cloud-fan implemented in #14876). In this way, it will also be very easy to add the feature flag and keep the optimization rule in a single place. Later, we can revisit this approach if we can clean up the planner logic for aggregation. What do you think?
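Purely as an illustration of the kind of feature flag being suggested here (the conf key below is hypothetical and does not exist in Spark):

// Hypothetical flag name; disabling it would fall back to the plain partial + final plan.
spark.conf.set("spark.sql.aggregate.collapsePartialAggregation.enabled", "false")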

maropu (Member, Author) commented Aug 31, 2016

@yhuai Thanks for your comment; I agree with you. Let's keep the discussion going.

asfgit pushed a commit that referenced this pull request Sep 1, 2016
## What changes were proposed in this pull request?

According to the discussion in the original PR #10896 and the new-approach PR #14876, we decided to revert these 2 PRs and go with the new approach.

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes #14909 from cloud-fan/revert.
maropu deleted the SkipGroupbySpike branch on July 5, 2017 11:49