[SPARK-28356][SQL] Do not reduce the number of partitions for repartition in adaptive execution #25121
Conversation
cc @cloud-fan, @gczsjdy, @justinuang
Test build #107555 has finished for PR 25121 at commit
@@ -312,6 +312,16 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)

+  val REDUCE_POST_SHUFFLE_PARTITIONS_FOR_REPARTITION =
I probably won't add this config. If users call Dataset#repartition, we have to respect it and not change it.
I initially didn't add this config. Later I found that with that change, the test("Union two datasets with different pre-shuffle partition number") no longer tests what we want, because it was written using repartition. Probably I can rewrite that test without using repartition.
I think we can remove that test. Excluding repartition, the pre-shuffle num partitions are always the same (200 by default).
We do need to have a test for repartition though.
@@ -43,7 +43,8 @@ import org.apache.spark.util.collection.unsafe.sort.{PrefixComparators, RecordCo
  */
 case class ShuffleExchangeExec(
     override val outputPartitioning: Partitioning,
-    child: SparkPlan) extends Exchange {
+    child: SparkPlan,
+    supportAdaptive: Boolean = true) extends Exchange {
If we remove the config, I'd probably call this canChangeNumPartition.
+1
@@ -140,6 +140,8 @@ case class ShuffleQueryStageExec(
       case _ =>
     }
   }
+
+  def supportAdaptive: Boolean = plan.supportAdaptive
nit: exchangeSupportAdaptive? Or canChangeNumPartition? QueryStage is already a part of adaptive execution.
@cloud-fan @gczsjdy, updated based on the comments. Thanks!
Test build #107613 has finished for PR 25121 at commit
     checkAnswer(resultDf,
-      Seq((0), (0), (1), (1), (2), (2)).map(i => Row(i)))
+      Seq((0), (1), (2)).map(i => Row(i)))
this is just Seq(0, 1, 2), right?
True. Good catch.
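An aside on why the reviewer's simplification is safe: in Scala, parentheses around a single expression are a no-op, so (0) is just the Int 0, not a one-element tuple. A minimal, self-contained check:

```scala
// Parentheses around a single expression are a no-op in Scala,
// so (0) is simply the Int 0 -- not a tuple.
val verbose = Seq((0), (1), (2))
val simple  = Seq(0, 1, 2)

// Both expressions build the same Seq[Int].
assert(verbose == simple)

// A genuine one-element wrapper needs Tuple1 written explicitly.
assert(Tuple1(0) != 0)
```

This is why the test's expected answer can be written as Seq(0, 1, 2) directly.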
       case stage: ShuffleQueryStageExec => stage
       case ReusedQueryStageExec(_, stage: ShuffleQueryStageExec, _) => stage
     }
+    if (!shuffleStages.forall(_.canChangeNumPartition)) {
add a comment to explain it?
Sure
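The guard under discussion can be sketched in plain Scala (a simplified model — the Stage case class and names below are illustrative, not the actual Spark classes): coalescing is applied only if every shuffle stage allows its partition count to change, so one repartition-introduced shuffle opts the whole query out.

```scala
// Simplified model of the forall guard: skip coalescing if any
// shuffle stage must keep its user-requested partition count.
// `Stage` is illustrative, not Spark's ShuffleQueryStageExec.
case class Stage(name: String, canChangeNumPartition: Boolean)

def shouldCoalesce(shuffleStages: Seq[Stage]): Boolean =
  // A stage produced by an explicit repartition(n) reports
  // canChangeNumPartition = false, which vetoes coalescing.
  shuffleStages.forall(_.canChangeNumPartition)

val auto = Stage("join-shuffle", canChangeNumPartition = true)
val user = Stage("repartition(10)", canChangeNumPartition = false)

assert(shouldCoalesce(Seq(auto, auto)))  // all automatic: may coalesce
assert(!shouldCoalesce(Seq(auto, user))) // user repartition: keep counts
```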
@@ -140,6 +140,8 @@ case class ShuffleQueryStageExec(
       case _ =>
     }
   }
+
+  def canChangeNumPartition: Boolean = plan.canChangeNumPartition
do we need this? It only saves typing 4 characters....
Removed it.
LGTM
Test build #107696 has finished for PR 25121 at commit
thanks, merging to master!
…tion in adaptive execution

## What changes were proposed in this pull request?
Adaptive execution reduces the number of post-shuffle partitions at runtime, even for shuffles caused by repartition. However, the user likely wants the requested number of partitions when they call repartition, even under adaptive execution. This PR adds an internal config to control this; by default, adaptive execution will not change the number of post-shuffle partitions for repartition.

## How was this patch tested?
New tests added.

Closes apache#25121 from carsonwang/AE_repartition.
Authored-by: Carson Wang <carson.wang@intel.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
       validMetrics.map(stats => stats.bytesByPartitionId.length).distinct

-    if (validMetrics.nonEmpty && distinctNumPreShufflePartitions.length == 1) {
+    if (validMetrics.nonEmpty) {
@carsonwang is it safe to remove distinctNumPreShufflePartitions.length == 1 from here? I think @hvanhovell's comment (https://github.com/apache/spark/pull/24978/files#r299396944) still applies here about Union. I ran into an issue with my plan:
Union
:- Project [id_key#236, true AS row_type#249, link#232]
: +- Filter (isnotnull(min(id_key) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#246) AND (id_key#236 = min(id_key) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#246))
: +- Window [min(id_key#236) windowspecdefinition(specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS min(id_key) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#246]
: +- ShuffleQueryStage 5
: +- Exchange SinglePartition, true
: +- *(7) Project [id_key#236, link#232]
: +- *(7) SortMergeJoin [link#237], [link#232], Inner, (id_key#236 > id_key#230)
: :- *(5) Sort [link#237 ASC NULLS FIRST], false, 0
: : +- CoalescedShuffleReader [0]
: : +- ShuffleQueryStage 0
: : +- Exchange hashpartitioning(link#237, 5), true
: : +- *(1) Project [col1#224 AS id_key#236, col2#225 AS link#237]
: : +- *(1) LocalTableScan [col1#224, col2#225]
: +- *(6) Sort [link#232 ASC NULLS FIRST], false, 0
: +- CoalescedShuffleReader [0]
: +- ShuffleQueryStage 1
: +- Exchange hashpartitioning(link#232, 5), true
: +- *(2) Project [id_key#230, link#232]
: +- *(2) Filter (isnotnull(link#232) AND isnotnull(id_key#230))
: +- *(2) Scan RecursiveReference iter[id_key#230,row_type#231,link#232]
+- Project [id_key#240, new AS new#256, link#241]
+- SortMergeJoin [id_key#238], [id_key#240], Inner
:- Sort [id_key#238 ASC NULLS FIRST], false, 0
: +- ShuffleQueryStage 4
: +- Exchange hashpartitioning(id_key#238, 5), true
: +- *(4) Project [id_key#238]
: +- *(4) Filter (isnotnull(min(id_key) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#247) AND (id_key#238 = min(id_key) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#247))
: +- Window [min(id_key#238) windowspecdefinition(specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS min(id_key) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)#247]
: +- CoalescedShuffleReader [0]
: +- ShuffleQueryStage 2
: +- Exchange SinglePartition, true
: +- LocalTableScan <empty>, [id_key#238]
+- Sort [id_key#240 ASC NULLS FIRST], false, 0
+- ShuffleQueryStage 3
+- Exchange hashpartitioning(id_key#240, 5), true
+- *(3) Project [col1#228 AS id_key#240, col2#229 AS link#241]
+- *(3) LocalTableScan [col1#228, col2#229]
where ShuffleQueryStage 5 conflicts with ShuffleQueryStage 4 and ShuffleQueryStage 3.
Ah, SinglePartition is an exception, so it's still possible to hit distinctNumPreShufflePartitions.length > 1 here. Let's add back this check @carsonwang @maryannxue
thanks for reporting it!
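The restored check can be modeled in a few lines of plain Scala (a sketch with illustrative data, not the actual ReduceNumShufflePartitions code): coalescing shuffle readers is only valid when every participating shuffle reports the same pre-shuffle partition count, and a SinglePartition exchange in a Union can break that assumption.

```scala
// Sketch: per-partition byte sizes reported by each shuffle's
// map output statistics. With a SinglePartition exchange in the
// mix, the partition counts can disagree.
val bytesByPartitionId: Seq[Array[Long]] = Seq(
  Array.fill(5)(100L), // Exchange hashpartitioning(..., 5)
  Array.fill(5)(100L), // Exchange hashpartitioning(..., 5)
  Array(500L)          // Exchange SinglePartition: one partition
)

val distinctNumPreShufflePartitions =
  bytesByPartitionId.map(_.length).distinct

// Coalescing only makes sense when all shuffles agree on the
// pre-shuffle partition count, hence the restored length == 1 check.
val safeToCoalesce =
  bytesByPartitionId.nonEmpty && distinctNumPreShufflePartitions.length == 1

assert(distinctNumPreShufflePartitions.sorted == Seq(1, 5))
assert(!safeToCoalesce)
```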
I've opened a small follow-up PR here: #25479. Please let me know if this requires a new ticket.
## What changes were proposed in this pull request?
Adaptive execution reduces the number of post-shuffle partitions at runtime, even for shuffles caused by repartition. However, the user likely wants the requested number of partitions when they call repartition, even under adaptive execution. This PR adds an internal config to control this; by default, adaptive execution will not change the number of post-shuffle partitions for repartition.

## How was this patch tested?
New tests added.