
simplify QueryStage #5


Merged

merged 7 commits into carsonwang:AE_1 from help2 on Jan 22, 2019

Conversation

cloud-fan

No description provided.

@cloud-fan force-pushed the help2 branch 4 times, most recently from 3129cd7 to 62709d5 on January 17, 2019 07:25
@@ -37,44 +33,25 @@ import org.apache.spark.sql.types.StructType
case class PlanQueryStage(conf: SQLConf) extends Rule[SparkPlan] {


conf: SQLConf is no longer needed.
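
That is, the rule could drop the parameter entirely (a minimal sketch of the suggested cleanup; the rule body is unchanged):

// Sketch: PlanQueryStage without the unused SQLConf parameter.
case class PlanQueryStage() extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = ??? // existing logic unchanged
}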

cachedRDD
}

def planToRun: SparkPlan = finalPlan


What's the thinking behind planToRun? Maybe using finalPlan directly would give cleaner semantics.

override def outputPartitioning: Partitioning = child.outputPartitioning

override def outputOrdering: Seq[SortOrder] = child.outputOrdering
abstract class QueryStage extends LeafExecNode {


I understand the reasons for changing QueryStage from UnaryExecNode to LeafExecNode are:

  • Getting rid of the var in the parameter list.
  • The original UnaryExecNode approach didn't override child.

Is that right?

Author


because I removed QueryStageInput
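
To make that concrete, the new shape is roughly as follows (a sketch assembled from the diff context in this review; exact members may differ):

// Sketch: with QueryStageInput removed, QueryStage itself is the leaf node
// that hides its materialized sub-tree (`plan`) from the parent plan, so
// there is no child to override and no var constructor parameter.
abstract class QueryStage extends LeafExecNode {
  def plan: SparkPlan
  override def output: Seq[Attribute] = plan.output
  override def outputPartitioning: Partitioning = plan.outputPartitioning
  override def outputOrdering: Seq[SortOrder] = plan.outputOrdering
}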

protected def adaptivePreparations: Seq[Rule[SparkPlan]] = Seq(
PlanSubqueries(sparkSession),
EnsureRequirements(sparkSession.sessionState.conf),
ReuseExchange(sparkSession.sessionState.conf),
ReuseSubquery(sparkSession.sessionState.conf),
// PlanQueryStage needs to be the last rule because it divides the plan into multiple sub-trees
// by inserting leaf node QueryStageInput. Transforming the plan after applying this rule will
// only transform nodes within a sub-tree.

QueryStageInput -> QueryStage

cachedShuffleRDD
}
}
}

Do we still need cachedShuffleRDD? How is it reused?


Never mind; if we execute the plan twice, cachedShuffleRDD can be reused.
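
For reference, the reuse pattern is roughly this (a sketch; executeShuffle is an illustrative stand-in for however the stage materializes its shuffle):

// Sketch: materialize the shuffle once, and hand back the cached RDD on
// any subsequent execute() call on the same plan.
private var cachedShuffleRDD: ShuffledRowRDD = null

override protected def doExecute(): RDD[InternalRow] = {
  if (cachedShuffleRDD == null) {
    cachedShuffleRDD = executeShuffle() // illustrative helper
  }
  cachedShuffleRDD
}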

}

case StageReady(stage) =>
stageToParentStage.remove(stage.id).foreach { parentStage =>
Owner

A stage being reused can have multiple parent stages. We need to decrease numPendingChildStages for all of its parent stages.

Owner

It is also possible that when the reused stage is ready, another parent has not yet been triggered, right? So before triggering the child stages, we may also need to check whether any of them is already ready.
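
Both points might be handled along these lines (a sketch; stageToParentStages, a map from stage id to all parent stages, and submitStage are illustrative names, while numPendingChildStages comes from this PR):

case StageReady(stage) =>
  // A reused stage can have several parents: decrement every parent's
  // pending-children counter, not just one.
  stageToParentStages.remove(stage.id).foreach { parentStages =>
    parentStages.foreach { parentStage =>
      parentStage.numPendingChildStages -= 1
      // Submit a parent only once all of its children are ready; this also
      // covers a reused child that was ready before the parent registered.
      if (parentStage.numPendingChildStages == 0) {
        submitStage(parentStage)
      }
    }
  }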

@carsonwang left a comment

Thanks @cloud-fan for the efforts!

override def output: Seq[Attribute] = plan.output
override def outputPartitioning: Partitioning = plan.outputPartitioning
override def outputOrdering: Seq[SortOrder] = plan.outputOrdering
override def executeCollect(): Array[InternalRow] = plan.executeCollect()
Owner

Also add executeToIterator?

Author

done
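
Presumably mirroring the existing delegations (a sketch of the added override):

override def executeToIterator(): Iterator[InternalRow] = plan.executeToIterator()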

currentQueryStage
}

override def executeCollect(): Array[InternalRow] = finalPlan.executeCollect()
Owner

Also add executeToIterator?

@yucai commented Jan 18, 2019

The new approach avoids the recursive invocation of operators. Great implementation!

An open question: could we introduce an AdaptiveExchange? As adaptive execution (AE) proceeds, both the plan and the partitioning change dynamically, so even some pre-planned Exchanges might be elided at runtime.

select *
from store_sales join store on (ss_store_sk = s_store_sk)
join item on (ss_item_sk = i_item_sk)
where s_store_name like 'us%'

The join between store_sales and store starts out as a sort merge join. Because we filter store, it is very likely to change from a sort merge join to a broadcast join. After the broadcast join, the partitioning is the same as store_sales's, so if store_sales is already bucketed by ss_item_sk, then store_sales_join_store needs no shuffle when it joins item.

More aggressively, store_sales itself doesn't need a shuffle either when store_sales joins store.

The key to AdaptiveExchange is that this Exchange does not necessarily happen: it dynamically inspects the current partitioning to decide whether to shuffle, and in some cases it can degrade into a no-op.

This way we can go from the original 4 shuffles down to 2, and the shuffles saved are the ones on large tables.
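
A rough sketch of what such a node could look like (AdaptiveExchange is the proposed, hypothetical node; Distribution.createPartitioning, Partitioning.satisfies, SQLConf.get, and ShuffleExchangeExec are existing Spark APIs):

// Hypothetical node for the proposal above: decide at execution time
// whether a shuffle is actually needed.
case class AdaptiveExchange(
    requiredDistribution: Distribution,
    child: SparkPlan) extends UnaryExecNode {

  override def output: Seq[Attribute] = child.output

  override protected def doExecute(): RDD[InternalRow] = {
    if (child.outputPartitioning.satisfies(requiredDistribution)) {
      // The runtime partitioning already satisfies the requirement:
      // degrade into a no-op and skip the shuffle.
      child.execute()
    } else {
      ShuffleExchangeExec(
        requiredDistribution.createPartitioning(SQLConf.get.numShufflePartitions),
        child).execute()
    }
  }
}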

s"coordinator[target post-shuffle partition size: $advisoryTargetPostShuffleInputSize]"
case class CoalescedShuffleReaderExec(
child: ShuffleQueryStage,
partitionStartIndices: Array[Int]) extends LeafExecNode {
@yucai Jan 20, 2019

Should we make it a UnaryExecNode? Otherwise, we cannot see its child in the UI.
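
The suggested change would look roughly like this (a sketch; the execution body is elided):

// Sketch: as a UnaryExecNode the reader keeps its child stage visible in
// the plan tree, and therefore in the UI.
case class CoalescedShuffleReaderExec(
    child: ShuffleQueryStage,
    partitionStartIndices: Array[Int]) extends UnaryExecNode {

  override def output: Seq[Attribute] = child.output

  // The real implementation would build a coalesced ShuffledRowRDD from
  // partitionStartIndices; elided here.
  override protected def doExecute(): RDD[InternalRow] = ???
}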

ThreadUtils.awaitResult(metricsFuture, Duration.Zero)
}.filter(_ != null) // ShuffleQueryStage may give null mapOutputStatistics, skip it.

if (shuffleMetrics.nonEmpty) {

When a bucketed table joins a non-bucketed table, it can fail.

For example:

sql("drop table bucketed_table1").collect
sql("drop table table2").collect

val df1 = (0 until 50).map(i => (i % 5, i % 13, i.toString)).toDF("i", "j", "k").as("df1")
val df2 = (0 until 50).map(i => (i % 7, i % 11, i.toString)).toDF("i", "j", "k").as("df2")
df1.write.format("parquet").bucketBy(20, "i").saveAsTable("bucketed_table1")
df2.write.format("parquet").saveAsTable("table2")

spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "0")
spark.conf.set("spark.sql.adaptive.maxNumPostShufflePartitions", 10)
sql("select * from bucketed_table1 t1 join table2 t2 on t1.i = t2.i").collect()
java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions: List(20, 1)
  at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:58)
  at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254)
  at scala.Option.getOrElse(Option.scala:138)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:252)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
  at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254)
  at scala.Option.getOrElse(Option.scala:138)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:252)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
  at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:254)

* statistics.
*/
case class AdaptiveSparkPlan(resultStage: ResultQueryStage, session: SparkSession)
extends LeafExecNode {

Do we need special consideration in SparkPlanInfo.fromSparkPlan?

  def fromSparkPlan(plan: SparkPlan): SparkPlanInfo = {
    val children = plan match {
      case ReusedExchangeExec(_, child) => child :: Nil
      case stage: QueryStage => stage.plan :: Nil
      case adaptive: AdaptiveSparkPlan => adaptive.plan :: Nil
      case _ => plan.children ++ plan.subqueries
    }

Author

Yes we do, good catch!

@cloud-fan
Author

As adaptive execution (AE) proceeds, both the plan and the partitioning change dynamically, so even some pre-planned Exchanges might be elided at runtime.

Good idea! With the current framework we would have to merge query stages dynamically at runtime, which is quite tricky. Thinking about it more deeply, the root cause is that we only start splitting query stages after EnsureRequirements, so some exchanges become redundant later. An elegant approach would be to plan bottom-up: whenever a shuffle is needed, wrap the current subtree into a query stage and materialize it, then continue planning and creating query stages upwards. But that is a fairly big change; we can do it later.
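
In pseudocode, that bottom-up idea might look like this (all helpers here are hypothetical):

// Sketch: plan bottom-up; at each shuffle boundary, wrap the sub-tree
// below it into a query stage and materialize it, so planning above the
// boundary sees real runtime statistics and may drop the exchange.
def planBottomUp(plan: SparkPlan): SparkPlan = plan transformUp {
  case exchange: ShuffleExchangeExec =>
    val stage = createQueryStage(exchange.child) // hypothetical helper
    stage.materialize()                          // hypothetical: run and wait
    replanAbove(exchange, stage)                 // hypothetical: may elide the shuffle
}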

} else {
// If not all leaf nodes are shuffle query stages, it's not safe to reduce the number of
// shuffle partitions, because we may break the assumption that all children of a spark plan
// have the same number of output partitions.
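
A minimal sketch of that guard (collectLeaves is an existing TreeNode method; the rest is illustrative):

// Only coalesce shuffle partitions when every leaf is a ShuffleQueryStage;
// otherwise sibling operators could end up with mismatched partition
// counts (see the zip failure reported above).
val safeToCoalesce = plan.collectLeaves().forall(_.isInstanceOf[ShuffleQueryStage])
if (safeToCoalesce) {
  // estimate partition start indices and install CoalescedShuffleReaderExec
}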

@carsonwang
Owner

Thanks @cloud-fan very much. Let's merge this.

@carsonwang merged commit ea93dbf into carsonwang:AE_1 on Jan 22, 2019