[SPARK-29227][SS] Track rule info in optimization phase #25914
Conversation
Just for my own understanding: where can such tracking info be seen after it's collected?

@gaborgsomogyi Thanks for your reply.
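As an aside (not from this thread), one hypothetical way to inspect the collected info for a batch query, assuming the QueryPlanningTracker API of the time (queryExecution.tracker with topRulesByTime and per-rule summaries), would be roughly:

// Hypothetical inspection snippet, e.g. in spark-shell; names outside the
// tracker API (df, doubled) are just for illustration.
val df = spark.range(100).filter("id > 10").selectExpr("id * 2 AS doubled")
df.collect()  // force analysis, optimization and execution

// Top rules by accumulated time across the tracked phases.
df.queryExecution.tracker.topRulesByTime(5).foreach { case (rule, summary) =>
  println(s"$rule: total=${summary.totalTimeNs} ns, " +
    s"effective=${summary.numEffectiveInvocations}/${summary.numInvocations}")
}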
HeartSaVioR left a comment
I see what's missing here.
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
Lines 91 to 110 in a1b90bf
  /**
   * Executes the batches of rules defined by the subclass, and also tracks timing info for each
   * rule using the provided tracker.
   * @see [[execute]]
   */
  def executeAndTrack(plan: TreeType, tracker: QueryPlanningTracker): TreeType = {
    QueryPlanningTracker.withTracker(tracker) {
      execute(plan)
    }
  }

  /**
   * Executes the batches of rules defined by the subclass. The batches are executed serially
   * using the defined execution strategy. Within each batch, rules are also executed serially.
   */
  def execute(plan: TreeType): TreeType = {
    var curPlan = plan
    val queryExecutionMetrics = RuleExecutor.queryExecutionMeter
    val planChangeLogger = new PlanChangeLogger()
    val tracker: Option[QueryPlanningTracker] = QueryPlanningTracker.get
Currently, the elapsed time for each rule is not measured in streaming queries, and this patch fixes that.
Batch queries already do this correctly, see below:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
Lines 77 to 81 in a1b90bf
  lazy val optimizedPlan: LogicalPlan = tracker.measurePhase(QueryPlanningTracker.OPTIMIZATION) {
    // clone the plan to avoid sharing the plan instance between different stages like analyzing,
    // optimizing and planning.
    sparkSession.sessionState.optimizer.executeAndTrack(withCachedData.clone(), tracker)
  }
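Under the hood, executeAndTrack installs the tracker in a thread-local via QueryPlanningTracker.withTracker, and execute reads it back via QueryPlanningTracker.get. Below is a simplified, standalone sketch of that pattern; the names and structure are illustrative only, not the actual Spark internals.

object TrackerPatternDemo {
  final class Tracker {
    private val timesNs = scala.collection.mutable.Map.empty[String, Long]
    def record(rule: String, ns: Long): Unit =
      timesNs(rule) = timesNs.getOrElse(rule, 0L) + ns
    def dump(): Unit = timesNs.foreach { case (r, t) => println(s"$r: $t ns") }
  }

  private val localTracker = new ThreadLocal[Tracker]

  // Analogue of executeAndTrack: install the tracker for the duration of the
  // block so code deeper in the call stack can look it up.
  def withTracker[T](t: Tracker)(body: => T): T = {
    val old = localTracker.get()
    localTracker.set(t)
    try body finally localTracker.set(old)
  }

  // Analogue of QueryPlanningTracker.get inside RuleExecutor.execute.
  def currentTracker: Option[Tracker] = Option(localTracker.get())

  // Stand-in for running one optimizer rule: record elapsed time if a tracker
  // is installed, otherwise skip tracking (the pre-fix streaming case).
  def runRule(name: String): Unit = {
    val start = System.nanoTime()
    Thread.sleep(1) // pretend the rule does some work
    currentTracker.foreach(_.record(name, System.nanoTime() - start))
  }

  def main(args: Array[String]): Unit = {
    val tracker = new Tracker
    withTracker(tracker) { runRule("ExampleRule") } // timing is recorded
    runRule("ExampleRule")                          // no tracker installed: nothing recorded
    tracker.dump()
  }
}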
LGTM
gaborgsomogyi left a comment
LGTM. Checked a query output manually.
Haven't had a super deep look but I think it can be unit tested.
@HyukjinKwon @dongjoon-hyun Is there anything I can do to move this forward? Thanks.

ok to test
HyukjinKwon left a comment
Yes, can we add a test? (see #23096)
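For illustration, here is a hedged sketch of what such an end-to-end check could look like, assuming Spark's internal StreamTest/MemoryStream harness; the suite name, test name, and assertion are hypothetical and not necessarily what the PR ultimately added.

import org.apache.spark.sql.execution.streaming.MemoryStream
import org.apache.spark.sql.streaming.StreamTest

class RuleTrackingStreamingSuite extends StreamTest {
  import testImplicits._

  test("optimizer rule timings are tracked for streaming micro-batches") {
    val input = MemoryStream[Int]
    val df = input.toDF().filter("value > 1")

    testStream(df)(
      AddData(input, 1, 2, 3),
      CheckAnswer(2, 3),
      AssertOnQuery { q =>
        // lastExecution is the IncrementalExecution of the latest micro-batch;
        // after this patch its tracker should also contain per-rule summaries
        // from the optimizer run.
        q.lastExecution.tracker.rules.nonEmpty
      }
    )
  }
}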
Force-pushed from 19d988b to 060b329.
Add commit for UT.

Test build #112394 has finished for PR 25914 at commit
(Resolved review comments on sql/core/src/test/scala/org/apache/spark/sql/execution/QueryPlanningTrackerEndToEndSuite.scala.)
Test build #112433 has finished for PR 25914 at commit
retest this please

Test build #112450 has finished for PR 25914 at commit
HeartSaVioR left a comment
LGTM again
Merged master to re-run the checks; nothing changed.

Test build #112541 has finished for PR 25914 at commit
Merged to master.

@HeartSaVioR @gaborgsomogyi @HyukjinKwon @dongjoon-hyun Thank you all for the review and merge.
    lazy val optimizedPlan: LogicalPlan = tracker.measurePhase(QueryPlanningTracker.OPTIMIZATION) {
-     sparkSession.sessionState.optimizer.execute(withCachedData) transformAllExpressions {
+     sparkSession.sessionState.optimizer.executeAndTrack(withCachedData,
+       tracker) transformAllExpressions {
nitpick: in most cases, we do not break a function call in the middle of its parameter list. We can change it to

  val sessionState = sparkSession.sessionState
  sessionState.optimizer.executeAndTrack(withCachedData, tracker).transformAllExpressions {

or

  sparkSession.sessionState.optimizer
    .executeAndTrack(withCachedData, tracker).transformAllExpressions {
What changes were proposed in this pull request?

Track timing info for each rule in the optimization phase using QueryPlanningTracker in Structured Streaming.

Why are the changes needed?

In Structured Streaming we only track rule info in the analysis phase, not in the optimization phase.

Does this PR introduce any user-facing change?

No