[SPARK-22080][SQL] Adds support for allowing user to add pre-optimization rules #19295

sathiyapk · 2017-09-20T15:04:16Z

What changes were proposed in this pull request?

Currently, user-provided custom rules added via sparkSession.experimental.extraOptimizations = Seq(..) are applied only after all of Spark's native rules have been applied.

After this PR, users can add custom pre-optimization rules via:
sparkSession.experimental.extraPreOptimizations = Seq(MyCustomPreOptimization)
And custom post-optimization rules via:
sparkSession.experimental.extraOptimizations = Seq(MyCustomPostOptimization)
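
A minimal sketch of the registration described above, assuming Spark SQL is on the classpath. MyCustomPostOptimization here is a hypothetical no-op rule, and the object and app names are made up for illustration. The extraPreOptimizations line is left commented out because that field exists only with this PR's patch applied.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical no-op rule, used only to show the registration call.
object MyCustomPostOptimization extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

object RegisterRuleSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("extra-optimizations-sketch")
      .getOrCreate()

    // Existing API: these rules run in a batch after Spark's native optimizer batches.
    spark.experimental.extraOptimizations = Seq(MyCustomPostOptimization)

    // Proposed by this PR only (assumes the patch is applied):
    // spark.experimental.extraPreOptimizations = Seq(MyCustomPreOptimization)

    // Print the parsed, analyzed, optimized and physical plans.
    spark.range(10).filter("id > 5").explain(true)
    spark.stop()
  }
}
```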

How was this patch tested?

The changes are unit tested and were also tested locally using spark-shell.

wzhfy · 2017-09-22T08:45:36Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala

+    "User Provided Post Optimizers", fixedPoint, experimentalMethods.extraOptimizations: _*)
+
+  override def batches: Seq[Batch] = experimentalPreOptimizations ++
+    (preOptimizationBatches ++ super.batches :+


Why can't the user just use preOptimizationBatches?

@wzhfy Thanks for your comment. Yes, I see that preOptimizationBatches was introduced in 2.2, but I'm not sure it allows the user to add custom rules at runtime (say, via spark-shell). Could you confirm this? Thanks.

For example, postHocOptimizationBatches exists, yet experimentalMethods.extraOptimizations is what users rely on to add custom optimization rules.

OK, I see. Then could you add the use case to the PR description? For example:

After this PR, we can add both pre- and post-optimization rules at runtime as follows: ...

This PR is not about Analyzer, please also update your description.

wzhfy · 2017-09-23T02:36:55Z

sql/core/src/main/scala/org/apache/spark/sql/ExperimentalMethods.scala

@@ -44,11 +44,14 @@ class ExperimentalMethods private[sql]() {
   */
  @volatile var extraStrategies: Seq[Strategy] = Nil

+  @volatile var extraPreOptimizations: Seq[Rule[LogicalPlan]] = Nil
+
  @volatile var extraOptimizations: Seq[Rule[LogicalPlan]] = Nil


How about renaming this to extraPostOptimizations?

This is an API change. We can't do it.

Yes, I agree with @gatorsmile. Renaming extraOptimizations to extraPostOptimizations would be symmetric with extraPreOptimizations, but it would break existing callers of the API.

wzhfy · 2017-09-23T02:40:16Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala

@@ -28,12 +28,18 @@ class SparkOptimizer(
    experimentalMethods: ExperimentalMethods)
  extends Optimizer(catalog) {

-  override def batches: Seq[Batch] = (preOptimizationBatches ++ super.batches :+
+  val experimentalPreOptimizations: Seq[Batch] = Seq(Batch(


Also, define this as a single Batch; then you can use experimentalPreOptimizations +: preOptimizationBatches to prepend it to the other batches.
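
A small sketch of the difference between the two shapes, using plain Scala collections. Batch here is a hypothetical stand-in for Catalyst's Batch, and the batch names are assumed for illustration; only the Seq operators are the point.

```scala
// Hypothetical stand-in for Catalyst's Batch; only the name is modeled.
case class Batch(name: String)

object BatchConcatSketch {
  val preOptimizationBatches: Seq[Batch] = Seq(Batch("Pre A"), Batch("Pre B"))

  // Shape in the diff: the user batch is wrapped in a Seq and joined with ++.
  val asSeq: Seq[Batch] = Seq(Batch("User Provided Pre Optimizers"))
  val viaConcat: Seq[Batch] = asSeq ++ preOptimizationBatches

  // Suggested shape: a single Batch value prepended with +:.
  val asBatch: Batch = Batch("User Provided Pre Optimizers")
  val viaPrepend: Seq[Batch] = asBatch +: preOptimizationBatches
}
```

Both forms produce the same sequence; the second avoids wrapping a single batch in a one-element Seq.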

wzhfy · 2017-09-23T02:41:42Z

sql/core/src/test/scala/org/apache/spark/sql/SQLContextSuite.scala

+    sqlContext.experimental.extraPreOptimizations = Seq(DummyPreOptimizationRule)
+
+    val firstBatch = sqlContext.sessionState.optimizer.batches.head
+    val lastBatch = sqlContext.sessionState.optimizer.batches.last // .flatMap(_.rules)


is the comment useful?

wzhfy · 2017-09-23T03:03:55Z

ok to test

wzhfy · 2017-09-23T03:10:39Z

ping @cloud-fan @gatorsmile

gatorsmile · 2017-09-23T18:07:53Z

I do not think we should do it. Extra pre-optimizer rules can easily break our existing optimizer rules. Adding post-optimizer rules should be enough for 99% of cases.

sathiyapk · 2017-09-24T10:04:32Z

@gatorsmile Thanks for your comments. Here are my thoughts; please correct me if I'm wrong. (Sorry for the long comment. :))

This PR doesn't change any existing API; it adds a new one.
In the usual case, for people who don't use ExperimentalMethods, it doesn't affect or break anything.
For people who do use ExperimentalMethods, a badly written rule will break things anyway, whether it is a pre-optimizer or a post-optimizer rule.
One advantage of sparkSession.experimental.extraPreOptimizations in this PR is that the plan produced by a user-provided rule can be further optimized by Spark's native rules, which is not possible with sparkSession.experimental.extraOptimizations. I'm writing a blog post about this with an example and will post the link soon.
Last but not least, one of the main goals of the Spark Catalyst optimizer, as stated in its SIGMOD paper, is to make it simple to define new optimization rules and plug them into the query optimizer at runtime, so we should consider not limiting it even if it only concerns a rare case.
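
The ordering argument can be illustrated with a toy model. This is a sketch only: Expr, Lit, Attr, Add and both rules are hypothetical stand-ins written for this example, not Catalyst classes, and each rule is applied in a single pass rather than to a fixed point.

```scala
// Toy expression tree; hypothetical stand-ins, not Spark's Catalyst types.
sealed trait Expr
case class Lit(value: Int) extends Expr
case class Attr(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

object BatchOrderSketch {
  // Stand-in for a native rule: fold an addition of two literals, bottom-up.
  def constantFolding(e: Expr): Expr = e match {
    case Add(l, r) =>
      (constantFolding(l), constantFolding(r)) match {
        case (Lit(a), Lit(b)) => Lit(a + b)
        case (fl, fr)         => Add(fl, fr)
      }
    case other => other
  }

  // Stand-in for a user rule: the user knows attribute "x" is always 1.
  def substituteX(e: Expr): Expr = e match {
    case Attr("x") => Lit(1)
    case Add(l, r) => Add(substituteX(l), substituteX(r))
    case other     => other
  }

  // Apply each rule once, in the given order.
  def run(rules: Seq[Expr => Expr], plan: Expr): Expr =
    rules.foldLeft(plan)((p, rule) => rule(p))

  val plan: Expr = Add(Attr("x"), Lit(2))

  // User rule first: the native rule can fold the substituted literal.
  val userFirst: Expr = run(Seq[Expr => Expr](substituteX, constantFolding), plan)

  // User rule last: the native rule has already run, so nothing folds the result.
  val userLast: Expr = run(Seq[Expr => Expr](constantFolding, substituteX), plan)
}
```

In this model, userFirst is Lit(3) while userLast stays Add(Lit(1), Lit(2)), which is the effect described above. The same ordering is also what lets a user rule invalidate assumptions that later native rules rely on.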

sathiyapk · 2017-09-24T11:57:50Z

I pushed a new commit that addresses @wzhfy's review comments.

gatorsmile · 2017-09-25T17:59:16Z

Sorry, we do not expect users to add rules before our internal optimizer rules finish, as I explained above. To avoid potential issues, I suggest closing it.

SPARK-22080 Adds support for allowing user to add pre-optimization rules

abd6c04

sathiyapk changed the title ~~SPARK-22080 Adds support for allowing user to add pre-optimization rules~~ [SPARK-22080][SQL] Adds support for allowing user to add pre-optimization rules Sep 20, 2017

SPARK-22080 Addresses review comments

68c7a32

HyukjinKwon mentioned this pull request Sep 26, 2017

[BUILD] Close stale PRs #19348

Closed

asfgit closed this in ceaec93 Sep 27, 2017
