Skip to content

[SPARK-22080][SQL] Adds support for allowing user to add pre-optimization rules #19295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

sathiyapk
Copy link
Contributor

@sathiyapk sathiyapk commented Sep 20, 2017

What changes were proposed in this pull request?

Currently, the user provided custom rules that are added via sparkSession.experimental.extraOptimizations = Seq(..) are applied only after all the spark's native rules are applied.

After this PR, users can add custom pre-optimization rules via:
sparkSession.experimental.extraPreOptimizations = Seq(MyCustomPreOptimization)
And custom post-optimization rules via:
sparkSession.experimental.extraOptimizations = Seq(MyCustomPostOptimization)

How was this patch tested?

The changes are unit tested and also locally test using spark shell.

@sathiyapk sathiyapk changed the title SPARK-22080 Adds support for allowing user to add pre-optimization rules [SPARK-22080][SQL] Adds support for allowing user to add pre-optimization rules Sep 20, 2017
"User Provided Post Optimizers", fixedPoint, experimentalMethods.extraOptimizations: _*)

override def batches: Seq[Batch] = experimentalPreOptimizations ++
(preOptimizationBatches ++ super.batches :+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't user just use preOptimizationBatches?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wzhfy Thanks for your comment. Yes, i see preOptimizationBatches is introduced since 2.2 but i'm not sure this option allows user to add custom rules during runtime (say, via spark-shell). Could you confirm this? Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, there is postHocOptimizationBatches but experimentalMethods.extraOptimizations is used for adding custom optimisation methods..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see. Then could you add the use case to PR description? like:

after this PR, we can add both pre/post optimization rules at runtime as follows:
...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is not about Analyzer, please also update your description.

"User Provided Post Optimizers", fixedPoint, experimentalMethods.extraOptimizations: _*)

override def batches: Seq[Batch] = experimentalPreOptimizations ++
(preOptimizationBatches ++ super.batches :+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see. Then could you add the use case to PR description? like:

after this PR, we can add both pre/post optimization rules at runtime as follows:
...

@@ -44,11 +44,14 @@ class ExperimentalMethods private[sql]() {
*/
@volatile var extraStrategies: Seq[Strategy] = Nil

@volatile var extraPreOptimizations: Seq[Rule[LogicalPlan]] = Nil

@volatile var extraOptimizations: Seq[Rule[LogicalPlan]] = Nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about rename this extraPostOptimizations?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an API change. We can't do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i agree with @gatorsmile, renaming extraOptimizations to extraPostOptimizations will be symmetric with extraPreOptimizations, but doing so may affect the existing API calls.

@@ -28,12 +28,18 @@ class SparkOptimizer(
experimentalMethods: ExperimentalMethods)
extends Optimizer(catalog) {

override def batches: Seq[Batch] = (preOptimizationBatches ++ super.batches :+
val experimentalPreOptimizations: Seq[Batch] = Seq(Batch(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also define this as Batch and you can use experimentalPreOptimizations +: preOptimizationBatches to concatenate with other batches.

sqlContext.experimental.extraPreOptimizations = Seq(DummyPreOptimizationRule)

val firstBatch = sqlContext.sessionState.optimizer.batches.head
val lastBatch = sqlContext.sessionState.optimizer.batches.last // .flatMap(_.rules)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the comment useful?

@wzhfy
Copy link
Contributor

wzhfy commented Sep 23, 2017

ok to test

@wzhfy
Copy link
Contributor

wzhfy commented Sep 23, 2017

ping @cloud-fan @gatorsmile

@gatorsmile
Copy link
Member

I do not think we should do it. The extra pre-optimizer rules can easily break our existing optimizer rules. Adding post optimizer rules should be enough for 99% cases.

@sathiyapk
Copy link
Contributor Author

@gatorsmile thanks for your comments. Here are my thoughts, thanks for correcting me if i'm wrong. (sorry for the big comment though :))

  1. This PR don't change any existing API, it adds a new one.
  2. In the usual cases, for the people who don't use ExperimentalMethods, it don't affect or break anything.
  3. For the people who use ExperimentalMethods, irrespective of whether it is pre-optimizer or post-optimizer rule, it will break anyway if they do it wrong.
  4. One of the advantages of this PR sparkSession.experimental.extraPreOptimizations is that the user provided rule can get further optimizer by the native rules of spark, which is not possible with sparkSession.experimental.extraOptimizations. I'm writing a blog post regarding this with an example, i will post the link soon.
  5. Last but not least, one of the main intention of the spark catalyst optimizer, as mentioned in its sigmod paper, is it's simplicity in defining new optimization rules and plug it into the query optimizer during runtime, so we should consider not to limit it even if it only concerns a rare case.

@sathiyapk
Copy link
Contributor Author

I pushed a new commit that addresses @wzhfy review comments..

@gatorsmile
Copy link
Member

Sorry, we do not expect users to add rules before our internal optimizer rules finish, as I explained above. To avoid the potential issues, I suggest to close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants