[SPARK-22080][SQL] Adds support for allowing user to add pre-optimization rules #19295
Conversation
override def batches: Seq[Batch] = experimentalPreOptimizations ++
  (preOptimizationBatches ++ super.batches :+
    Batch("User Provided Post Optimizers", fixedPoint, experimentalMethods.extraOptimizations: _*))
why can't user just use preOptimizationBatches?
@wzhfy Thanks for your comment. Yes, I see preOptimizationBatches was introduced in 2.2, but I'm not sure that option allows users to add custom rules at runtime (say, via the spark-shell). Could you confirm this? Thanks.
For example, there is postHocOptimizationBatches, but experimentalMethods.extraOptimizations is used for adding custom optimization rules.
OK, I see. Then could you add the use case to PR description? like:
after this PR, we can add both pre/post optimization rules at runtime as follows:
...
This PR is not about Analyzer, please also update your description.
@@ -44,11 +44,14 @@ class ExperimentalMethods private[sql]() {
    */
   @volatile var extraStrategies: Seq[Strategy] = Nil

+  @volatile var extraPreOptimizations: Seq[Rule[LogicalPlan]] = Nil
+
   @volatile var extraOptimizations: Seq[Rule[LogicalPlan]] = Nil
How about renaming this to extraPostOptimizations?
This is an API change. We can't do it.
Yes, I agree with @gatorsmile. Renaming extraOptimizations to extraPostOptimizations would be symmetric with extraPreOptimizations, but doing so may affect existing API calls.
@@ -28,12 +28,18 @@ class SparkOptimizer(
     experimentalMethods: ExperimentalMethods)
   extends Optimizer(catalog) {

-  override def batches: Seq[Batch] = (preOptimizationBatches ++ super.batches :+
+  val experimentalPreOptimizations: Seq[Batch] = Seq(Batch(
Also define this as a Batch; then you can use experimentalPreOptimizations +: preOptimizationBatches to concatenate it with the other batches.
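If I read the suggestion correctly, the restructured SparkOptimizer would look roughly like this. This is only a sketch, not the exact patch: the constructor signature and fixedPoint are taken from the diff context above, and the batch names follow the existing "User Provided Post Optimizers" convention.

```scala
class SparkOptimizer(
    catalog: SessionCatalog,
    conf: SQLConf,
    experimentalMethods: ExperimentalMethods)
  extends Optimizer(catalog) {

  // A single Batch (not Seq[Batch]) wrapping the user-supplied
  // pre-optimization rules, as suggested.
  val experimentalPreOptimizations: Batch = Batch(
    "User Provided Pre Optimizers", fixedPoint,
    experimentalMethods.extraPreOptimizations: _*)

  // Prepend the pre-optimization batch, run the built-in batches,
  // and finish with the user-supplied post-optimization rules.
  override def batches: Seq[Batch] =
    (experimentalPreOptimizations +: preOptimizationBatches) ++
      super.batches :+
      Batch("User Provided Post Optimizers", fixedPoint,
        experimentalMethods.extraOptimizations: _*)
}
```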
sqlContext.experimental.extraPreOptimizations = Seq(DummyPreOptimizationRule)

val firstBatch = sqlContext.sessionState.optimizer.batches.head
val lastBatch = sqlContext.sessionState.optimizer.batches.last // .flatMap(_.rules)
is the comment useful?
ok to test
ping @cloud-fan @gatorsmile
I do not think we should do it. The extra pre-optimizer rules can easily break our existing optimizer rules. Adding post-optimizer rules should be enough for 99% of cases.
@gatorsmile thanks for your comments. Here are my thoughts; thanks for correcting me if I'm wrong. (Sorry for the long comment, though. :))
I pushed a new commit that addresses @wzhfy's review comments.
Sorry, we do not expect users to add rules before our internal optimizer rules finish, as I explained above. To avoid the potential issues, I suggest closing it.
What changes were proposed in this pull request?
Currently, the user-provided custom rules that are added via
sparkSession.experimental.extraOptimizations = Seq(..)
are applied only after all of Spark's native rules have been applied. After this PR, users can add custom pre-optimization rules via:
sparkSession.experimental.extraPreOptimizations = Seq(MyCustomPreOptimization)
And custom post-optimization rules via:
sparkSession.experimental.extraOptimizations = Seq(MyCustomPostOptimization)
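For illustration, a minimal rule wired into this hook might look like the sketch below. MyCustomPreOptimization and its no-op body are hypothetical, used only to show the shape of the API; a real rule would transform the plan in apply.

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// A hypothetical no-op rule; a real rule would rewrite the plan here.
object MyCustomPreOptimization extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

// Register it at runtime, e.g. from the spark-shell:
sparkSession.experimental.extraPreOptimizations = Seq(MyCustomPreOptimization)
```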
How was this patch tested?
The changes are unit tested and also tested locally using the spark-shell.