-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-22719][SQL]Refactor ConstantPropagation #19912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #84566 has finished for PR 19912 at commit
|
(None, Seq(((right, left), e))) | ||
case a: And => | ||
val (newLeft, equalityPredicatesLeft) = traverse(a.left, false) | ||
val (newRight, equalityPredicatesRight) = traverse(a.right, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaceChildren = false
} | ||
/** | ||
* Traverse a condition as a tree and replace attributes with constant values. | ||
* - If the child of [[And]] is [[EqualTo]] or [[EqualNullSafe]], propagate the mapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the child
-> any child
} else { | ||
None | ||
} | ||
(newSelf, Seq.empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val (newChild, _) = traverse(n.child, replaceChildren = true)
(newChild.map(Not), Seq.empty)
(newSelf, equalityPredicates) | ||
case o: Or => | ||
val (newLeft, _) = traverse(o.left, true) | ||
val (newRight, _) = traverse(o.right, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same here.
} | ||
(newSelf, equalityPredicates) | ||
case o: Or => | ||
val (newLeft, _) = traverse(o.left, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comments to explains why we do not need to consider the returned equalityPredicates
} | ||
(newSelf, Seq.empty) | ||
case n: Not => | ||
val (newChild, _) = traverse(n.child, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here. add a comment.
Overall, it looks good to me. cc @jiangxb1987 @cloud-fan |
* attributes with the corresponding constant values in both children with propagated mapping. | ||
* @param condition condition to be traversed | ||
* @param replaceChildren whether to replace attributes with the corresponding constant values | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add doc to explain return value?
* - If the child of [[And]] is [[EqualTo]] or [[EqualNullSafe]], propagate the mapping | ||
* of attribute => constant. | ||
* - If the current [[And]] node is not child of another [[And]], replace occurrence of the | ||
* attributes with the corresponding constant values in both children with propagated mapping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add the comments to explain Or
and Not
processing below.
def apply(plan: LogicalPlan): LogicalPlan = plan transform { | ||
case f: Filter => f transformExpressionsUp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be the same effect if we turn transformExpressionsUp
to transformExpressionsDown
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this doesn't work because it still replaces the replaced And
.
New algorithm looks correct and good. |
* @param replaceChildren whether to replace attributes with the corresponding constant values | ||
*/ | ||
private def traverse(condition: Expression, replaceChildren: Boolean) | ||
: (Option[Expression], Seq[((AttributeReference, Literal), BinaryComparison)]) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe define a type alias for Seq[((AttributeReference, Literal), BinaryComparison)]
.
@gatorsmile @viirya thanks for reviewing! I have addressed your comments. |
Test build #84593 has finished for PR 19912 at commit
|
retest this please |
Test build #84596 has finished for PR 19912 at commit
|
retest this please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Test build #84598 has finished for PR 19912 at commit
|
LGTM |
: Expression = { | ||
val constantsMap = AttributeMap(equalityPredicates.map(_._1)) | ||
val predicates = equalityPredicates.map(_._2).toSet | ||
def _replaceConstants(expression: Expression) = expression transform { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: replaceConstants0
is more java style
val predicates = equalityPredicates.map(_._2).toSet | ||
def _replaceConstants(expression: Expression) = expression transform { | ||
case a: AttributeReference => | ||
constantsMap.get(a) match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constentsMap.getOrElse(a, a)
LGTM except 2 minor style comments |
Test build #84605 has finished for PR 19912 at commit
|
Thanks! Merged to master. |
What changes were proposed in this pull request?
The current time complexity of ConstantPropagation is O(n^2), which can be slow when the query is complex.
Refactor the implementation with O( n ) time complexity, and some pruning to avoid traversing the whole
Condition
How was this patch tested?
Unit test.
Also simple benchmark test in ConstantPropagationSuite
Run time before changes: 18989ms (474ms per loop)
Run time after changes: 1275 ms (32ms per loop)