[SPARK-22719][SQL]Refactor ConstantPropagation #19912

Closed

gengliangwang
Member

@gengliangwang gengliangwang commented Dec 6, 2017

What changes were proposed in this pull request?

The current time complexity of ConstantPropagation is O(n^2), which can be slow when the query is complex.
This PR refactors the implementation to run in O(n) time and adds some pruning to avoid traversing the whole condition.
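
To illustrate the idea, here is a minimal sketch on a toy expression type. It does not use Spark's Catalyst classes, and every name in it (`Expr`, `Attr`, `Lit`, `collect`, `substitute`, `propagate`) is hypothetical. It assumes that bindings are collected only through `And`, and it leaves `Or` and `Not` untouched for brevity, so it is not the code in this patch:

```scala
import scala.collection.mutable

object ConstantPropagationSketch {
  // Toy expression tree; stand-ins for illustration, not Spark's Catalyst classes.
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr

  // Pass 1: collect attribute => constant bindings reachable through And only.
  private def collect(e: Expr, acc: mutable.Map[Attr, Lit]): Unit = e match {
    case And(l, r) =>
      collect(l, acc)
      collect(r, acc)
    case EqualTo(a: Attr, v: Lit) => acc(a) = v
    case EqualTo(v: Lit, a: Attr) => acc(a) = v
    case _ => ()
  }

  // Replace every bound attribute inside a single expression.
  private def substitute(e: Expr, m: scala.collection.Map[Attr, Lit]): Expr = e match {
    case a: Attr => m.getOrElse(a, a)
    case Add(l, r) => Add(substitute(l, m), substitute(r, m))
    case EqualTo(l, r) => EqualTo(substitute(l, m), substitute(r, m))
    case other => other
  }

  // Pass 2: rewrite the remaining equalities; each node is visited once.
  def propagate(cond: Expr): Expr = {
    val constants = mutable.Map.empty[Attr, Lit]
    collect(cond, constants)
    def rewrite(e: Expr): Expr = e match {
      case And(l, r) => And(rewrite(l), rewrite(r))
      // Keep the defining predicates such as `a = 1` as they are.
      case EqualTo(_: Attr, _: Lit) | EqualTo(_: Lit, _: Attr) => e
      case cmp: EqualTo => substitute(cmp, constants)
      case other => other
    }
    // Pruning: skip the rewrite pass entirely when nothing was collected.
    if (constants.isEmpty) cond else rewrite(cond)
  }
}
```

With this sketch, `a = 1 AND b = a + 2` becomes `a = 1 AND b = 1 + 2`, while a condition with no attribute-to-literal equality is returned unchanged.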

How was this patch tested?

Unit tests.

Also a simple benchmark test in ConstantPropagationSuite:

  val condition = (1 to 500).map { _ => Rand(0) === Rand(0) }.reduce(And)
  val query = testRelation
    .select(columnA)
    .where(condition)
  val start = System.currentTimeMillis()
  (1 to 40).foreach { _ =>
    Optimize.execute(query.analyze)
  }
  val end = System.currentTimeMillis()
  println(end - start)

Run time before changes: 18989 ms (474 ms per loop)
Run time after changes: 1275 ms (32 ms per loop)
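
For reference, the per-loop figures and the overall speedup follow directly from the totals above and the 40 iterations in the benchmark. A small sketch of that arithmetic (the object and member names are hypothetical):

```scala
object BenchmarkArithmetic {
  // Totals reported above, measured over 40 optimizer runs.
  val loops = 40
  val beforeMs = 18989L
  val afterMs = 1275L

  // Average wall-clock time per optimizer run.
  def perLoop(totalMs: Long): Double = totalMs.toDouble / loops

  // Ratio of the old total to the new total.
  val speedup: Double = beforeMs.toDouble / afterMs
}
```

This gives roughly 474.7 ms and 31.9 ms per loop, and a speedup a little under 15x.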

@SparkQA

SparkQA commented Dec 6, 2017

Test build #84566 has finished for PR 19912 at commit 962faab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

(None, Seq(((right, left), e)))
case a: And =>
val (newLeft, equalityPredicatesLeft) = traverse(a.left, false)
val (newRight, equalityPredicatesRight) = traverse(a.right, false)
Member

Use a named argument here: replaceChildren = false

}
/**
* Traverse a condition as a tree and replace attributes with constant values.
* - If the child of [[And]] is [[EqualTo]] or [[EqualNullSafe]], propagate the mapping
Member

the child -> any child

} else {
None
}
(newSelf, Seq.empty)
Member

        val (newChild, _) = traverse(n.child, replaceChildren = true)
        (newChild.map(Not), Seq.empty)

(newSelf, equalityPredicates)
case o: Or =>
val (newLeft, _) = traverse(o.left, true)
val (newRight, _) = traverse(o.right, true)
Member

The same here.

}
(newSelf, equalityPredicates)
case o: Or =>
val (newLeft, _) = traverse(o.left, true)
Member

Add a comment to explain why we do not need to consider the returned equalityPredicates.
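
One way to see why equalities found under an Or should not be propagated to the enclosing And: an equality in one branch of an Or is not guaranteed to hold for every row that passes the filter. The sketch below is a hypothetical illustration with plain Scala predicates standing in for the filter `(a = 1 OR b = 5) AND b = a + 1`; it is not taken from the patch:

```scala
object OrScopeSketch {
  // A row with two integer columns; a stand-in for a real tuple.
  final case class Row(a: Int, b: Int)

  // Original filter: (a = 1 OR b = 5) AND b = a + 1
  def original(r: Row): Boolean =
    (r.a == 1 || r.b == 5) && r.b == r.a + 1

  // Unsound rewrite: treats `a = 1` from inside the OR as if it always held,
  // so `a` in the second conjunct is replaced by 1.
  def unsound(r: Row): Boolean =
    (r.a == 1 || r.b == 5) && r.b == 1 + 1

  // This row passes the original filter through the `b = 5` branch.
  val witness: Row = Row(a = 4, b = 5)
}
```

The row `(a = 4, b = 5)` satisfies the original filter but is rejected by the rewritten one, so the rewrite would change query results.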

}
(newSelf, Seq.empty)
case n: Not =>
val (newChild, _) = traverse(n.child, true)
Member

Also here. Add a comment.

@gatorsmile
Member

Overall, it looks good to me. cc @jiangxb1987 @cloud-fan

* attributes with the corresponding constant values in both children with propagated mapping.
* @param condition condition to be traversed
* @param replaceChildren whether to replace attributes with the corresponding constant values
*/
Member

Add doc to explain the return value?

* - If the child of [[And]] is [[EqualTo]] or [[EqualNullSafe]], propagate the mapping
* of attribute => constant.
* - If the current [[And]] node is not child of another [[And]], replace occurrence of the
* attributes with the corresponding constant values in both children with propagated mapping.
Member

Also add comments to explain the Or and Not processing below.

def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case f: Filter => f transformExpressionsUp {
Member

Would it have the same effect if we changed transformExpressionsUp to transformExpressionsDown?

Member

Oh, this doesn't work because it still replaces the replaced And.

@viirya
Member

viirya commented Dec 7, 2017

The new algorithm looks correct and good.

* @param replaceChildren whether to replace attributes with the corresponding constant values
*/
private def traverse(condition: Expression, replaceChildren: Boolean)
: (Option[Expression], Seq[((AttributeReference, Literal), BinaryComparison)]) =
Member

Maybe define a type alias for Seq[((AttributeReference, Literal), BinaryComparison)].
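
A minimal sketch of what such an alias could look like. The case classes below are simplified stand-ins for the Catalyst types so that the snippet compiles on its own, and the alias name `EqualityPredicates` and the helper `constantsOf` are assumptions made for illustration:

```scala
object TypeAliasSketch {
  // Simplified stand-ins for the Catalyst classes of the same names.
  final case class AttributeReference(name: String)
  final case class Literal(value: Any)
  final case class BinaryComparison(left: Any, right: Any)

  // One name for the nested type, so signatures stay readable.
  type EqualityPredicates = Seq[((AttributeReference, Literal), BinaryComparison)]

  // Example signature using the alias: extract the attribute => constant pairs.
  def constantsOf(predicates: EqualityPredicates): Map[AttributeReference, Literal] =
    predicates.map(_._1).toMap
}
```

The alias does not introduce a new type; it only shortens signatures such as the return type of `traverse`.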

@gengliangwang
Member Author

@gatorsmile @viirya thanks for reviewing! I have addressed your comments.

@SparkQA

SparkQA commented Dec 7, 2017

Test build #84593 has finished for PR 19912 at commit 88cd025.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor

retest this please

@SparkQA

SparkQA commented Dec 7, 2017

Test build #84596 has finished for PR 19912 at commit 88cd025.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Dec 7, 2017

retest this please.

Contributor

@jiangxb1987 jiangxb1987 left a comment

LGTM

@SparkQA

SparkQA commented Dec 7, 2017

Test build #84598 has finished for PR 19912 at commit 88cd025.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Dec 7, 2017

LGTM

: Expression = {
val constantsMap = AttributeMap(equalityPredicates.map(_._1))
val predicates = equalityPredicates.map(_._2).toSet
def _replaceConstants(expression: Expression) = expression transform {
Contributor

nit: replaceConstants0 is more Java style

val predicates = equalityPredicates.map(_._2).toSet
def _replaceConstants(expression: Expression) = expression transform {
case a: AttributeReference =>
constantsMap.get(a) match {
Contributor

constantsMap.getOrElse(a, a)
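
As a small illustration of the suggestion, the sketch below uses a plain Scala `Map` with string keys in place of the attribute map. It assumes the original `match` falls back to the attribute itself when no constant is found, which the quoted diff does not show:

```scala
object GetOrElseSketch {
  // Plain Map standing in for the attribute => constant map.
  val constantsMap: Map[String, String] = Map("a" -> "1")

  // Explicit pattern match on the Option returned by get.
  def viaMatch(a: String): String = constantsMap.get(a) match {
    case Some(literal) => literal
    case None => a
  }

  // Same behavior, expressed with getOrElse.
  def viaGetOrElse(a: String): String = constantsMap.getOrElse(a, a)
}
```

Both functions return the mapped constant for a known key and the key itself otherwise.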

@cloud-fan
Contributor

LGTM except 2 minor style comments

@SparkQA

SparkQA commented Dec 7, 2017

Test build #84605 has finished for PR 19912 at commit 433587a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks! Merged to master.
