-
Notifications
You must be signed in to change notification settings - Fork 28.5k
[SPARK-33847][SQL] Simplify CaseWhen if elseValue is None #30852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #133057 has finished for PR 30852 at commit
|
...src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala
Outdated
Show resolved
Hide resolved
# Conflicts: # sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #133167 has finished for PR 30852 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #133163 has finished for PR 30852 at commit
|
case e @ CaseWhen(branches, elseOpt) | ||
if elseOpt.isEmpty && branches.forall(_._2.semanticEquals(Literal(null, e.dataType))) => | ||
Literal(null, e.dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only handle elseOpt.isEmpty
, elseOpt.nonEmpty
has handled by SPARK-24893.
Test build #133229 has finished for PR 30852 at commit
|
retest this please |
@@ -525,6 +525,10 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper { | |||
} else { | |||
e.copy(branches = branches.take(i).map(branch => (branch._1, elseValue))) | |||
} | |||
|
|||
case e @ CaseWhen(branches, elseOpt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: case e @ CaseWhen(branches, None) if ...
Kubernetes integration test starting |
can we also add new test in |
@@ -94,6 +94,7 @@ object ReplaceNullWithFalseInPredicate extends Rule[LogicalPlan] { | |||
replaceNullWithFalse(cond) -> replaceNullWithFalse(value) | |||
} | |||
val newElseValue = cw.elseValue.map(replaceNullWithFalse) | |||
.orElse(if (newBranches.forall(_._2 == FalseLiteral)) Some(FalseLiteral) else None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change assumes that other rules can optimize CASE WHEN with false in all branches, which violates the catalyst principle that rules should be orthogonal. Can we think of a better way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we do need the logic from 2 optimizer rules:
- null means false in predicates
- eliminate CASE WHEN if all branches return the same value
We can probably include logic 2 in this rule
...
if (newBranches.forall(_._2 == FalseLiteral) && cw.elseValue.isEmpty) {
FalseLiteral
} else {
val newElseValue = cw.elseValue.map(replaceNullWithFalse)
CaseWhen(newBranches, newElseValue)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This change assumes CaseWhen has optimized by PushFoldableIntoBranches
, SimplifyConditionals
and ConstantFolding
.
Kubernetes integration test status failure |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
Show resolved
Hide resolved
Test build #133246 has finished for PR 30852 at commit
|
retest this please. |
Test build #133255 has finished for PR 30852 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #133260 has finished for PR 30852 at commit
|
retest this please. |
testJoin(originalCond = condition, expectedCond = condition) | ||
testDelete(originalCond = condition, expectedCond = condition) | ||
testUpdate(originalCond = condition, expectedCond = condition) | ||
val expectedCond = If( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update the test name. Now it's inability to replace null in non-boolean branches of If
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or we change this test to preserve the original purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed PushFoldableIntoBranches
from this test suite, and add integration test to ReplaceNullWithFalseInPredicateEndToEndSuite
.
@@ -258,4 +258,22 @@ class PushFoldableIntoBranchesSuite | |||
EqualTo(CaseWhen(Seq((a, Literal(1)), (c, Literal(2))), None).cast(StringType), Literal("4")), | |||
CaseWhen(Seq((a, FalseLiteral), (c, FalseLiteral)), None)) | |||
} | |||
|
|||
test("SPARK-33847: Remove the CaseWhen if elseValue is empty and other outputs are null") { | |||
assertEquivalent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: to make the test more readable
Seq(a, LessThan(Rand(1), Literal(0.5)).foreach { condition =>
assertEquivalent(EqualTo(CaseWhen(Seq((condition, ...
assertEquivalent(// with cast ...
}
} | ||
|
||
test("replace None of elseValue inside CaseWhen if all branches are null") { | ||
val allFalseBranches = Seq( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allNullBranches
|
||
test("SPARK-33847: Remove the CaseWhen if elseValue is empty and other outputs are null") { | ||
assertEquivalent( | ||
CaseWhen((GreaterThan('a, 1), Literal.create(null, IntegerType)) :: Nil, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, Seq(GreaterThan('a, 1), GreaterThan(Rand(0), 1)).foreach...
Kubernetes integration test starting |
Kubernetes integration test status failure |
Literal.create(null, IntegerType)) | ||
} | ||
|
||
assertEquivalent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 2 tests below are not related to the test name Remove the CaseWhen if elseValue is empty and other outputs are null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it
Test build #133267 has finished for PR 30852 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #133289 has finished for PR 30852 at commit
|
thanks, merging to master! |
Test build #133300 has finished for PR 30852 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, late LGTM. Thanks, @wangyum and @cloud-fan .
What changes were proposed in this pull request?
ReplaceNullWithFalseInPredicate
to replace None of elseValue insideCaseWhen
withFalseLiteral
if all branches areFalseLiteral
. The use case is:Before this pr:
After this pr:
SimplifyConditionals
if elseValue is None and all outputs are null.Why are the changes needed?
Improve query performance.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test.