Skip to content

[SPARK-28220][SQL] Improve PropagateEmptyRelation to support join with false condition #31857

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

wangyum
Copy link
Member

@wangyum wangyum commented Mar 17, 2021

What changes were proposed in this pull request?

Improve PropagateEmptyRelation to support join with false condition. For example:

SELECT * FROM t1 LEFT JOIN t2 ON false

Before this pr:

== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- BroadcastNestedLoopJoin BuildRight, LeftOuter, false
   :- FileScan parquet default.t1[a#4L]
   +- BroadcastExchange IdentityBroadcastMode, [id=#40]
      +- FileScan parquet default.t2[b#5L]

After this pr:

== Physical Plan ==
*(1) Project [a#4L, null AS b#5L]
+- *(1) ColumnarToRow
   +- FileScan parquet default.t1[a#4L]

Why are the changes needed?

Avoid BroadcastNestedLoopJoin to improve query performance.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@github-actions github-actions bot added the SQL label Mar 17, 2021
@wangyum
Copy link
Member Author

wangyum commented Mar 17, 2021

cc @cloud-fan

@SparkQA
Copy link

SparkQA commented Mar 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40832/

@SparkQA
Copy link

SparkQA commented Mar 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40832/

@SparkQA
Copy link

SparkQA commented Mar 19, 2021

Test build #136250 has finished for PR 31857 at commit 25689da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1468,6 +1469,12 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
case _ => false
}

private def pushDownJoinConditions(conditions: Seq[Expression], plan: LogicalPlan) = {
conditions
.filterNot(_.semanticEquals(TrueLiteral)) // Push down true condition is useless.
Copy link
Contributor

@cloud-fan cloud-fan Mar 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be optimized by BooleanSimplification already? true And cond -> cond

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is another issue. Will be fixed by another PR.

@wangyum wangyum changed the title [SPARK-28220][SQL] Push down the foldable predicate to both sides of Join [SPARK-28220][SQL] Improve PropagateEmptyRelation to support join with false condition Mar 19, 2021
@SparkQA
Copy link

SparkQA commented Mar 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40841/

@SparkQA
Copy link

SparkQA commented Mar 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40841/

# Conflicts:
#	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelationSuite.scala
@SparkQA
Copy link

SparkQA commented Mar 19, 2021

Test build #136259 has finished for PR 31857 at commit debb094.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 19, 2021

Test build #136263 has finished for PR 31857 at commit 3202590.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40845/

@SparkQA
Copy link

SparkQA commented Mar 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40845/

@wangyum
Copy link
Member Author

wangyum commented Mar 19, 2021

retest this please.

@SparkQA
Copy link

SparkQA commented Mar 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40854/

@SparkQA
Copy link

SparkQA commented Mar 20, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40854/

@SparkQA
Copy link

SparkQA commented Mar 20, 2021

Test build #136273 has finished for PR 31857 at commit 3202590.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum wangyum closed this in 908318f Mar 20, 2021
@wangyum
Copy link
Member Author

wangyum commented Mar 20, 2021

Merged to master.

@wangyum wangyum deleted the SPARK-28220 branch March 20, 2021 14:58
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, late LGTM.
Thank you, @wangyum and @cloud-fan .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants