Skip to content

Conversation

@liancheng
Copy link
Contributor

This PR is a simplified version of several filter optimization rules introduced in #3778 authored by @scwf. Newly introduced optimizations include:

  1. a && a => a
  2. a || a => a
  3. (a || b || c || ...) && (a || b || d || ...) => a && b && (c || d || ...)

The 3rd rule is particularly useful for optimizing the following query, which is planned into a cartesian product

SELECT *
  FROM t1, t2
 WHERE (t1.key = t2.key AND t1.value > 10)
    OR (t1.key = t2.key AND t2.value < 20)

to the following one, which is planned into an equi-join:

SELECT *
  FROM t1, t2
 WHERE t1.key = t2.key
   AND (t1.value > 10 OR t2.value < 20)

The example above is quite artificial, but common predicates are likely to appear in real life complex queries (like the one mentioned in #3778).

A difference between this PR and #3778 is that these optimizations are not limited to Filter, but are generalized to all logical plan nodes. Thanks to @scwf for bringing up these optimizations, and @chenghao-intel for the generalization suggestion.

Review on Reviewable

@SparkQA
Copy link

SparkQA commented Dec 24, 2014

Test build #24762 has started for PR 3784 at commit cf95639.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 24, 2014

Test build #24762 has finished for PR 3784 at commit cf95639.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24762/
Test FAILed.

@liancheng liancheng changed the title [SPARK-4937] Normalizes conjunctions and disjunctions to eliminate common predicates [SPARK-4937][SQL] Normalizes conjunctions and disjunctions to eliminate common predicates Dec 24, 2014
@SparkQA
Copy link

SparkQA commented Dec 24, 2014

Test build #24763 has started for PR 3784 at commit 2abbf8e.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 24, 2014

Test build #24764 has started for PR 3784 at commit 5d54349.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 24, 2014

Test build #24763 has finished for PR 3784 at commit 2abbf8e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24763/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Dec 24, 2014

Test build #24764 has finished for PR 3784 at commit 5d54349.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24764/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Dec 24, 2014

Test build #24776 has started for PR 3784 at commit 3cf7937.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 24, 2014

Test build #24777 has started for PR 3784 at commit 0e51101.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 24, 2014

Test build #24779 has started for PR 3784 at commit 4ab3a58.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 24, 2014

Test build #24776 has finished for PR 3784 at commit 3cf7937.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24776/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Dec 24, 2014

Test build #24777 has finished for PR 3784 at commit 0e51101.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24777/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Dec 24, 2014

Test build #24779 has finished for PR 3784 at commit 4ab3a58.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24779/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f.transformExpressionUp instead? And this rule also can be applied to the normal expressions, not just only for expression in Filter. Move this into BooleanSimplification?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, f.transformExpressionUp is really a good idea! Moving to BooleanSimplification also makes sense. Thanks for the suggestions :)

@chenghao-intel
Copy link
Contributor

This is really helpful optimization, and the implementation makes sense to me, and it would be great if we apply the rules for the those non Filter expressions also.

@SparkQA
Copy link

SparkQA commented Dec 25, 2014

Test build #24815 has started for PR 3784 at commit 00aaa63.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 25, 2014

Test build #24816 has started for PR 3784 at commit f4d9b8f.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 25, 2014

Test build #24817 has started for PR 3784 at commit 601d5f6.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 25, 2014

Test build #24815 has finished for PR 3784 at commit 00aaa63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24815/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Dec 25, 2014

Test build #24816 has finished for PR 3784 at commit f4d9b8f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24816/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Dec 25, 2014

Test build #24817 has finished for PR 3784 at commit 601d5f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24817/
Test PASSed.

@scwf
Copy link
Contributor

scwf commented Dec 25, 2014

Hi @liancheng, i admit my PR is more complicated, but this only cover three cases, i think we'd better adding a separate rule to optimize And/Or in sql for as many as possible cases, not mix several cases in BooleanSimplification. So I am refactorying my PR to make it more readable and clean.

@SparkQA
Copy link

SparkQA commented Dec 25, 2014

Test build #24821 has started for PR 3784 at commit caca560.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 25, 2014

Test build #24821 has finished for PR 3784 at commit caca560.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24821/
Test PASSed.

@scwf
Copy link
Contributor

scwf commented Dec 26, 2014

Hi, @liancheng, my PR originally also not limited to Filter, i used transformExpressionsDown from my first version, the tittle of my first version is not accurate:)

@liancheng
Copy link
Contributor Author

Hey @scwf, I've posted my reply in #3778, so lets discuss these rules there to prevent distraction.

@marmbrus
Copy link
Contributor

Thanks! I've merged this to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants