Skip to content

[SPARK-32801][SQL] Make InferFiltersFromConstraints take into account EqualNullSafe #29650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from

Conversation

tanelk
Copy link
Contributor

@tanelk tanelk commented Sep 4, 2020

What changes were proposed in this pull request?

InferFiltersFromConstraints only infers new filters using EqualTo, generalized it to also include EqualNullSafe.

This introduced an infinite loop in InferFiltersFromConstraintsSuite. It has been fixed in the Optimizer by restructuring the batches (#19149) - I did the same in the test suite.

Also it revealed, that the InferFiltersFromConstraints is not idempotent, for two cases:

  • A chain of filters like "x.a".attr === "x.b".attr && "x.a".attr === "y.a".attr && "x.b".attr === "y.b".attr, that should infer "y.a".attr === "y.b".attr
  • Constraints form the right side of left semi join are ignored when it is a child of a another join

These were also fixed

Why are the changes needed?

Possible performance improvement from new inferred constraints

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT

@tanelk tanelk changed the title [SPARK-32801][SQL] Make InferFiltersFromConstraints take in account EqualNullSafe [SPARK-32801][SQL] Make InferFiltersFromConstraints take into account EqualNullSafe Sep 4, 2020
@tanelk tanelk changed the title [SPARK-32801][SQL] Make InferFiltersFromConstraints take into account EqualNullSafe [WIP][SPARK-32801][SQL] Make InferFiltersFromConstraints take into account EqualNullSafe Sep 4, 2020
@tanelk tanelk changed the title [WIP][SPARK-32801][SQL] Make InferFiltersFromConstraints take into account EqualNullSafe [SPARK-32801][SQL] Make InferFiltersFromConstraints take into account EqualNullSafe Sep 5, 2020
Copy link
Member

@wangyum wangyum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure we can infer from EqualNullSafe? Some EqualNullSafe from Alias:

case a @ Alias(l: Literal, _) =>
allConstraints += EqualNullSafe(a.toAttribute, l)
case a @ Alias(e, _) =>
// For every alias in `projectList`, replace the reference in constraints by its attribute.
allConstraints ++= allConstraints.map(_ transform {
case expr: Expression if expr.semanticEquals(e) =>
a.toAttribute
})
allConstraints += EqualNullSafe(e, a.toAttribute)

@tanelk
Copy link
Contributor Author

tanelk commented Sep 7, 2020

Are you sure we can infer from EqualNullSafe? Some EqualNullSafe from Alias:

I do not follow, why should this make EqualNullSafe not work?

@SparkQA
Copy link

SparkQA commented Dec 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37659/

@SparkQA
Copy link

SparkQA commented Dec 19, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37659/

@tanelk
Copy link
Contributor Author

tanelk commented Dec 19, 2020

@cloud-fan, it seems you worked on this
also cc @maropu

@SparkQA
Copy link

SparkQA commented Dec 19, 2020

Test build #133059 has finished for PR 29650 at commit 6d56655.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 15, 2021

Test build #134102 has finished for PR 29650 at commit 6d56655.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

cc @maryannxue too

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Apr 28, 2021
@github-actions github-actions bot closed this Apr 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants