-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-13811][SPARK-13836] [SQL] Removed IsNotNull Constraints of Compound Expressions And Generated IsNotNull Constraints inside Not #11649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@gatorsmile this change makes sense to me, however what are your thoughts on slightly widening the scope of this change and not generating |
|
Test build #52907 has finished for PR 11649 at commit
|
|
@sameeragarwal True, we can add it back when needing it. Let me directly filter it out in |
|
I guess my question is why we are doing this for filters at all. It makes sense for joins, but perhaps we should just eliminate that part of the rule. |
|
@marmbrus , are you saying we should not add null filters for In addition, we will still remove the |
added a support for operator Not for all the binary operators.
|
In this commit, three parts are done:
|
|
I see. We should probably comment on that in If we are going to make this general to BinaryComparisions then we should probably also comment there that subclasses are assumed to be null intolerant. |
|
Also, does codegen really need us to add redundant expressions into the query plan? Can't it just look at the constraints and pick those that are useful for it? |
|
Will add the comments to explain the null intolerant of the For avoiding the merge conflicts, will not touch the code in |
|
Test build #52954 has finished for PR 11649 at commit
|
|
@marmbrus the reason for adding these inferred filters in the logical plan is because we currently only propagate constraints in the logical plan (and they are not directly accessible during code generation). In theory, we can propagate constraints in the physical plan instead but that'd be slightly unclean (since we'd then need to add propagation logic for all individual physical operators). @nongli, @yhuai and I briefly discussed the pros and cons of the 2 approaches yesterday and then decided to go with the current approach. PS: Added comments in #11665. |
|
Test build #52958 has finished for PR 11649 at commit
|
|
I guess I would probably propagate them during planning (i.e. when you are creating the physical The nice thing about this is then other operators that don't benefit from them (think |
|
I like the idea of @marmbrus Obviously, our outer join elimination is using null filtering constraints of compound expressions. I think this PR has to wait until the other related PRs are merged. |
|
I do see the benefits of propagating constraints in the physical plan but if I understand correctly, wouldn't we have problems with joins? For e.g., if a join has a condition on ( |
|
As mentioned in another thread, we do not explicitly add the inferred predicates(including I am not sure what is the best way to implement it. I guess we need such a class parameter for storing constraints in each case class? |
|
The benefit of adding those extra filters in the optimizer is that the predicate pushdown will just work. |
|
I see, the join case makes sense to me. I like this solution. |
| IsNotNull(resolveColumn(tr, "b"))))) | ||
| } | ||
|
|
||
| test("IsNotNull constraints of compound expressions in filters") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "no IsNotNull constraints are generated for compound expressions in filters" (here and in the test below)
|
LGTM |
What changes were proposed in this pull request?
We generate
IsNotNullconstraint for compound expressions. However, we should not add extra predicate conditions for these compound expressions. It will add extra costs for query processing. Normally, it will not filter out any row but requires extra processing.For example,
Without this PR, the plan will be like
After the PR, constraints-generated null filtering of compound expressions will not be added.
The solution is to remove the
IsNotNullconstraints for compound expressions.In addition, this PR can generate
IsNotNullconstraints for all theBinaryComparisoninsideNotexceptEqualNullSafe.cc @sameeragarwal @nongli @marmbrus
How was this patch tested?
Added a test case for Filter and another case for Join