Skip to content

Conversation

@gatorsmile
Copy link
Member

What changes were proposed in this pull request?

We generate IsNotNull constraint for compound expressions. However, we should not add extra predicate conditions for these compound expressions. It will add extra costs for query processing. Normally, it will not filter out any row but requires extra processing.

For example,

testRelation.where('a + 'b === 1)

Without this PR, the plan will be like

Filter (((a#0 + b#0) = 1) && isnotnull((a#0 + b#0)))
 +- LocalRelation [a#0,b#0,c#0]

After the PR, constraints-generated null filtering of compound expressions will not be added.

Filter ((a#0 + b#0) = 1))
 +- LocalRelation [a#0,b#0,c#0]

The solution is to remove the IsNotNull constraints for compound expressions.

In addition, this PR can generate IsNotNull constraints for all the BinaryComparison inside Not except EqualNullSafe.

cc @sameeragarwal @nongli @marmbrus

How was this patch tested?

Added a test case for Filter and another case for Join

@sameeragarwal
Copy link
Member

@gatorsmile this change makes sense to me, however what are your thoughts on slightly widening the scope of this change and not generating IsNotNull constraints for any compound expression (in constructIsNotNullConstraints)? As you pointed out, I can't think of an optimization that can make use of these non-nullability constraints for compound expressions. Later, we can always extend constructIsNotNullConstraints to infer isNotNull constraints for individual attributes from compound expressions.

@SparkQA
Copy link

SparkQA commented Mar 11, 2016

Test build #52907 has finished for PR 11649 at commit 9cba8a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

@sameeragarwal True, we can add it back when needing it. Let me directly filter it out in constructIsNotNullConstraints

@marmbrus
Copy link
Contributor

I guess my question is why we are doing this for filters at all. It makes sense for joins, but perhaps we should just eliminate that part of the rule.

@gatorsmile
Copy link
Member Author

@marmbrus , are you saying we should not add null filters for Filter? #11372 The original PR shows it can benefit some data sources like orc ? I am not very clear if this is the major motivation. cc @sameeragarwal

In addition, we will still remove the IsNotNull filters for compound expressions from the constructIsNotNullConstraints anyway? Is my understanding right?

added a support for operator Not for all the binary operators.
@gatorsmile
Copy link
Member Author

In this commit, three parts are done:

  • removed the IsNotNull constraints for compound expressions
  • added IsNotNull constraints support for all the BinaryComparison inside Not except EqualNullSafe.
  • added two more test cases.

@gatorsmile gatorsmile changed the title [SPARK-13811] [SQL] No Push-Down of Constraints-generated Null Filtering of Compound Expressions [SPARK-13811] [SQL] Removed IsNotNull Constraints of Compound Expressions And Generated IsNotNull Constraints inside Not Mar 11, 2016
@sameeragarwal
Copy link
Member

@marmbrus by adding additional isNotNull checks, we are hoping to generate code that can quickly short circuit the filter condition checks in the generated code: #11585

@gatorsmile gatorsmile changed the title [SPARK-13811] [SQL] Removed IsNotNull Constraints of Compound Expressions And Generated IsNotNull Constraints inside Not [SPARK-13811][SPARK-13836] [SQL] Removed IsNotNull Constraints of Compound Expressions And Generated IsNotNull Constraints inside Not Mar 11, 2016
@marmbrus
Copy link
Contributor

I see. We should probably comment on that in NullFiltering if that is the intention. Anytime there are implicit contracts between components like this it makes things harder to reason about.

If we are going to make this general to BinaryComparisions then we should probably also comment there that subclasses are assumed to be null intolerant.

@marmbrus
Copy link
Contributor

Also, does codegen really need us to add redundant expressions into the query plan? Can't it just look at the constraints and pick those that are useful for it?

@gatorsmile
Copy link
Member Author

Will add the comments to explain the null intolerant of the BinaryComparisions subclasses.

For avoiding the merge conflicts, will not touch the code in NullFiltering, since @sameeragarwal will rewrite the whole function for inferred predicate push down. @sameeragarwal Could you add a comment there? Thanks!

@SparkQA
Copy link

SparkQA commented Mar 11, 2016

Test build #52954 has finished for PR 11649 at commit baa2cda.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sameeragarwal
Copy link
Member

@marmbrus the reason for adding these inferred filters in the logical plan is because we currently only propagate constraints in the logical plan (and they are not directly accessible during code generation).

In theory, we can propagate constraints in the physical plan instead but that'd be slightly unclean (since we'd then need to add propagation logic for all individual physical operators). @nongli, @yhuai and I briefly discussed the pros and cons of the 2 approaches yesterday and then decided to go with the current approach.

PS: Added comments in #11665.

@SparkQA
Copy link

SparkQA commented Mar 12, 2016

Test build #52958 has finished for PR 11649 at commit 55155a7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

I guess I would probably propagate them during planning (i.e. when you are creating the physical Filter operator) since its not really an optimization to add them to a logical plan, but instead its a physical concern that things execute more efficiently for that one specific operator if they are present.

The nice thing about this is then other operators that don't benefit from them (think PhysicalOperation) don't necessary need to see them.

@gatorsmile
Copy link
Member Author

I like the idea of @marmbrus

Obviously, our outer join elimination is using null filtering constraints of compound expressions. I think this PR has to wait until the other related PRs are merged.

@sameeragarwal
Copy link
Member

I do see the benefits of propagating constraints in the physical plan but if I understand correctly, wouldn't we have problems with joins?

For e.g., if a join has a condition on (t1.a === t2.a && t1.a == 5), just inferring the constraints/filters during physical planning wouldn't give us the liberty of pushing the additional isNotNull(t1.a), isNotNull(t2.a) and t2.a === 5 filters all the way to the leaf nodes on either side of the join in the query plan (there'd be no PredicatePushdown rules).

@gatorsmile
Copy link
Member Author

As mentioned in another thread, we do not explicitly add the inferred predicates(including isNotNull) into the conditions always. We just store them in the nodes. So far, the current Constraint framework only does a bottom-up constraint propagation. That is not enough. We need a way to push down the local inferred predicates as low as possible.

I am not sure what is the best way to implement it. I guess we need such a class parameter for storing constraints in each case class?

@yhuai
Copy link
Contributor

yhuai commented Mar 14, 2016

The benefit of adding those extra filters in the optimizer is that the predicate pushdown will just work.

@marmbrus
Copy link
Contributor

I see, the join case makes sense to me. I like this solution.

IsNotNull(resolveColumn(tr, "b")))))
}

test("IsNotNull constraints of compound expressions in filters") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "no IsNotNull constraints are generated for compound expressions in filters" (here and in the test below)

@sameeragarwal
Copy link
Member

LGTM

@gatorsmile gatorsmile closed this Apr 19, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants