Skip to content

Conversation

@Jesse-Bakker
Copy link
Contributor

Which issue does this PR close?

Part of #3725

Rationale for this change

In some cases, it is possible to prove that a Filter only ever produces one
row. In those cases, such a filter may be used in a scalar subquery to ensure
it is in fact scalar, allowing for more flexible use of scalar subqueries.

What changes are included in this PR?

This adds an is_scalar() method to Filter, which will check if there is a
unique functional dependence which is covered by the Filter's predicate.
This is used in LogicalPlan::max_rows() to provide a tighter bound on the
maximum number of rows returned in the presence of Filters.

Are these changes tested?

This is directly tested with a unit test and with a sqllogictest that exercises
the more flexible use of scalar subqueries.

Are there any user-facing changes?

Scalar subqueries are more flexible. Previous constraints were not documented
as far as I can tell

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Nov 23, 2023
@Jesse-Bakker Jesse-Bakker force-pushed the scalar-filter branch 3 times, most recently from 2f4cb79 to 44383c2 Compare November 23, 2023 09:28
@Jesse-Bakker Jesse-Bakker changed the title Detect when filters make subqueries scalar Detect when filters on unique constraints make subqueries scalar Nov 23, 2023
@alamb
Copy link
Contributor

alamb commented Nov 27, 2023

Thank you @Jesse-Bakker -- I plan to review this carefully tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Jesse-Bakker -- this looks great. I had a few suggestions, mostly about comments, and code reuse. But otherwise I think this PR is pretty much ready

cc @liukun4515 and @jackwener who may have some more thoughts about subquery rewrites

@alamb
Copy link
Contributor

alamb commented Nov 28, 2023

The only thing I think is strictly needed to approve this PR is docstrings for "is_scalar" -- the rest is "nice to have" from my perspective / could be done as a follow on PR

@jackwener
Copy link
Member

jackwener commented Nov 30, 2023

unique constraints and function dependences is complex feature (Worthy of being carefully designed), I prepare to think more about it.

I will review this PR and think more about it. If this PR isn't urgent, please wait for me.

Copy link
Member

@jackwener jackwener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Jesse-Bakker

/// `Filter(b = 2).is_scalar() == false`
/// and
/// `Filter(a = 2 OR b = 2).is_scalar() == false`
fn is_scalar(&self) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest use Uniform slot and Unique slot to describe FDs instead of scalar.

Commonly used Functional dependencies : including

uniform slot, which means ndv <= 1
unique slot, which means ndv = row

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that the suggestion is to rename this method to Filter::is_uniform()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and add these definition into doc/

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Jesse-Bakker - let's see if @jackwener has a chance to clarify. If we haven't heard by tomorrow I'll merge this PR

Copy link
Member

@jackwener jackwener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Jesse-Bakker @alamb

I have no other question.

@alamb
Copy link
Contributor

alamb commented Dec 6, 2023

I took the liberty of merging up from main to resolve a conflict in this PR

@alamb alamb merged commit c8e1c84 into apache:main Dec 6, 2023
@alamb
Copy link
Contributor

alamb commented Dec 6, 2023

Thanks again @Jesse-Bakker and @jackwener

appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Dec 15, 2023
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants