Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine the two rules: DecorrelateWhereExists and DecorrelateWhereIn #6271

Merged
merged 7 commits into from
May 15, 2023

Conversation

mingmwang
Copy link
Contributor

@mingmwang mingmwang commented May 7, 2023

Which issue does this PR close?

Closes #6086

Rationale for this change

Combine the two rules into one unify rule DecorrelatePredicateSubquery

What changes are included in this PR?

  1. Combine the two rules.
  2. Addressed the TODO: if subquery doesn't get optimized, optimized children are lost.

Are these changes tested?

Yes, covered by existing UTs.

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate optimizer Optimizer rules labels May 7, 2023
@mingmwang mingmwang requested review from jackwener and alamb May 7, 2023 11:13
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mingmwang -- this is looking awesome. I reviewed the plan changes carefully and spot checked the code.

Everything I saw looks great, but I think it would help if someone who was more familiar with this code (like @ygf11 or @jackwener , perhaps) could review it more carefully.

If they don't have time, I will try and find time to fully review this PR later this week

@@ -5,13 +5,14 @@
| | Projection: orders.o_orderpriority, COUNT(UInt8(1)) AS order_count |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed these plans and they see to be all the same, structurally (there are some aliases that are different)

@mingmwang
Copy link
Contributor Author

@jackwener
Could you please help to take a look ?

@jackwener
Copy link
Member

I am going to review this PR carefully in the next two days, Thanks @mingmwang

@@ -54,7 +59,7 @@ impl DecorrelateWhereIn {
predicate: &Expr,
config: &dyn OptimizerConfig,
) -> Result<(Vec<SubqueryInfo>, Vec<Expr>)> {
let filters = split_conjunction(predicate); // TODO: disjunctions
let filters = split_conjunction(predicate); // TODO: add ExistenceJoin to support disjunctions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR already support disjunction of scalar/in subquery?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't. To support disjunction of predicate subqueries, need to add a specific join type ExistenceJoin

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, BTW, there is a issue to track it MarkJoin #5368

Copy link
Member

@jackwener jackwener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A great job to me

@mingmwang
Copy link
Contributor Author

@jackwener @alamb

Can we merge this PR, so that I can continue working on the other sub query related improvements/enhancements.

@jackwener
Copy link
Member

jackwener commented May 15, 2023

@jackwener @alamb

Can we merge this PR, so that I can continue working on the other sub query related improvements/enhancements.

I review this PR an there is nothing wrong with this PR

I think we can push this PR to continue to do more work

@alamb alamb merged commit ca7f760 into apache:main May 15, 2023
@alamb
Copy link
Contributor

alamb commented May 15, 2023

THanks @jackwener and @mingmwang

@alamb alamb mentioned this pull request May 15, 2023
@alamb
Copy link
Contributor

alamb commented May 15, 2023

I believe this caused a minor diff on main causing Ci to fail: #6353

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Code refactor, combine the two rules: DecorrelateWhereExists and DecorrelateWhereIn
3 participants