-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop copying LogicalPlan and Exprs in EliminateCrossJoin
(4% faster planning)
#10431
Conversation
6ec02a4
to
1fcc6d1
Compare
@@ -237,7 +324,7 @@ fn find_inner_join( | |||
)?); | |||
|
|||
return Ok(LogicalPlan::Join(Join { | |||
left: Arc::new(left_input.clone()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is one example where the plans were cloned
} | ||
} | ||
_ => all_inputs.push(child.clone()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is an example of the clones that are removed
@@ -259,7 +346,7 @@ fn find_inner_join( | |||
)?); | |||
|
|||
Ok(LogicalPlan::CrossJoin(CrossJoin { | |||
left: Arc::new(left_input.clone()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likewise here is another removed clone
}; | ||
|
||
// If there are no join keys then do nothing: | ||
if all_join_keys.is_empty() { | ||
Filter::try_new(predicate.clone(), Arc::new(left)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also removed several expr clones
EliminateCrossJoin
EliminateCrossJoin
(4% faster planning)
1fcc6d1
to
41be324
Compare
41be324
to
05d0946
Compare
} | ||
} | ||
|
||
let can_flatten_inputs = can_flatten_join_inputs(&plan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this took a fair amount of finagling to split up the cases where the rewrite will happen and thus we should destructure the Filter and when not. I believe the logic is all the same, but the code needed to be reorganized
/// 'select ... from a, b, c where (a.x = b.y and b.xx = 100 and a.z = c.z) | ||
/// or (a.x = b.y and b.xx = 200 and a.z=c.z);' | ||
/// 'select ... from a, b where a.x > b.y' | ||
/// Eliminate cross joins by rewriting them to inner joins when possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I was in here and had it all in my head, I updated the documentation to explain how things worked
} | ||
|
||
if !can_flatten_join_inputs(&filter.input) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this condition can go earlier to save some cpu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the call to rewrite_children
needs to happen to apply the rule recursively before proceeding
I am not quite sure what else we could save.
I tried to make this clearer in 030444d with comments and moving another definition of can_flatten_join_inputs
closer to where it was used.
// The filter of inner join will lost, skip this rule. | ||
// issue: https://github.com/apache/datafusion/issues/4844 | ||
return Ok(false); | ||
return internal_err!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Filter::try_new(predicate.clone(), Arc::new(left)) | ||
.map(|f| Some(LogicalPlan::Filter(f))) | ||
Filter::try_new(predicate, Arc::new(left)) | ||
.map(LogicalPlan::Filter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we do it in 1 map iteration ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in cb8c1f1
Some(filter_expr) => Filter::try_new(filter_expr, Arc::new(left)) | ||
.map(|f| Some(LogicalPlan::Filter(f))), | ||
_ => Ok(Some(left)), | ||
.map(LogicalPlan::Filter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to have 1 map iteration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in cb8c1f1
/// | ||
/// After the rule is applied, the plan will look like this: | ||
/// ```text | ||
/// Filter(b.xx = 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope the pushdown filter predicate rule comes after this rule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes absolutely
I double checked and PushDownFilter
does indeed run after this pass:
datafusion/datafusion/optimizer/src/optimizer.rs
Lines 248 to 259 in 9cc981b
Arc::new(EliminateCrossJoin::new()), | |
Arc::new(CommonSubexprEliminate::new()), | |
Arc::new(EliminateLimit::new()), | |
Arc::new(PropagateEmptyRelation::new()), | |
// Must be after PropagateEmptyRelation | |
Arc::new(EliminateOneUnion::new()), | |
Arc::new(FilterNullJoinKeys::default()), | |
Arc::new(EliminateOuterJoin::new()), | |
// Filters can't be pushed down past Limits, we should do PushDownFilter after PushDownLimit | |
Arc::new(PushDownLimit::new()), | |
Arc::new(PushDownFilter::new()), | |
Arc::new(SingleDistinctToGroupBy::new()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alamb I'd say it is great
Thank you for the review @comphead -- I think I addressed all your comments |
… planning) (apache#10431) * Stop copying LogicalPlan and Exprs in `EliminateCrossJoin` * Clarify when can_flatten_join_inputs runs * Use a single `map`
Draft as it builds onLogicalPlan::recompute_schema
for handling rewrite passes #10410EliminateCrossJoin
better #10427EliminateCrossJoin
(3%-5% faster planning) #10430Which issue does this PR close?
Closes #10287
Rationale for this change
Make planning faster by avoiding copying
What changes are included in this PR?
EliminateCrossJoin
to use TreeNode API and stop copying plansExpr
copies moreAre these changes tested?
Existing CI
Are there any user-facing changes?
No functional changes,
Performance change: 4% better planning
(note this includes the improvements from #10427 as well)
Details