-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consistent LogicalPlan subquery handling in TreeNode::apply and TreeNode::visit #9913
Consistent LogicalPlan subquery handling in TreeNode::apply and TreeNode::visit #9913
Conversation
f223ee8
to
ea53b77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you (again) @peter-toth -- this looks really nice as well
I think there are is some more documentation we could add, but overall the only concern I have about this PR is the lack of test changes (like it seems to change the semantics of the analyzer / correlated expressions but no tests are changed). So this perhaps suggests we have a lack of test coverage in this area 🤔
@@ -370,6 +373,221 @@ impl LogicalPlan { | |||
} | |||
} | |||
|
|||
pub fn transform_expressions<F: FnMut(Expr) -> Result<Transformed<Expr>>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please add some comments explaining what this is doing (namely that it applying a function to effectively rewrite the expressions of the LogicalPlan node in place)? It is a very cool API to see actually, as it will hopefully save a bunch of copying
}) => expr | ||
.into_iter() | ||
.map_until_stop_and_collect(f)? | ||
.update_data(|expr| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😍 -- that is quite a neat way to express this pattern and get the ownership sorted out 👍
@@ -1080,8 +1298,72 @@ impl LogicalPlan { | |||
} | |||
|
|||
impl LogicalPlan { | |||
pub fn apply_with_subqueries<F: FnMut(&Self) -> Result<TreeNodeRecursion>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise here, can we please add docstrings?
visitor.f_up(self) | ||
} | ||
TreeNodeRecursion::Jump => { | ||
self.visit_subqueries(visitor)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this treat subqueries as though they were siblings of the node (like if TNR::Jump
is returned subquery children will still be visited?)
If so I think I would find that unexpected (I would expect that the subqueries are treated like additional children)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just moved the code here from LogicalPlan::visit()
and in this PR I didn't want to change how subqueries are handled. The new code I added in LogicalPlan::rewrite_with_subqueries()
are in sync with this behaviour.
But this is a good question and I think you are right and I too would treat them as additional children. But that would be an API behavior change so I would rather change that in a separate PR.
@@ -155,7 +155,7 @@ impl Analyzer { | |||
|
|||
/// Do necessary check and fail the invalid plan | |||
fn check_plan(plan: &LogicalPlan) -> Result<()> { | |||
plan.apply(&mut |plan: &LogicalPlan| { | |||
plan.apply_with_subqueries(&mut |plan: &LogicalPlan| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is interesting that this didn't change any plans / tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically I just moved the code of LogicalPlan:: apply()
(that overrided TreeNode:: apply()
) to LogicalPlan:: apply_with_subqueries()
and changed the LogicalPlan:: apply()
calls to LogicalPlan:: apply_with_subqueries()
. I updated the PR description to be clear.
I rephrased the PR description to be clear. This PR doesn't change the analyzer or anything that uses |
Let me update this PR with the latest |
ea53b77
to
20f3d65
Compare
@alamb, I've updated the PR. I will try to add tests for the |
…n-subquery-handling
I think it would be great to do this as a follow up PR on Monday, however I think the utility of unit tests on this code is somewhat limited. It will be tested end to end once I switch the optimizers to using the TreeNode API in #9954. I took the liberty of pushing a commit to this branch to fix clippy and plan to merge it in when CI passes |
inspect_expressions
to apply_expressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @peter-toth . Love it
where | ||
F: FnMut(&Expr) -> Result<(), E>, | ||
{ | ||
pub fn apply_expressions<F: FnMut(&Expr) -> Result<TreeNodeRecursion>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an api change as it renames a public function. I couldn't find a way to get the error type to line up, so I think it is unavoidable.
I'll mark this PR as API change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I found another pattern here: https://github.com/apache/arrow-datafusion/blob/d201ec742b08fa2a0549fc27f0b39622d67391ab/datafusion/expr/src/utils.rs#L684
Maybe we can pull this part (add apply_expressions
into a separate PR)? I think it would be valuable in its own -- for example it could be used to void the expressions()
copy here:
It could also make this PR smaller
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can do that. Let me file a new PR with this change tomorrow and then look into the infinite recursion issue of this PR (peter-toth#3). Until then I'm marking this PR as draft.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for https://github.com/apache/arrow-datafusion/blob/27bdf3c8c976874aae2417cb2aa435448832ecc0/datafusion/optimizer/src/analyzer/subquery.rs#L236-L241 I plan to check all .expressions()
calls and see if it can be transformed to the better .apply_expressions()
once this PR is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #9984 as I think we don't need that
inspect_expr_pre()
. This PR together with #9984 will remove all usecases of it.
I merged #9984
In order to make this PR easier to merge (so it isn't an API change) I
- pushed 7c90e70 to restore
inspect_expressions
(by callingapply_expressions
) and deprecated it. - Removed the API change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to check all .expressions() calls and see if it can be transformed to the better .apply_expressions() once this PR is merged.
This would be good -- I think there are some more (though #9974 did get merged)
🤔 it appears this PR has some sort of infinite loop -- I will investigate |
Thanks @alamb . Maybe because now we treat subqueries as additional children? I can also look into it later today or tomorrow... |
This is now ready for review again. The infinite recursion was due to an extra accidental |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @peter-toth -- this PR looks great to me.
I took the liberty of restring inspect_expressions
and marking it deprecated and merging up from main.
As I have several PRs backed up on this one, I would like to merge it asap so plan to do so first thing tomorrow morning unless anyone else would like time to review
where | ||
F: FnMut(&Expr) -> Result<(), E>, | ||
{ | ||
pub fn apply_expressions<F: FnMut(&Expr) -> Result<TreeNodeRecursion>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
where | ||
F: FnMut(&Expr) -> Result<(), E>, | ||
{ | ||
pub fn apply_expressions<F: FnMut(&Expr) -> Result<TreeNodeRecursion>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #9984 as I think we don't need that
inspect_expr_pre()
. This PR together with #9984 will remove all usecases of it.
I merged #9984
In order to make this PR easier to merge (so it isn't an API change) I
- pushed 7c90e70 to restore
inspect_expressions
(by callingapply_expressions
) and deprecated it. - Removed the API change
I am not sure what you are planning on working on next @peter-toth but one thing I noticed is that keeping track of all the I wonder if we should move all the code into a single module (maybe alongside the |
inspect_expressions
to apply_expressions
synthetic_plan.apply(op)?; | ||
} | ||
_ => {} | ||
pub fn visit_with_subqueries<V: TreeNodeVisitor<Node = Self>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a follow on PR I also plan to add documentation to these functions. Perhaps I can do so as part of moving the code into its own module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs: #9996
Sure, thanks for doing that!
I was thinking about this a bit more and I unfortunately think this PR is still an API change. If some external code calls
I agree. I'm happy to take that task, but I will be travelling from Tuesday till the end of the week, so most likely I can only do that once I'm back. |
That makes sense -- I have added the label back.
I think it is inevitable that there are different calls for
Awesome -- thank you. I filed #9994 to track the idea. I may make a few small PRs in this direction to start the process too |
Thanks again @peter-toth |
@@ -1079,57 +1317,178 @@ impl LogicalPlan { | |||
} | |||
} | |||
|
|||
/// This macro is used to determine continuation during combined transforming |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may be dead code -- can somone check #9997 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I deliberately added ..._with_subqueries()
versions of all base TreeNode
APIs to LogicalPlan
. These 3 macros are used in LogicalPlan::rewrite_with_subqueries()
, LogicalPlan::transform_down_with_subqueries()
and LogicalPlan::transform_up_with_subqueries()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool -- thanks -- I think that will be clearer as we consolidate the tree rewriting code
Thanks @alamb for the review! |
Which issue does this PR close?
Part of #8913.
Rationale for this change
TreeNode
APIs are inconsistent onLogicaPlan
becauseLogicalPlan:apply()
andLogicalPlan::visit()
are recursing into subquery plans too (these methods override the originalTreeNode::apply
andTreeNode::visit()
methods).But there is no
LogicalPlan:rewite()
orLogicalPlan::transform()
overrides defined so callingrewrite()
andtransform()
on aLogicalPlan
does rewrite/transform only the current level of the plan and doesn't touch subquery plans.What changes are included in this PR?
This PR:
LogicalPlan::apply()
toLogicalPlan::apply_with_subqueries()
, andLogicalPlan::visit()
toLogicalPlan::visit_with_subqueries()
to separate functionality from original theTreeNode::apply()
andTreeNode::visit()
. So from now the new explicitLogicalPlan::..._with_subqueries()
APIs do traverse into subquery plans, while the originalTreeNode
APIs don't.LogicalPlan::rewrite_with_subqueries()
,LogicalPlan::transform_with_subqueries()
,LogicalPlan::transform_down_with_subqueries()
,LogicalPlan::transform_down_mut_with_subqueries()
,LogicalPlan::transform_up_with_subqueries()
,LogicalPlan::transform_up_mut_with_subqueries()
andLogicalPlan::transform_down_up_with_subqueries()
to be able to rewriteLogicalPlan
s with their subquery plans.LogicalPlan::..._with_subqueries()
s APIs the subquery plans behave like the first children of a node (siblings of the real children of the node).Are these changes tested?
Yes, with Existing UTs.
Are there any user-facing changes?
Yes -
LogicalPlan::apply()
no longer recurses into subqueries. The API user needs to change the code toLogicalPlan::apply_with_subqueries()
to get the old behaviour.