-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to iterative DynNode
and ConcreteTreeNode
processing
#13177
Switch to iterative DynNode
and ConcreteTreeNode
processing
#13177
Conversation
DynNode
visit and rewriting DynNode
visiting and rewriting
DynNode
visiting and rewriting DynNode
visiting and rewriting
DynNode
visiting and rewriting DynNode
visiting and rewriting
b6fa0a7
to
e88f47f
Compare
e88f47f
to
83d194e
Compare
e6600ea
to
e475464
Compare
@peter-toth Hey! I think you've worked on quite a few methods modified by this PR, in case you'd like to take a look 😀 |
Hi @blaginin, thanks for pinging me. I like the iterative approach, but I'm a bit concerned that this API can be much slower than the recursive one. Do you think you could run some benchmarks? |
Thank you @blaginin. First, I want to express that I share @peter-toth’s concern. Most of the rules are written with the assumption that the traversal is I also recommend making the test changes in a follow-up PR to facilitate easier review. |
I agree with both suggestions (benchmarks to see the impact and avoiding test changes in the first PR) |
Hmm, do we have rules that expect BFS traversal? I think the current APIs only support pre-order/post-order (DFS like) traversals. Also, I don't see yet how this will work on tree nodes that own their children (e.g. |
I did a typo, I meant DFS (: |
I agree with you. That clones could be huge. @blaginin do you have time to implement the similar version for |
With |
I will run the planning benchmarks on this PR |
- reuse vec instead of making a new one - use mut ref to the last element instead of taking it on every iteration
Here is what I got:
Possibly a tiny slowdown (like 1%) but also hard to tell given it is so small |
It would possibly be more significant when we apply it on concrete types and Expr's. |
DynNode
visiting and rewriting DynNode
and ConcreteTreeNode
processing
Thanks for the comments! 🙏 As for the performance, I removed a few extra As for the tests, I'm not sure I got the idea. This PR reimplements some methods in the interfaces, while the existing tests only call the main implementation. If I remove the test changes, then the current PR won't be tested, which I am not sure is desirable?
I think this is a very valid point, @peter-toth!
Apart from the obvious implementation solutions, I was thinking about capturing unpacked objects in closures, but maybe we can think of something better? This is definitely a very big piece of work, with a lot of discussion needed and potentially quite a few breaking changes... However, I am not sure we should do this discussion as part of this PR. My idea here was to fix the cases that can easily be fixed. So now, even without breaking changes, we can fix quite a few of the existing structs and future ones that will implement the supported interface. And even if we decide to introduce tools like |
Thank you @blaginin for improving the PR and pointing me to those conversations. Unfortunately I missed those and wasn't aware that there have been an attempt to add stack growth libraries. I followed the commits yesterday and saw how each of them made the implementation better and better. I could only suggest minor changes like But, on the other hand, I also looked into Honestly, I don't share the concerns about these libraries. By running some benchmarks I was referring to running the old vs new APIs on large trees with many nodes. IMO the planning benchmarks tell a lot about everyday queries, but doesn't contain any extreme cases with very large trees. I forked your branch yesterday: https://github.com/peter-toth/datafusion/commits/recursive-vs-iterative/ (not the latest state, but after the visit improvement) and in 3 commits separated the old (recursive) and new (iterative)
It looks like the 2 implementations are comparable. Maybe the recursive approach + But overall, I feel I'm still leaning tovards
|
That's true! I also checked your benchmark on different tree heights Regarding your reasoning on |
Are you refering to visiting / transforming a node's children in parallel on multiple threads? Unfortunately, I'm not sure either if this idea was discussed earlier. I remember I was thinking about it when we consolidated the tree traversal semantics and formed the current APIs. But as far as I can recall I didn't find that many |
What are our plans for this PR? Do we all agree to apply these changes? If so, I would like to do a final detailed review. |
Well, I'm still on the side of keeping the recursive way and use |
Hey @berkaysynnada 👋 Yes, Peter is right - I think we should decide on the approach before starting the actual review. I tagged Andrew above, because he was the one to propose recursive approach. When he has time to review, I'll either work on this PR or start a new one with |
I'll try and take a closer look later today |
I am very worried about using
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all, thank you very much @blaginin and @berkaysynnada and @peter-toth -- this is a fairly epic PR
My feeling is that after reading this PR is that there is a very large amount of complexity that arises from ExecutionPlans that don't have ownership of their children (aka they are Arc<dyn ExecutionPlan
> rather than Box<dyn ExecutionPlan
>
If we could actually modify the ExecutionPlan
s (take children, add new children) I think this code would likely be much simpler as well as more performant. We could also potentially offer in-place mutation APIs that would be even faster.
What do people think about trying to push through a change to avoid Arc<dyn ExecutionPlan
> ?
Has anyone verified that this change fixes the issue described on ?
datafusion/common/src/tree_node.rs
Outdated
|
||
/// Detaches children from the parent node (if possible). | ||
/// Unlike [`ConcreteTreeNode`] it doesn't possible that the value will actually be removed from the parent | ||
fn take_children(self) -> (Self, Vec<Self>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these methods be moved into TreeNode
itself? It feels like it just makes things additionally complicated to try and implement a separate set of APIs that are related but not quite the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed RecursiveNode
in 403230a by moving its logic into ConcreteTreeNode (though I’m not sure if this could be done directly in TreeNode).
I also linked DynTreeNode
and ConcreteTreeNode
. This change might be debatable, so I’m happy to discuss further if needed :)
This PR doesn't resolve that specific issue with Logical Plans, although it does fix a problem with processing large Physical Plans. I think more work is needed to fix the original issue (including the child ownership you mentioned), but I think this PR is already quite big already |
The fact that Rust compiler uses
I think that's a bit different topic, both the current recursive and proposed iterative approaches could benefit if taking a node's children and reattaching them would be possible. I don't think this is the main differentiator between the 2. I think what we need to decide is if we can rely on
|
Actually, this is a good question. I've just run this PR locally on the attached example and I got stack overflow. The root cause is not the
As far as I see the issue was reported with DataFusion 36.0.0 and this problematic recursive |
Hey, I can rewrite |
I did not realize it was used by the rust compiler. I just did some research and verified it is indeed used that way: https://github.com/search?q=repo%3Arust-lang/rust%20ensure_sufficient_stack&type=code Based on htat new information I think stacker seems like a reasonable way to go. |
Sorry for the runaround @blaginin -- we appreciate all the work you have done on this issue |
It was a really cool discussion here, and I really appreciate all the reviews! Very happy the issue will be solved in a good way!! 😀 |
Part of #9373
Rationale for this change
Currently, on large trees, iterative visiting and rewriting can cause stack overflow because node-processing algorithms are recursive.
What changes are included in this PR?
This PR implements iterative DFS-like tree processing for
DynTreeNode
. In the next PR, I will also updateConcreteTreeNode
to use these functions.Are these changes tested?
Yes, I've added a new test case to the existing tests and included a large tree test that fails with the recursion but passes with the iterative approach.
The diff for tests looks a bit messy, but it really just moving repetitive parts into a macro.
Are there any user-facing changes?
Yes, a minor update to
with_new_arc_children
.