-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TreeNode Refactor Part 2 #8653
TreeNode Refactor Part 2 #8653
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed this carefully and it looks great to me. Now all the recursive structures are crystal clear and it should inform the overall TreeNode
refactor/redesign effort nicely.
I hope to review this PR later today but may not get to it for a day or two |
For ease of review, I suggest looking at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spot checked this code and verified that all existing tests pass. Thus while I don't fully understand the nuances involved, from my point of view it is a nice step forward
Thank you for the contribution @berkaysynnada and @ozankabak (and @peter-toth who seems to have spurred the current set of improvements)
I can't wait to see what things look like after a few more rounds of improvement ❤️
FYI @jackwener
I checked |
Seems like everybody is on board and follow-up work is already getting underway, so I will go ahead and merge this in a few hours |
* Refactor TreeNode's * Update utils.rs * Final review * Remove unnecessary clones, more idiomatic Rust --------- Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
Which issue does this PR close?
The continuation of the work initiated in #8395. It could also be beneficial for #7942.
Rationale for this change
TreeNode
implementations of some optimizer rules are challenging to understand and are open to misuse. This refactor standardizes the implementations and eliminates unnecessary payloads.What changes are included in this PR?
These implementations have been refactored:
map_children()
functions of these implementations are now uniform. Previously, some of the rules were inmap_children()
, others were in some utils such asnew_from_children()
, and some were in transformer rules. This distributed structure made understanding and maintenance difficult. All these rules have now been moved into functions used as transform arguments on the optimizer part.Since Datafusion trees generally consist of nodes that store their children, each transform can implicitly have bottom-up transform capability. In some uses of
transform_up()
, after updating the children of the self node, additional logic is added during their attachment to the self node. This practice has been avoided. Now, allmap_children()
does is attach the updated children to the default-created self node without any modification of the self node.A similar situation may cause an algorithm that is expected to perform
transform_down()
to perform an implicittransform_up()
if themap_children()
implementation of the rule that performtransform_down()
includes some logic. Perhaps a more comprehensive tree visitor-transformer design can be planned to address this issue.Are these changes tested?
Yes, with existing tests.
Are there any user-facing changes?