Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TreeNode Refactor Part 2 #8653

Merged
merged 4 commits into from
Dec 27, 2023
Merged

TreeNode Refactor Part 2 #8653

merged 4 commits into from
Dec 27, 2023

Conversation

berkaysynnada
Copy link
Contributor

@berkaysynnada berkaysynnada commented Dec 25, 2023

Which issue does this PR close?

The continuation of the work initiated in #8395. It could also be beneficial for #7942.

Rationale for this change

TreeNode implementations of some optimizer rules are challenging to understand and are open to misuse. This refactor standardizes the implementations and eliminates unnecessary payloads.

What changes are included in this PR?

These implementations have been refactored:

  1. DistributionContext
  2. PlanWithCorrespondingSort
  3. PlanWithCorrespondingCoalescePartitions
  4. PipelineStatePropagator
  5. OrderPreservationContext
  6. SortPushDown
  7. ExprOrdering
  8. PlanWithKeyRequirements

map_children() functions of these implementations are now uniform. Previously, some of the rules were in map_children(), others were in some utils such as new_from_children(), and some were in transformer rules. This distributed structure made understanding and maintenance difficult. All these rules have now been moved into functions used as transform arguments on the optimizer part.

Since Datafusion trees generally consist of nodes that store their children, each transform can implicitly have bottom-up transform capability. In some uses of transform_up(), after updating the children of the self node, additional logic is added during their attachment to the self node. This practice has been avoided. Now, all map_children() does is attach the updated children to the default-created self node without any modification of the self node.

A similar situation may cause an algorithm that is expected to performtransform_down() to perform an implicit transform_up() if the map_children() implementation of the rule that perform transform_down() includes some logic. Perhaps a more comprehensive tree visitor-transformer design can be planned to address this issue.

Are these changes tested?

Yes, with existing tests.

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate labels Dec 25, 2023
Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this carefully and it looks great to me. Now all the recursive structures are crystal clear and it should inform the overall TreeNode refactor/redesign effort nicely.

@alamb
Copy link
Contributor

alamb commented Dec 26, 2023

cc @peter-toth @sadboy @Dandandan @avantgardnerio

@alamb
Copy link
Contributor

alamb commented Dec 26, 2023

I hope to review this PR later today but may not get to it for a day or two

@ozankabak
Copy link
Contributor

For ease of review, I suggest looking at struct definitions of tree nodes and the transform functions. All map_children implementations are the same. Hopefully, this will help us design a framework where one doesn't need to define map_children and apply_children over and over again and just provide the transformation logic and a mechanism to access children nodes.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spot checked this code and verified that all existing tests pass. Thus while I don't fully understand the nuances involved, from my point of view it is a nice step forward

Thank you for the contribution @berkaysynnada and @ozankabak (and @peter-toth who seems to have spurred the current set of improvements)

I can't wait to see what things look like after a few more rounds of improvement ❤️

FYI @jackwener

@peter-toth
Copy link
Contributor

peter-toth commented Dec 27, 2023

I checked SortPushDown, ExprOrdering and PlanWithKeyRequirements when opened #8664 on the top of this PR.

@ozankabak
Copy link
Contributor

Seems like everybody is on board and follow-up work is already getting underway, so I will go ahead and merge this in a few hours

@ozankabak ozankabak merged commit 6403222 into apache:main Dec 27, 2023
23 checks passed
appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Jan 4, 2024
* Refactor TreeNode's

* Update utils.rs

* Final review

* Remove unnecessary clones, more idiomatic Rust

---------

Co-authored-by: Mehmet Ozan Kabak <ozankabak@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants