
TreeNode refactor code deduplication: Part 3 #8817

Merged
merged 43 commits on Jan 27, 2024
Conversation


@ozankabak ozankabak commented Jan 10, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

In the current implementation, the apply_children() and map_children() methods take extra effort to maintain and use.

The first step in addressing this was taken by @viirya with the removal of apply_children(). He introduced the children_nodes() method as a clearer and more understandable implementation. However, a similar approach for map_children() was not feasible due to the diverse nature of dynamic objects (like ExecutionPlan and PhysicalExpr) and other structs that combine a plan with a state variable (such as DistributionContext).

This diversity hinders the unification of methods for accessing and updating their children. Since unifying map_children() and apply_children() seemed impossible with the current TreeNode implementation, we introduced a new trait, ConcreteTreeNode, akin to DynTreeNode, to encapsulate the TreeNode. Instead of directly implementing TreeNode, one can now implement either ConcreteTreeNode for node-based, state-preserving traversals, or DynTreeNode for stateless traversals, applicable to any tree-based data structure. While DynTreeNode requires two methods to implement, ConcreteTreeNode demands three. This difference stems from the need to distinguish between read-only and read-write accesses in concrete types, to avoid unnecessary cloning that can be costly for heavily loaded nodes.

The ConcreteTreeNode methods are children() for read-only access, take_children() for separating the self node from its children, and with_new_children() for updating the self node with those separated and then updated children.

For rule objects, we have moved away from defining new structs around Arc<dyn ExecutionPlan>. Instead, we've implemented a new type, PlanContext, with a state argument. Plans and expressions continue to implement DynTreeNode, but now they also implement ConcreteTreeNode when wrapped in PlanContext and ExprContext to store a payload. This approach is similarly applied to the PhysicalExpr and ExprOrdering examples.

  • We have removed children_nodes from TreeNode, which was initially introduced to unify apply_children.
  • Unnecessary tree constructions used to reset the tree have been eliminated from the rules.
  • Inconsistencies between the root node plan and the tree traversal plan have been addressed, and tests have been enriched to ensure these issues stay fixed.
  • Some further optimizations and code-style improvements have been made in the rules.
  • With these refinements, there is no longer a need to .clone() tree nodes, which may be costly.

With the changes in this PR, we make sure that TreeNode states are consistent during traversal. Hence, they can be composed across different rules and sub-rules. With this capability, we can write rules with more than one stage, where subsequent stages use payloads from previous stages. As an example, consider the following use case: assume we want to satisfy the ordering requirements of the executors in the following plan:

RequiringExec(requirement=[a ASC])
  RequiringExec(requirement=[a ASC, b ASC])
    RequiringExec(requirement=[a ASC, b ASC, c ASC])
      RequiringExec(requirement=[c ASC])
        SourceExec

Assume we have a top-down rule that collects the minimum sort requirement satisfying multiple ancestors. We would store this information in a struct similar to the following (the struct name here is illustrative):

pub struct PlanWithCommonSortExprs {
    pub plan: Arc<dyn ExecutionPlan>,
    pub common_sort_exprs: Vec<PhysicalSortExpr>,
    pub children: Vec<Self>,
}

This top-down pass would produce the following tree:

RequiringExec(requirement=[a ASC]), common_sort_exprs=[]
  RequiringExec(requirement=[a ASC, b ASC]), common_sort_exprs=[]
    RequiringExec(requirement=[a ASC, b ASC, c ASC]), common_sort_exprs=[a ASC, b ASC, c ASC]
      RequiringExec(requirement=[c ASC]), common_sort_exprs=[c ASC]
        SourceExec

Then, using this tree in a bottom-up pass, we can satisfy the requirements by inserting SortExecs below the nodes with non-empty common_sort_exprs.

The bottom-up pass would produce the following plan tree:

RequiringExec(requirement=[a ASC])
  RequiringExec(requirement=[a ASC, b ASC])
    RequiringExec(requirement=[a ASC, b ASC, c ASC])
      SortExec(expr=[a ASC, b ASC, c ASC])
        RequiringExec(requirement=[c ASC])
          SortExec(expr=[c ASC])
            SourceExec

This composable structure makes it easy to use results from previous passes in subsequent stages.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions core Core DataFusion crate labels Jan 10, 2024
@ozankabak
Contributor Author

TODO

  • Calls to add_sort_above should change PlanContext tree accordingly.
  • pushdown_sorts: Has an unnecessary SortPushdown::new_default, should go away when add_sort_above is fixed.
  • reorder_partitioned_join_keys: Changes plan, should take in PlanWithKeyRequirements and modify it (and thus remove new_default calls).
  • reorder_aggregate_keys: Changes plan, should take in PlanWithKeyRequirements and modify it (and thus remove new_default calls).
  • remove_corresponding_coalesce_in_sub_plan: Changes plan, should modify the PlanWithCorrespondingCoalescePartitions tree it takes in.
    • This way we will get rid of the new_default calls in parallelize_sorts.

TEST TODO

  • Tests should validate derived trees match with original trees
  • Tests should validate data/payloads are set correctly after transform completes (to ensure composability)

@berkaysynnada
Contributor

I will work on the TODOs.

@@ -18,7 +18,6 @@
//! This module provides common traits for visiting or rewriting tree
//! data structures easily.

use std::borrow::Cow;
Contributor

It seems like this PR reverts @viirya's #8672. Why is that?

Contributor

@viirya's PR was a good step towards deduplicating the apply_children() implementations. This PR further deduplicates not only apply_children(), but also map_children(), bringing them together in one place.

Like the children() method introduced by @viirya, which returns a Cow, map_children() requires two different methods: one to separate the children from self for mutation, and another to reattach them afterwards. Since obtaining ownership of the children without owning self is not possible, I think Cow's purpose becomes less relevant in this use case. Now we have two children-related methods: one taking a reference and one taking ownership. The original fn children(&self) -> Vec<Cow<Self>> can still be used, but it has become redundant. If the user requires write access, take_children() should be used.

Rather than implementing functions such as apply_children() and map_children(), which provide too much flexibility and invite unclear control flow, I think it is more appropriate to implement these 3 methods, which express the relationship of self nodes with their children, both in terms of ease of use and simplicity.

Contributor

@peter-toth peter-toth Jan 10, 2024

If you check apply_children(), it seems like you duplicate its logic to 4 places in this PR.
Also, the Cow was there to avoid cloning in Expr but now we have cloning again.
I'm ok with the new children(&self) and take_children(self) methods, but it seems like they are not properly used here and there...

Contributor Author

If you check apply_children(), it seems like you duplicate its logic to 4 places in this PR.

Right. The point is that there is a fixed number of apply_children implementations (one doesn't need to implement it per rule or per use case). However, 1 is obviously better than 4 and we will iterate on this.

Also, the Cow was there to avoid cloning in Expr but now we have cloning again.

Yes, and I am looking at ways to eliminate it. This is the only use case remaining for Cow, and there may be a good way to remove it and still avoid cloning. The strategy in the WIP stage is to first implement everything explicitly and then progressively deduplicate by refining the design. Maybe we will converge to the same Cow design; in that case, we will simply revert.

Thanks for taking a look 🚀

@alamb
Contributor

alamb commented Jan 19, 2024

Related discussion / epic: #8913

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 23, 2024
@github-actions github-actions bot added optimizer Optimizer rules and removed optimizer Optimizer rules labels Jan 24, 2024
@ozankabak ozankabak marked this pull request as ready for review January 24, 2024 10:02
@github-actions github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Jan 24, 2024
Contributor

@mustafasrepo mustafasrepo left a comment

This PR is LGTM! It helps us use existing tree nodes in different rules. Also, common implementations for tree nodes are moved into trait implementations.

@alamb
Contributor

alamb commented Jan 24, 2024

Is this PR ready for review? I ask because the title still says WIP but maybe that is just outdated.

@ozankabak ozankabak changed the title [WIP] TreeNode refactor code deduplication: Part 3 TreeNode refactor code deduplication: Part 3 Jan 24, 2024
@ozankabak
Contributor Author

Is this PR ready for review? I ask because the title still says WIP but maybe that is just outdated.

Yes, I forgot to change it

@peter-toth
Contributor

I kept my eye on this PR and IMO it looks good (my concerns have been addressed). It is actually a nice refactor that unifies many different derived trees into PlanContext and ExprContext.

Also, thanks for the general multi-stage example @ozankabak. Once this PR gets merged, I would rebase my #8664 to see if it can improve some of the transformations while also simplifying the transform logic.

@alamb
Contributor

alamb commented Jan 24, 2024

I plan to review this shortly

Contributor

@alamb alamb left a comment

This is really, really nice work @ozankabak and @berkaysynnada 👏 I think it is pretty amazing and shows real mastery of the code.


Thank you @peter-toth for helping to review as well as inspiring some of this work and reimagining.

I ran our internal test suite from influxdb_iox against this branch and it passed 🚀 (and didn't require any code changes to compile).

I left some remarks, mostly about improving/increasing comments, but I think they could be done as follow-on PRs (or never).

cc @yahoNanJing @liukun4515

I recommend leaving this PR open for another few days (or longer if anyone says they would like longer to review) and then merging it in

use std::sync::Arc;

use crate::Result;

#[macro_export]
macro_rules! handle_tree_recursion {
Contributor

Perhaps we can document what this macro does (aka returns from the function if Skip or Stop)
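For instance, the documentation could be accompanied by a sketch like the following. This is illustrative only; the actual macro body and VisitRecursion handling in the codebase may differ:

```rust
// Hypothetical sketch of what handle_tree_recursion does: it unwraps a
// VisitRecursion value and short-circuits the *enclosing* function when the
// visitor asks to prune (Skip) or abort (Stop) the traversal.
#[derive(Debug, PartialEq)]
enum VisitRecursion {
    Continue,
    Skip,
    Stop,
}

macro_rules! handle_tree_recursion {
    ($EXPR:expr) => {
        match $EXPR {
            // Keep recursing down the tree.
            VisitRecursion::Continue => {}
            // Prune this subtree, but let the caller keep visiting siblings.
            VisitRecursion::Skip => return Ok(VisitRecursion::Continue),
            // Abort the whole traversal.
            VisitRecursion::Stop => return Ok(VisitRecursion::Stop),
        }
    };
}

fn visit(depth: usize) -> Result<VisitRecursion, ()> {
    // Pretend the visitor asks to stop once we reach depth 2.
    let decision = if depth >= 2 { VisitRecursion::Stop } else { VisitRecursion::Continue };
    handle_tree_recursion!(decision);
    visit(depth + 1)
}

fn main() {
    assert_eq!(visit(0), Ok(VisitRecursion::Stop));
}
```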

Comment on lines 50 to 54
/// Use preorder to iterate the node on the tree so that we can
/// stop fast for some cases.
///
/// The `op` closure can be used to collect some info from the
/// tree node or do some checking for the tree node.
Contributor

Perhaps this documentation can be updated, something like

Suggested change
/// Use preorder to iterate the node on the tree so that we can
/// stop fast for some cases.
///
/// The `op` closure can be used to collect some info from the
/// tree node or do some checking for the tree node.
/// Applies `op` to the node and its children, with traversal controlled by
/// [`VisitRecursion`]
///
/// The `op` closure can be used to collect some info from the
/// tree node or do some checking for the tree node.

        } else {
            Ok(self)
        }
    }
}

pub trait ConcreteTreeNode: Sized {
Contributor

I think we should document this trait, focusing what it should be used for, perhaps using some of the great description from this PR?

I think documenting what the functions do and their expectations, take_children in particular, would be valuable as well

@@ -35,3 +38,58 @@ impl DynTreeNode for dyn ExecutionPlan {
with_new_children_if_necessary(arc_self, new_children).map(Transformed::into)
}
}

#[derive(Debug)]
pub struct PlanContext<T: Sized> {
Contributor

Could we also document what this is used for? It seems it is used for rewriting execution plans where some additional information needs to be stored in addition to the plan itself

Maybe a more descriptive name would be PlanAndContext? Though perhaps that is too verbose

@@ -35,3 +38,57 @@ impl DynTreeNode for dyn PhysicalExpr {
with_new_children_if_necessary(arc_self, new_children)
}
}

#[derive(Debug)]
pub struct ExprContext<T: Sized> {
Contributor

Likewise documentation here would be awesome

@@ -361,3 +364,17 @@ pub fn sort_exec(
let sort_exprs = sort_exprs.into_iter().collect();
Arc::new(SortExec::new(sort_exprs, input))
}

pub fn check_integrity<T: Clone>(context: PlanContext<T>) -> Result<PlanContext<T>> {
Contributor

Can you document what this is doing?

let children_plans = node.plan.children();
assert_eq!(node.children.len(), children_plans.len());
for (child_plan, child_node) in children_plans.iter().zip(node.children.iter()) {
    assert_eq!(
Contributor

Can you instead simply compare the plans directly rather than converting them to strings?

Contributor

Since ExecutionPlans do not implement PartialEq, we lack a straightforward way to compare them. However, I believe there are no obstacles to implementing PartialEq support for ExecutionPlans.

Contributor

Makes sense -- maybe that is something that can be embedded in a comment to help anyone else that may have the same question

    children_nodes: Vec<Self>,
}

impl DistributionContext {
Contributor

it is really cool to see this same pattern extracted out

let adjusted = plan_requirements
.transform_up(&ensure_sorting)
.and_then(check_integrity)?;
// TODO: End state payloads will be checked here.
Contributor

Are these still TODO items? Did you intend to complete them as part of this PR? If not, perhaps we can file a ticket to track their completion.

Contributor

Adding payload checks to each test is time-consuming, and verifying the accuracy of the results necessitates a separate analysis. I suggest we track these tasks in a separate ticket as todo, so we don't delay this PR any further.

Contributor

A separate ticket is a great idea -- can you please file one and add the link to these comments?

let state = pipeline
.transform_up(&|p| apply_subrules(p, &subrules, &ConfigOptions::new()))
.and_then(check_integrity)?;
// TODO: End state payloads will be checked here.
Contributor

Is this still TODO?

Contributor

Same for the other todo items

@berkaysynnada
Contributor

I left some remarks mostly about improving / increasing comments, but I think they could be done as follow on PRs (or never)

Thank you @alamb for your valuable feedback. I will push a commit addressing your suggestions.

@ozankabak ozankabak merged commit 9c4affe into apache:main Jan 27, 2024
22 checks passed
@ozankabak
Contributor Author

Thanks everyone!

@alamb
Contributor

alamb commented Jan 28, 2024

Epic work!
