-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup TreeNode implementations #8672
Changes from 4 commits
2da06d5
f725553
e05123a
56c3919
ee4c9df
d4eab3a
3b8a7ad
2958714
26736bb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -33,6 +33,9 @@ use crate::Result; | |
/// [`LogicalPlan`]: https://docs.rs/datafusion-expr/latest/datafusion_expr/logical_plan/enum.LogicalPlan.html | ||
/// [`Expr`]: https://docs.rs/datafusion-expr/latest/datafusion_expr/expr/enum.Expr.html | ||
pub trait TreeNode: Sized { | ||
/// Returns all children of the TreeNode | ||
fn children_nodes(&self) -> Vec<Self>; | ||
|
||
/// Use preorder to iterate the node on the tree so that we can | ||
/// stop fast for some cases. | ||
/// | ||
|
@@ -211,7 +214,17 @@ pub trait TreeNode: Sized { | |
/// Apply the closure `F` to the node's children | ||
fn apply_children<F>(&self, op: &mut F) -> Result<VisitRecursion> | ||
where | ||
F: FnMut(&Self) -> Result<VisitRecursion>; | ||
F: FnMut(&Self) -> Result<VisitRecursion>, | ||
{ | ||
for child in self.children_nodes() { | ||
match op(&child)? { | ||
VisitRecursion::Continue => {} | ||
VisitRecursion::Skip => return Ok(VisitRecursion::Continue), | ||
VisitRecursion::Stop => return Ok(VisitRecursion::Stop), | ||
} | ||
} | ||
Ok(VisitRecursion::Continue) | ||
} | ||
|
||
/// Apply transform `F` to the node's children, the transform `F` might have a direction(Preorder or Postorder) | ||
fn map_children<F>(self, transform: F) -> Result<Self> | ||
|
@@ -342,18 +355,21 @@ pub trait DynTreeNode { | |
/// Blanket implementation for Arc for any tye that implements | ||
/// [`DynTreeNode`] (such as [`Arc<dyn PhysicalExpr>`]) | ||
impl<T: DynTreeNode + ?Sized> TreeNode for Arc<T> { | ||
fn children_nodes(&self) -> Vec<Arc<T>> { | ||
self.arc_children() | ||
} | ||
|
||
fn apply_children<F>(&self, op: &mut F) -> Result<VisitRecursion> | ||
where | ||
F: FnMut(&Self) -> Result<VisitRecursion>, | ||
{ | ||
for child in self.arc_children() { | ||
match op(&child)? { | ||
for child in &self.arc_children() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need this 2nd There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yea, missed this. This can be removed too. |
||
match op(child)? { | ||
VisitRecursion::Continue => {} | ||
VisitRecursion::Skip => return Ok(VisitRecursion::Continue), | ||
VisitRecursion::Stop => return Ok(VisitRecursion::Stop), | ||
} | ||
} | ||
|
||
Ok(VisitRecursion::Continue) | ||
} | ||
|
||
|
@@ -368,7 +384,7 @@ impl<T: DynTreeNode + ?Sized> TreeNode for Arc<T> { | |
let arc_self = Arc::clone(&self); | ||
self.with_new_arc_children(arc_self, new_children?) | ||
} else { | ||
Ok(self) | ||
Ok(self.clone()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just one minor note, that this clone seems to be unnecessary. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yea, probably I added it during trying different ideas locally. Removed. |
||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,16 +24,13 @@ use crate::expr::{ | |
}; | ||
use crate::{Expr, GetFieldAccess}; | ||
|
||
use datafusion_common::tree_node::{TreeNode, VisitRecursion}; | ||
use datafusion_common::tree_node::TreeNode; | ||
use datafusion_common::{internal_err, DataFusionError, Result}; | ||
|
||
impl TreeNode for Expr { | ||
fn apply_children<F>(&self, op: &mut F) -> Result<VisitRecursion> | ||
where | ||
F: FnMut(&Self) -> Result<VisitRecursion>, | ||
{ | ||
let children = match self { | ||
Expr::Alias(Alias{expr,..}) | ||
fn children_nodes(&self) -> Vec<Self> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This PR is a nice refactor but I'm not sure it is needed at all (1.) and I'm also not sure cloning a node's children into a
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As I mentioned in above comment, the first try I did is to return But there is one trouble when implement This can be seen as an incremental cleanup patch as it doesn't change current API and keep current behavior but clean up the code as its purpose. In the long term I'd like to see more fundamental change to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does it make sense to return There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sometimes there may be cases where one needs to do multiple passes over the same tree (e.g. doing one top down and one bottom up pass consecutively). Or we may want to run different logic on the same tree. So it may be a little premature to conclude we can do away with "derived" trees and just keep the base four. I suggest we take small steps and progressively discover the requirements/use cases. I think refactoring to get rid of duplicate apply_children and map_children is a good first step. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yea, that's what I thought so this restrains to be limited change (mostly There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This is also supported with
I'm happy to try refactoring the remaining 4 ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I investigated the |
||
match self { | ||
Expr::Alias(Alias { expr, .. }) | ||
| Expr::Not(expr) | ||
| Expr::IsNotNull(expr) | ||
| Expr::IsTrue(expr) | ||
|
@@ -47,26 +44,24 @@ impl TreeNode for Expr { | |
| Expr::Cast(Cast { expr, .. }) | ||
| Expr::TryCast(TryCast { expr, .. }) | ||
| Expr::Sort(Sort { expr, .. }) | ||
| Expr::InSubquery(InSubquery{ expr, .. }) => vec![expr.as_ref().clone()], | ||
| Expr::InSubquery(InSubquery { expr, .. }) => vec![expr.as_ref().clone()], | ||
Expr::GetIndexedField(GetIndexedField { expr, field }) => { | ||
let expr = expr.as_ref().clone(); | ||
match field { | ||
GetFieldAccess::ListIndex {key} => { | ||
GetFieldAccess::ListIndex { key } => { | ||
vec![key.as_ref().clone(), expr] | ||
}, | ||
GetFieldAccess::ListRange {start, stop} => { | ||
} | ||
GetFieldAccess::ListRange { start, stop } => { | ||
vec![start.as_ref().clone(), stop.as_ref().clone(), expr] | ||
} | ||
GetFieldAccess::NamedStructField {name: _name} => { | ||
GetFieldAccess::NamedStructField { name: _name } => { | ||
vec![expr] | ||
} | ||
} | ||
} | ||
Expr::GroupingSet(GroupingSet::Rollup(exprs)) | ||
| Expr::GroupingSet(GroupingSet::Cube(exprs)) => exprs.clone(), | ||
Expr::ScalarFunction (ScalarFunction{ args, .. } ) => { | ||
args.clone() | ||
} | ||
Expr::ScalarFunction(ScalarFunction { args, .. }) => args.clone(), | ||
Expr::GroupingSet(GroupingSet::GroupingSets(lists_of_exprs)) => { | ||
lists_of_exprs.clone().into_iter().flatten().collect() | ||
} | ||
|
@@ -77,8 +72,8 @@ impl TreeNode for Expr { | |
| Expr::Literal(_) | ||
| Expr::Exists { .. } | ||
| Expr::ScalarSubquery(_) | ||
| Expr::Wildcard {..} | ||
| Expr::Placeholder (_) => vec![], | ||
| Expr::Wildcard { .. } | ||
| Expr::Placeholder(_) => vec![], | ||
Expr::BinaryExpr(BinaryExpr { left, right, .. }) => { | ||
vec![left.as_ref().clone(), right.as_ref().clone()] | ||
} | ||
|
@@ -107,8 +102,12 @@ impl TreeNode for Expr { | |
} | ||
expr_vec | ||
} | ||
Expr::AggregateFunction(AggregateFunction { args, filter, order_by, .. }) | ||
=> { | ||
Expr::AggregateFunction(AggregateFunction { | ||
args, | ||
filter, | ||
order_by, | ||
.. | ||
}) => { | ||
let mut expr_vec = args.clone(); | ||
|
||
if let Some(f) = filter { | ||
|
@@ -137,17 +136,7 @@ impl TreeNode for Expr { | |
expr_vec.extend(list.clone()); | ||
expr_vec | ||
} | ||
}; | ||
|
||
for child in children.iter() { | ||
match op(child)? { | ||
VisitRecursion::Continue => {} | ||
VisitRecursion::Skip => return Ok(VisitRecursion::Continue), | ||
VisitRecursion::Stop => return Ok(VisitRecursion::Stop), | ||
} | ||
} | ||
|
||
Ok(VisitRecursion::Continue) | ||
} | ||
|
||
fn map_children<F>(self, transform: F) -> Result<Self> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First tried with
Vec<&Self>
: https://github.com/apache/arrow-datafusion/compare/main...viirya:arrow-datafusion:refactor_treenode?expand=1.Most cases it works well. But one trouble is the
TreeNode
implementation forArc<T>
:So changed to
Vec<Self>
.