Minor: Document LogicalPlan tree node transformations #10010

Merged
merged 5 commits into from
Apr 10, 2024
10 changes: 7 additions & 3 deletions datafusion/core/src/lib.rs
@@ -296,11 +296,15 @@
//! A [`LogicalPlan`] is a Directed Acyclic Graph (DAG) of other
//! [`LogicalPlan`]s, each potentially containing embedded [`Expr`]s.
//!
//! [`Expr`]s can be rewritten using the [`TreeNode`] API and simplified using
//! [`ExprSimplifier`]. Examples of working with and executing `Expr`s can be found in the
//! [`expr_api`.rs] example
//! `LogicalPlan`s can be rewritten with the [`TreeNode`] API; see the
//! [`tree_node module`] for more details.
//!
//! [`Expr`]s can also be rewritten with the [`TreeNode`] API and simplified using
//! [`ExprSimplifier`]. Examples of working with and executing `Expr`s can be
//! found in the [`expr_api`.rs] example.
//!
//! [`TreeNode`]: datafusion_common::tree_node::TreeNode
//! [`tree_node module`]: datafusion_expr::logical_plan::tree_node
//! [`ExprSimplifier`]: crate::optimizer::simplify_expressions::ExprSimplifier
//! [`expr_api`.rs]: https://github.com/apache/arrow-datafusion/blob/main/datafusion-examples/examples/expr_api.rs
//!
2 changes: 1 addition & 1 deletion datafusion/expr/src/logical_plan/mod.rs
@@ -22,7 +22,7 @@ pub mod dml;
mod extension;
mod plan;
mod statement;
mod tree_node;
pub mod tree_node;

pub use builder::{
build_join_schema, table_scan, union, wrap_projection_for_join_if_necessary,
20 changes: 19 additions & 1 deletion datafusion/expr/src/logical_plan/plan.rs
@@ -68,6 +68,11 @@ pub use datafusion_common::{JoinConstraint, JoinType};
/// an output relation (table) with a (potentially) different
/// schema. A plan represents a dataflow tree where data flows
/// from leaves up to the root to produce the query result.
///
/// # See also:
/// * [`tree_node`]: visiting and rewriting API
///
/// [`tree_node`]: crate::logical_plan::tree_node
#[derive(Clone, PartialEq, Eq, Hash)]
pub enum LogicalPlan {
/// Evaluates an arbitrary list of expressions (essentially a
@@ -238,7 +243,10 @@ impl LogicalPlan {
}

/// Returns all expressions (non-recursively) evaluated by the current
/// logical plan node. This does not include expressions in any children
/// logical plan node. This does not include expressions in any children.
///
/// Note this method `clone`s all the expressions. When possible, the
/// [`tree_node`] API should be used instead of this API.
///
/// The returned expressions do not necessarily represent or even
/// contribute to the output schema of this node. For example,
@@ -248,6 +256,8 @@ impl LogicalPlan {
/// The expressions do contain all the columns that are used by this plan,
/// so if there are columns not referenced by these expressions then
/// DataFusion's optimizer attempts to optimize them away.
///
/// [`tree_node`]: crate::logical_plan::tree_node
pub fn expressions(self: &LogicalPlan) -> Vec<Expr> {
let mut exprs = vec![];
self.apply_expressions(|e| {
@@ -773,10 +783,16 @@ impl LogicalPlan {
/// Returns a new `LogicalPlan` based on `self` with inputs and
/// expressions replaced.
///
/// Note this method creates an entirely new node, which requires a large
/// amount of cloning. When possible, the [`tree_node`] API should be used
/// instead of this API.
///
/// The exprs correspond to the same order of expressions returned
/// by [`Self::expressions`]. This function is used by optimizers
/// to rewrite plans using the following pattern:
///
/// [`tree_node`]: crate::logical_plan::tree_node
///
/// ```text
/// let new_inputs = optimize_children(..., plan, props);
///
@@ -1367,6 +1383,7 @@ macro_rules! handle_transform_recursion_up {
}

impl LogicalPlan {
/// Visits a plan similarly to [`Self::visit`], but including embedded subqueries.
pub fn visit_with_subqueries<V: TreeNodeVisitor<Node = Self>>(
&self,
visitor: &mut V,
@@ -1380,6 +1397,7 @@ impl LogicalPlan {
.visit_parent(|| visitor.f_up(self))
}

/// Rewrites a plan similarly to [`Self::rewrite`], but including embedded subqueries.
pub fn rewrite_with_subqueries<R: TreeNodeRewriter<Node = Self>>(
self,
rewriter: &mut R,
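The doc comments on `expressions` in this file recommend the borrow-based `tree_node` API over the clone-based accessor. A minimal, self-contained sketch of why follows. It uses toy `ToyExpr` and `ToyPlan` types and a hypothetical `count_columns` helper invented for illustration; none of these are DataFusion's types, and the real method signatures may differ.

```rust
// Toy stand-ins for illustration only; these are NOT DataFusion's types.
#[derive(Clone, Debug, PartialEq)]
enum ToyExpr {
    Column(String),
    Literal(i64),
    Gt(Box<ToyExpr>, Box<ToyExpr>),
}

#[derive(Clone, Debug, PartialEq)]
enum ToyPlan {
    Scan { table: String },
    Filter { predicate: ToyExpr, input: Box<ToyPlan> },
    Projection { exprs: Vec<ToyExpr>, input: Box<ToyPlan> },
}

impl ToyPlan {
    /// Clone-based access: allocates a new Vec and deep-copies every
    /// expression of this node (inputs are not included).
    fn expressions(&self) -> Vec<ToyExpr> {
        match self {
            ToyPlan::Scan { .. } => vec![],
            ToyPlan::Filter { predicate, .. } => vec![predicate.clone()],
            ToyPlan::Projection { exprs, .. } => exprs.clone(),
        }
    }

    /// Borrow-based access: passes each expression of this node (inputs are
    /// not included) to `f` by reference, so nothing is copied.
    fn apply_expressions<F: FnMut(&ToyExpr)>(&self, mut f: F) {
        match self {
            ToyPlan::Scan { .. } => {}
            ToyPlan::Filter { predicate, .. } => f(predicate),
            ToyPlan::Projection { exprs, .. } => {
                for e in exprs {
                    f(e)
                }
            }
        }
    }
}

/// Hypothetical helper: counts column references in this node's
/// expressions without cloning any of them.
fn count_columns(plan: &ToyPlan) -> usize {
    fn walk(e: &ToyExpr, n: &mut usize) {
        match e {
            ToyExpr::Column(_) => *n += 1,
            ToyExpr::Literal(_) => {}
            ToyExpr::Gt(l, r) => {
                walk(l, n);
                walk(r, n);
            }
        }
    }
    let mut n = 0;
    plan.apply_expressions(|e| walk(e, &mut n));
    n
}
```

In this sketch `expressions` pays for a `Vec` allocation plus a deep copy on every call, while `apply_expressions` only lends references, which is the trade-off the new doc comments point at.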
32 changes: 25 additions & 7 deletions datafusion/expr/src/logical_plan/tree_node.rs
@@ -15,17 +15,35 @@
// specific language governing permissions and limitations
// under the License.

//! Tree node implementation for logical plan

//! [`TreeNode`] based visiting and rewriting for [`LogicalPlan`]s
//!
//! Visiting (read only) APIs:
//! * [`LogicalPlan::exists`]: search for an expression in a plan
//! * [`LogicalPlan::visit`]: recursively visit the node and all of its inputs
//! * [`LogicalPlan::visit_with_subqueries`]: recursively visit the node and all of its inputs, including subqueries
//! * [`LogicalPlan::apply_children`]: (non recursively) visit all inputs of this node
//! * [`LogicalPlan::apply_expressions`]: (non recursively) visit all expressions of this node
//! * [`LogicalPlan::apply_subqueries`]: (non recursively) visit all subqueries of this node
//! * [`LogicalPlan::apply_with_subqueries`]: recursively visit all inputs and embedded subqueries
//!
//! Rewriting (update) APIs:
//! * [`LogicalPlan::rewrite`]: recursively rewrite the node and all of its inputs
//! * [`LogicalPlan::map_children`]: (non recursively) rewrite all inputs of this node
//! * [`LogicalPlan::map_expressions`]: (non recursively) rewrite all expressions of this node
//! * [`LogicalPlan::map_subqueries`]: (non recursively) rewrite all subqueries of this node
//! * [`LogicalPlan::rewrite_with_subqueries`]: recursively rewrite the node and all of its inputs, including subqueries
//!
//! (Re)creation APIs (these require substantial cloning and thus are slow):
//! * [`LogicalPlan::with_new_exprs`]: Create a new plan with different expressions
//! * [`LogicalPlan::expressions`]: Return a copy of the plan's expressions
use crate::{
Aggregate, Analyze, CreateMemoryTable, CreateView, CrossJoin, DdlStatement, Distinct,
DistinctOn, DmlStatement, Explain, Extension, Filter, Join, Limit, LogicalPlan,
Prepare, Projection, RecursiveQuery, Repartition, Sort, Subquery, SubqueryAlias,
Union, Unnest, Window,
dml::CopyTo, Aggregate, Analyze, CreateMemoryTable, CreateView, CrossJoin,
DdlStatement, Distinct, DistinctOn, DmlStatement, Explain, Extension, Filter, Join,
Limit, LogicalPlan, Prepare, Projection, RecursiveQuery, Repartition, Sort, Subquery,
SubqueryAlias, Union, Unnest, Window,
};
use std::sync::Arc;

use crate::dml::CopyTo;
use datafusion_common::tree_node::{
Transformed, TreeNode, TreeNodeIterator, TreeNodeRecursion,
};
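The module docs above split the API into read-only visiting and ownership-based rewriting. The sketch below illustrates that split on a toy `ToyPlan` enum. The method names `apply`, `map_children`, and `transform_up` echo the `TreeNode` vocabulary, but the signatures here are simplified assumptions (no `Result`, no recursion control, no "was it changed" flag) and are not DataFusion's actual API.

```rust
// Toy plan for illustration only; this is NOT DataFusion's `LogicalPlan`.
#[derive(Debug, PartialEq)]
enum ToyPlan {
    Scan(String),
    Filter(Box<ToyPlan>),
    Limit(usize, Box<ToyPlan>),
}

impl ToyPlan {
    /// Read-only traversal: calls `f` on this node, then recursively on
    /// every input. Only borrows the tree.
    fn apply<F: FnMut(&ToyPlan)>(&self, f: &mut F) {
        f(self);
        match self {
            ToyPlan::Scan(_) => {}
            ToyPlan::Filter(input) | ToyPlan::Limit(_, input) => input.apply(f),
        }
    }

    /// Non-recursive rewrite of the direct inputs. Takes `self` by value so
    /// the untouched parts are moved rather than cloned.
    fn map_children<F: FnMut(ToyPlan) -> ToyPlan>(self, f: &mut F) -> ToyPlan {
        match self {
            ToyPlan::Scan(name) => ToyPlan::Scan(name),
            ToyPlan::Filter(input) => ToyPlan::Filter(Box::new(f(*input))),
            ToyPlan::Limit(n, input) => ToyPlan::Limit(n, Box::new(f(*input))),
        }
    }

    /// Recursive bottom-up rewrite built on `map_children`: inputs are
    /// rewritten first, then `f` is applied to the rebuilt node.
    fn transform_up<F: FnMut(ToyPlan) -> ToyPlan>(self, f: &mut F) -> ToyPlan {
        let rebuilt =
            self.map_children(&mut |child: ToyPlan| child.transform_up(&mut *f));
        f(rebuilt)
    }
}
```

Under these assumptions, counting nodes needs only `plan.apply(&mut |_node: &ToyPlan| count += 1)`, while capping every `Limit` is a single `transform_up` call whose closure matches on the owned node and returns either a replacement or the node unchanged. Neither path clones any part of the tree.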