Add support for Plan Properties caching #1

mustafasrepo · 2024-02-14T13:59:54Z

Which issue does this PR close?

Closes #.

Rationale for this change

In a great analysis on issue 9084, [MENTION NAME] recognized that stack usage (depth) increases substantially during logical and physical planning. In logical planning, the root cause of the aggressive stack usage is excessive use of .clone on the LogicalPlan enum.

In physical planning, the problem stems from recursive function calls in the getter APIs of Arc<dyn ExecutionPlan>, such as equivalence_properties, output_partitioning, output_ordering, etc.
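To make the recursion concrete, here is a minimal sketch of the pattern described above. It does not use DataFusion's real types: `PlanNode`, `Source`, `PassThrough`, and `build_chain` are hypothetical stand-ins, and the getter returns a depth counter purely for illustration.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a physical operator (not DataFusion's `ExecutionPlan`).
trait PlanNode {
    /// Returns the output ordering plus the number of nested getter calls it took.
    fn output_ordering(&self) -> (Vec<String>, usize);
}

/// Leaf operator that knows its own ordering.
struct Source {
    ordering: Vec<String>,
}

impl PlanNode for Source {
    fn output_ordering(&self) -> (Vec<String>, usize) {
        (self.ordering.clone(), 1)
    }
}

/// Operator that preserves its input's ordering and recomputes it on every call.
struct PassThrough {
    input: Arc<dyn PlanNode>,
}

impl PlanNode for PassThrough {
    fn output_ordering(&self) -> (Vec<String>, usize) {
        // Each call descends into the child, so call depth grows with plan depth.
        let (ordering, depth) = self.input.output_ordering();
        (ordering, depth + 1)
    }
}

/// Builds a source wrapped in `levels` pass-through operators.
fn build_chain(levels: usize) -> Arc<dyn PlanNode> {
    let mut plan: Arc<dyn PlanNode> = Arc::new(Source {
        ordering: vec!["a".to_string()],
    });
    for _ in 0..levels {
        plan = Arc::new(PassThrough { input: plan });
    }
    plan
}
```

In this sketch, a single getter call on the root of a chain with N pass-through operators produces N + 1 nested calls, which is the kind of stack growth the analysis points at.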

In PR 9084, [MENTION NAME] reduced physical plan stack usage by caching equivalence_properties for ProjectionExec.

This PR introduces a new struct, PlanPropertiesCache, to cache plan properties. The struct caches schema, output_partitioning, equivalence_properties, and output_ordering, which removes the recursive calls in the getter methods. Also, once the .cache method is implemented, the default implementations of .output_partitioning, .equivalence_properties, and .output_ordering work.
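The caching idea can be sketched roughly as follows. This is a simplified illustration, not the PR's actual code: `PropertiesCache` is a hypothetical, reduced version of the struct (plain `Vec`/`usize` fields, no equivalence properties), and `PlanNode`, `Source`, and `PassThrough` are made-up names.

```rust
use std::sync::Arc;

/// Hypothetical, simplified cache; the real struct in the PR holds richer types.
#[derive(Clone, Debug, PartialEq)]
struct PropertiesCache {
    schema: Vec<String>,
    output_partitioning: usize,
    output_ordering: Vec<String>,
}

trait PlanNode {
    /// The only method an operator must implement: access to its cached properties.
    fn cache(&self) -> &PropertiesCache;

    // Default getters read from the cache, so they never recurse into children.
    fn schema(&self) -> &[String] {
        &self.cache().schema
    }
    fn output_partitioning(&self) -> usize {
        self.cache().output_partitioning
    }
    fn output_ordering(&self) -> &[String] {
        &self.cache().output_ordering
    }
}

struct Source {
    cache: PropertiesCache,
}

impl PlanNode for Source {
    fn cache(&self) -> &PropertiesCache {
        &self.cache
    }
}

struct PassThrough {
    input: Arc<dyn PlanNode>,
    cache: PropertiesCache,
}

impl PassThrough {
    fn new(input: Arc<dyn PlanNode>) -> Self {
        // Computed once at construction from the child's already-cached values.
        let cache = input.cache().clone();
        Self { input, cache }
    }
}

impl PlanNode for PassThrough {
    fn cache(&self) -> &PropertiesCache {
        &self.cache
    }
}
```

With this layout, each operator pays the cost of computing its properties once, when it is constructed, and every later getter call is a field read regardless of how deep the plan is.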

With these changes, the flame graph for query 54 changes from the following graph:
flamegraph_main_q54.svg.zip (GitHub couldn't upload the SVG, hence it is attached as a .zip)

to the following graph

where stack usage decreases.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

api change


ozankabak · 2024-02-29T00:18:52Z

Merged upstream.


mustafasrepo added 2 commits February 14, 2024 16:57

Initial commit

fe0b1fe

Update comments

27b6805

github-actions bot added core sqllogictest labels Feb 15, 2024

mustafasrepo and others added 2 commits February 16, 2024 16:58

Merge branch 'main' into feature/properties_caching

8d8cb8b

# Conflicts: # datafusion/physical-plan/src/streaming.rs

Review Part 1

c8cece8

github-actions bot added the physical-expr label Feb 21, 2024

mustafasrepo added 2 commits February 21, 2024 17:38

Merge branch 'main' into feature/properties_caching

01eaf45

# Conflicts: # datafusion/physical-plan/src/projection.rs

Minor changes

93f5282

ozankabak closed this Feb 29, 2024

mustafasrepo deleted the feature/properties_caching branch March 27, 2024 06:36
