decomposed date_trunc optimization into expr simplifier #18648
drin wants to merge 1 commit into apache:main from
@UBarney, one thing I think definitely needs improvement is the handling of the source expressions along the transformation and failure paths (https://github.com/drin/datafusion/blob/8cba13ceafcf0df047e753f20bf54ad85a02f019/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L690-L720). I try to avoid moving until I know what to return (the transformed expression or the source expression), but I don't know Rust/DataFusion well enough to know the best practices for when to clone versus move, and how to avoid either until necessary.
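On the clone-versus-move question above: one common Rust pattern is to make every fallible decision through shared borrows first, and only consume the owned expression once the outcome is known. The failure path can then hand back the original by value without cloning. This is a self-contained, hypothetical sketch: `Expr`, `Rewrite`, `plan_rewrite`, and the `add_one` rule are illustration-only stand-ins, not DataFusion APIs. `Rewrite` only loosely mirrors the idea behind DataFusion's `Transformed::yes` / `Transformed::no`.

```rust
/// Stand-in expression type for illustration only (not DataFusion's `Expr`).
#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    Int(i64),
    Call { name: String, args: Vec<Expr> },
    Eq(Box<Expr>, Box<Expr>),
}

/// Hypothetical rewrite outcome: a new expression, or the untouched input.
#[derive(Debug, PartialEq)]
enum Rewrite {
    Yes(Expr),
    No(Expr),
}

/// Phase 1 (borrows only): decide whether the hypothetical rule
/// `add_one(x) = c  =>  x = c - 1` applies, and compute the new constant.
/// Any failure (wrong shape, overflow) is reported as `None`.
fn plan_rewrite(lhs: &Expr, rhs: &Expr) -> Option<i64> {
    match (lhs, rhs) {
        (Expr::Call { name, args }, Expr::Int(c))
            if name.as_str() == "add_one" && args.len() == 1 =>
        {
            c.checked_sub(1)
        }
        _ => None,
    }
}

fn try_rewrite(expr: Expr) -> Rewrite {
    // Phase 1: inspect through a shared borrow; nothing is moved or cloned.
    let new_const = match &expr {
        Expr::Eq(lhs, rhs) => plan_rewrite(lhs, rhs),
        _ => None,
    };
    // Phase 2: consume `expr` only after the outcome is known.
    match (expr, new_const) {
        (Expr::Eq(lhs, _old_rhs), Some(c)) => match *lhs {
            // Move the argument out of the call instead of cloning it.
            Expr::Call { mut args, .. } => {
                Rewrite::Yes(Expr::Eq(Box::new(args.remove(0)), Box::new(Expr::Int(c))))
            }
            _ => unreachable!("plan_rewrite only succeeds for `add_one(x) = c`"),
        },
        // Failure path: return the original expression by value.
        (expr, _) => Rewrite::No(expr),
    }
}
```

For simple checks, a match guard gives the same effect: guards only borrow the bindings, so an expression that fails every guard reaches a final `other => other` arm unmoved.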
This is a work in progress, but it decomposes an existing custom optimizer rule into places in the expression simplifier that seem appropriate at first glance. It is essentially a messy code dump, but hopefully structured so that someone with more experience can integrate it properly into the DataFusion codebase.
Branch updated from 8cba13c to e4b2cf5.
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
I will try to push this forward this week.
In theory we should be able to use the API added in
This decomposes a custom optimizer rule into the DataFusion expression simplifier (work in progress).
Which issue does this PR close?
Closes #18319.
Rationale for this change
To transform binary expressions that compare `date_trunc` with a constant value into a form that the planner can exploit more effectively, improving query performance. For Bauplan, we see the following (approximate averages over a handful of runs):
Q1:
Q2:
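The rationale does not spell out the target form. As a hedged sketch, assuming the rewrite turns `date_trunc('month', col) = C` into a half-open range on `col` (a common technique, because a range predicate on the raw column can use min/max statistics for pruning; this PR text does not confirm it), the logic for `DATE` values might look like this. `Date`, `MonthEqRewrite`, `rewrite_month_eq`, and the other helpers are hypothetical, and only month granularity is handled. A constant that is not month-aligned yields `AlwaysFalse`, which matches the explain-plan behavior described under user-facing changes.

```rust
/// Calendar date; field order makes the derived `Ord` compare year, month, then day.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Date {
    year: i32,
    month: u32, // 1..=12
    day: u32,   // 1..=31
}

/// Hypothetical result of rewriting `date_trunc('month', col) = constant`.
#[derive(Debug, PartialEq)]
enum MonthEqRewrite {
    /// Equivalent to `col >= lower AND col < upper`.
    Range { lower: Date, upper: Date },
    /// The constant is not the first of a month, so no truncated value can equal it.
    AlwaysFalse,
}

/// Reference semantics of `date_trunc('month', d)` for dates.
fn trunc_month(d: Date) -> Date {
    Date { day: 1, ..d }
}

/// First day of the month following `d`'s month.
fn next_month_start(d: Date) -> Date {
    if d.month == 12 {
        Date { year: d.year + 1, month: 1, day: 1 }
    } else {
        Date { year: d.year, month: d.month + 1, day: 1 }
    }
}

fn rewrite_month_eq(constant: Date) -> MonthEqRewrite {
    if trunc_month(constant) != constant {
        return MonthEqRewrite::AlwaysFalse;
    }
    MonthEqRewrite::Range { lower: constant, upper: next_month_start(constant) }
}

/// Evaluate the rewritten predicate for one non-NULL column value.
fn eval_rewrite(r: &MonthEqRewrite, col: Date) -> bool {
    match r {
        MonthEqRewrite::Range { lower, upper } => col >= *lower && col < *upper,
        MonthEqRewrite::AlwaysFalse => false,
    }
}
```

The exclusive upper bound avoids computing the last day of each month, and the same shape carries over to timestamps.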
What changes are included in this PR?
A few additional helper functions, plus new match arms in the simplifier's main `match` expression.
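As a rough picture of what a helper function plus an extra match arm can look like, here is a self-contained sketch. It is hypothetical: `Expr`, `date_trunc_granularity`, `is_lit`, and `simplify` are stand-ins, not DataFusion's actual types or simplifier code. The helper recognizes `date_trunc('<granularity>', <arg>)`, and the extra arm normalizes the commuted form `literal = date_trunc(...)` so that later rules only need to match one operand order.

```rust
/// Stand-in expression type for illustration (not DataFusion's `Expr`).
#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    /// A string/date literal, e.g. `'month'` or `'2025-12-01'`.
    Lit(String),
    Call { name: String, args: Vec<Expr> },
    Eq(Box<Expr>, Box<Expr>),
}

/// Hypothetical helper: returns the granularity if `e` is
/// `date_trunc('<granularity>', <arg>)` with a literal granularity.
fn date_trunc_granularity(e: &Expr) -> Option<&str> {
    match e {
        Expr::Call { name, args } if name.as_str() == "date_trunc" && args.len() == 2 => {
            match &args[0] {
                Expr::Lit(g) => Some(g.as_str()),
                _ => None,
            }
        }
        _ => None,
    }
}

fn is_lit(e: &Expr) -> bool {
    matches!(e, Expr::Lit(_))
}

/// One hypothetical simplifier step. Guards only borrow the bindings, so an
/// expression that fails every guard reaches the last arm unchanged.
fn simplify(expr: Expr) -> Expr {
    match expr {
        // Commuted form `literal = date_trunc(...)`: swap the operands.
        Expr::Eq(l, r) if is_lit(&l) && date_trunc_granularity(&r).is_some() => Expr::Eq(r, l),
        // Everything else passes through without a clone.
        other => other,
    }
}
```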
Are these changes tested?
Our custom rule has tests covering both the expression transformations and the correctness of evaluation results. These will be added to this PR once the implementation is in reasonably good shape.
Are there any user-facing changes?
Better performance, and an occasionally confusing explain plan. In short, a predicate such as `date_trunc('month', col) = '2025-12-03'::DATE` will always be false, because truncating to a month always yields the first day of a month and therefore can never equal a non-aligned date. As a result, the plan may show an unexpected literal `false` in place of the predicate. Explain plan details below (may be overkill, but it was fun to figure out):
Initial query:
After simplify_expressions:
Before and after `date_trunc_optimizer` (our custom rule):