1212// specific language governing permissions and limitations
1313// under the License.
1414
15- //! Push Down Filter optimizer rule ensures that filters are applied as early as possible in the plan
15+ //! [`PushDownFilter`] Moves filters so they are applied as early as possible in
16+ //! the plan.
1617
1718use crate :: optimizer:: ApplyOrder ;
1819use crate :: utils:: { conjunction, split_conjunction, split_conjunction_owned} ;
@@ -33,31 +34,93 @@ use itertools::Itertools;
3334use std:: collections:: { HashMap , HashSet } ;
3435use std:: sync:: Arc ;
3536
36- /// Push Down Filter optimizer rule pushes filter clauses down the plan
37+ /// Optimizer rule for pushing (moving) filter expressions down in a plan so
38+ /// they are applied as early as possible.
39+ ///
3740/// # Introduction
38- /// A filter-commutative operation is an operation whose result of filter(op(data)) = op(filter(data)).
39- /// An example of a filter-commutative operation is a projection; a counter-example is `limit`.
4041///
41- /// The filter-commutative property is column-specific. An aggregate grouped by A on SUM(B)
42- /// can commute with a filter that depends on A only, but does not commute with a filter that depends
43- /// on SUM(B).
42+ /// The goal of this rule is to improve query performance by eliminating
43+ /// redundant work.
44+ ///
45+ /// For example, given a plan that sorts all values where `a > 10`:
46+ ///
47+ /// ```text
48+ /// Filter (a > 10)
49+ /// Sort (a, b)
50+ /// ```
51+ ///
52+ /// A better plan is to filter the data *before* the Sort, which sorts fewer
53+ /// rows and therefore does less work overall:
54+ ///
55+ /// ```text
56+ /// Sort (a, b)
57+ /// Filter (a > 10) <-- Filter is moved before the sort
58+ /// ```
59+ ///
60+ /// However it is not always possible to push filters down. For example, given a
61+ /// plan that finds the top 3 values and then keeps only those that are greater
62+ /// than 10, if the filter is pushed below the limit it would produce a
63+ /// different result.
64+ ///
65+ /// ```text
66+ /// Filter (a > 10) <-- can not move this Filter before the limit
67+ /// Limit (fetch=3)
68+ /// Sort (a, b)
69+ /// ```
70+ ///
71+ ///
72+ /// More formally, a filter-commutative operation is an operation `op` that
73+ /// satisfies `filter(op(data)) = op(filter(data))`.
74+ ///
75+ /// The filter-commutative property is plan and column-specific. A filter on `a`
76+ /// can be pushed through a `Aggregate(group_by = [a], agg=[SUM(b))`. However, a
77+ /// filter on `SUM(b)` can not be pushed through the same aggregate.
78+ ///
79+ /// # Handling Conjunctions
80+ ///
81+ /// It is possible to only push down **part** of a filter expression if is
82+ /// connected with `AND`s (more formally if it is a "conjunction").
83+ ///
84+ /// For example, given the following plan:
85+ ///
86+ /// ```text
87+ /// Filter(a > 10 AND SUM(b) < 5)
88+ /// Aggregate(group_by = [a], agg = [SUM(b))
89+ /// ```
90+ ///
91+ /// The `a > 10` is commutative with the `Aggregate` but `SUM(b) < 5` is not.
92+ /// Therefore it is possible to only push part of the expression, resulting in:
93+ ///
94+ /// ```text
95+ /// Filter(SUM(b) < 5)
96+ /// Aggregate(group_by = [a], agg = [SUM(b))
97+ /// Filter(a > 10)
98+ /// ```
99+ ///
100+ /// # Handling Column Aliases
44101///
45- /// This optimizer commutes filters with filter-commutative operations to push the filters
46- /// the closest possible to the scans, re-writing the filter expressions by every
47- /// projection that changes the filter's expression.
102+ /// This optimizer must sometimes handle re-writing filter expressions when they
103+ /// pushed, for example if there is a projection that aliases `a+1` to `"b"`:
48104///
49- /// Filter: b Gt Int64(10)
50- /// Projection: a AS b
105+ /// ```text
106+ /// Filter (b > 10)
107+ /// Projection: [a+1 AS "b"] <-- changes the name of `a+1` to `b`
108+ /// ```
51109///
52- /// is optimized to
110+ /// To apply the filter prior to the `Projection`, all references to `b` must be
111+ /// rewritten to `a+1`:
53112///
54- /// Projection: a AS b
55- /// Filter: a Gt Int64(10) <--- changed from b to a
113+ /// ```text
114+ /// Projection: a AS "b"
115+ /// Filter: (a + 1 > 10) <--- changed from b to a + 1
116+ /// ```
117+ /// # Implementation Notes
56118///
57- /// This performs a single pass through the plan. When it passes through a filter, it stores that filter,
58- /// and when it reaches a node that does not commute with it, it adds the filter to that place.
59- /// When it passes through a projection, it re-writes the filter's expression taking into account that projection.
60- /// When multiple filters would have been written, it `AND` their expressions into a single expression.
119+ /// This implementation performs a single pass through the plan, "pushing" down
120+ /// filters. When it passes through a filter, it stores that filter, and when it
121+ /// reaches a plan node that does not commute with that filter, it adds the
122+ /// filter to that place. When it passes through a projection, it re-writes the
123+ /// filter's expression taking into account that projection.
61124#[ derive( Default ) ]
62125pub struct PushDownFilter { }
63126
0 commit comments