Window Functions Order Conservation -- Follow-up On Set Monotonicity #14813

berkaysynnada · 2025-02-21T14:24:21Z

Which issue does this PR close?

Closes the follow-up requested in a comment on Feature: AggregateMonotonicity #14271.

Rationale for this change

#14271 introduced the notion of set monotonicity for window functions. Since that PR, we have some false negatives in order conservation, and our test coverage for window functions is incomplete with respect to set monotonicity, partitioning type, frame type, and the ordering of the function inputs.

What changes are included in this PR?

This PR enriches the ordering properties of WindowAggExec and BoundedWindowAggExec, and adds a test function that covers all cases for a window function, both positive and negative.

Are these changes tested?

Yes

Are there any user-facing changes?

Orderings are conserved in more cases.

berkaysynnada · 2025-02-21T15:58:43Z

datafusion/core/tests/physical_optimizer/enforce_sorting.rs

@@ -222,208 +227,6 @@ async fn test_remove_unnecessary_sort5() -> Result<()> {
    Ok(())
 }

-#[tokio::test]


The coverage of these tests is preserved in test_window_partial_constant_and_set_monotonicity().

berkaysynnada · 2025-02-23T16:23:19Z

datafusion/core/tests/physical_optimizer/enforce_sorting.rs

-        "    BoundedWindowAggExec: wdw=[count: Ok(Field { name: \"count\", data_type: Int64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, start_bound: Preceding(NULL), end_bound: CurrentRow, is_causal: false }], mode=[Sorted]",
-        "      SortExec: expr=[nullable_col@0 ASC, non_nullable_col@1 ASC], preserve_partitioning=[false]",
-        "        DataSourceExec: partitions=1, partition_sizes=[0]"];
+        "    SortExec: expr=[nullable_col@0 ASC, non_nullable_col@1 ASC], preserve_partitioning=[false]",


Reviewers, please double-check this. The second BoundedWindowAggExec requires [nullable_col, non_nullable_col] ordering, but the previous version did not guarantee it. Unless I'm missing something, it is now guaranteed by a newly added partial sort.

To me, it seems like we had a bug before which is now fixed.

berkaysynnada · 2025-02-23T16:27:39Z

datafusion/core/tests/physical_optimizer/enforce_sorting.rs

@@ -2280,3 +2086,1265 @@ async fn test_not_replaced_with_partial_sort_for_unbounded_input() -> Result<()>
    assert_optimized!(expected_input, expected_no_change, physical_plan, true);
    Ok(())
 }
+
+#[tokio::test]
+async fn test_window_partial_constant_and_set_monotonicity() -> Result<()> {


Writing this with rstest seems hard to debug, and the relation between the arguments and the initial/final plans would not be very clear, so I prefer this approach.

berkaysynnada · 2025-02-23T16:37:56Z

datafusion/physical-plan/src/windows/mod.rs

    None
 }

+fn all_possible_sort_options(expr: Arc<dyn PhysicalExpr>) -> Vec<PhysicalSortExpr> {


This could perhaps be initialized once.

berkaysynnada · 2025-02-23T18:12:33Z

PTAL @ozankabak, @alamb

ozankabak

I reviewed both the code and the tests carefully, and sent a commit for some improvements. This PR greatly improves the test coverage and fills some gaps in terms of set monotonicity optimizations.

I will wait for some more time before merging in case we get more eyes on this (which would be great).

ozankabak · 2025-02-26T14:35:25Z

I will go ahead and merge this since it is a follow-up to a previously discussed (and reviewed) PR/feature. It would still be great to have some post-merge review on this when you have some time on your hands @alamb.

alamb · 2025-02-26T15:44:05Z

> I will go ahead and merge this since it is a follow-up to a previously discussed (and reviewed) PR/feature. It would still be great to have some post-merge review on this when you have some time on your hands @alamb.

Thanks -- I'll try to look at it, but I am currently quite tied up with trying to get DataFusion 46 ready for release (and dealing with some unexpected fallout from #14224). It may not be until next week.

findepi · 2025-09-04T09:09:31Z

datafusion/physical-plan/src/windows/mod.rs

+            .map(|pb_order| sort_options_resolving_constant(Arc::clone(pb_order)));
+        let all_satisfied_lexs = partition_by_orders
+            .multi_cartesian_product()


sort_options_resolving_constant returns a pair of values for every input expression,
then multi_cartesian_product computes all combinations obtained by taking one element from each pair.
This produces an exponential number of combinations (as a function of the number of input expressions). Is this intentional?
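To make the growth concrete, here is a small std-only Rust sketch (not DataFusion code) that enumerates every way of taking one element from each pair, which is the shape of what multi_cartesian_product produces when each input expression contributes a pair of sort options. The `all_combinations` helper and the `(descending, nulls_first)` tuples are illustrative assumptions.

```rust
/// Enumerates every way of picking one element from each pair.
/// The result has 2^n entries for n pairs.
fn all_combinations<T: Clone>(pairs: &[[T; 2]]) -> Vec<Vec<T>> {
    // Start with a single empty combination and extend it pair by pair.
    let mut combos: Vec<Vec<T>> = vec![Vec::new()];
    for pair in pairs {
        let mut next = Vec::with_capacity(combos.len() * 2);
        for prefix in &combos {
            for item in pair {
                let mut combo = prefix.clone();
                combo.push(item.clone());
                next.push(combo);
            }
        }
        // Each additional pair doubles the number of combinations.
        combos = next;
    }
    combos
}
```

With a pair such as `[(false, false), (true, true)]` for each of n columns, this returns 2^n candidate orderings: 1,024 for 10 columns and 1,048,576 for 20, each of which would then have to be checked against the existing ordering.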

Yes. This is a kind of workaround to represent the constancy of partition_by columns within their partitions. So we need to compare the existing ordering against every possible ordering of the alternatives for the constant columns.

I understand the desire, but exponential planning time is hardly acceptable in our use case.
For a real production query, DF 45 works in a snap, and DF 46+ never exits the planner. I had to chop off a bunch of columns from a window to get it to completion.

> we need to compare the existing ordering against every possible ordering

sort_options_resolving_constant returns only 2 of the 4 possible options.
Is this a correctness problem, or a missed-optimization 'problem'?

Define "need".

> I understand the desire, but exponential planning time is hardly acceptable in our use case. For a real production query, DF 45 works in a snap, and DF 46+ never exits the planner. I had to chop off a bunch of columns from a window to get it to completion.

Cases where this complex part can be skipped could be detected earlier (for example, when no ordering requirement comes from downstream); in those cases, no window-specific ordering calculation would be needed.

> we need to compare the existing ordering against every possible ordering

> sort_options_resolving_constant returns only 2 of the 4 possible options. Is this a correctness problem, or a missed-optimization 'problem'?

> Define "need".

I checked the code, and I believe one of the three usages of sort_options_resolving_constant (the one applied to partitioning expressions, not to window/aggregate functions) should be updated to generate all 4 possibilities. The reason for generating only 2 of them is that set monotonicity is broken if the data is in increasing order but nulls come first or, vice versa, if the data is in decreasing order but nulls come last. So it's not a correctness problem but a missed optimization.
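A minimal sketch of the distinction described above. It uses a locally defined `SortOptions` struct that only mirrors the two fields of arrow's `SortOptions` (`descending`, `nulls_first`); the helper names `all_sort_options`, `preserves_set_monotonicity`, and `sort_options_for` and the `is_partition_by` flag are hypothetical, not DataFusion's API. The filtering rule follows the explanation above: ascending with nulls first, or descending with nulls last, breaks set monotonicity.

```rust
/// Local mirror of the two fields of arrow's `SortOptions` (an assumption for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SortOptions {
    descending: bool,
    nulls_first: bool,
}

/// All four (direction, null placement) combinations.
fn all_sort_options() -> [SortOptions; 4] {
    [
        SortOptions { descending: false, nulls_first: false }, // ASC NULLS LAST
        SortOptions { descending: false, nulls_first: true },  // ASC NULLS FIRST
        SortOptions { descending: true, nulls_first: false },  // DESC NULLS LAST
        SortOptions { descending: true, nulls_first: true },   // DESC NULLS FIRST
    ]
}

/// Only ASC NULLS LAST and DESC NULLS FIRST keep set monotonicity,
/// i.e. the two options where `descending == nulls_first`.
fn preserves_set_monotonicity(opts: SortOptions) -> bool {
    opts.descending == opts.nulls_first
}

/// Partition-by expressions are constant within a partition, so all four options
/// are valid for them; window/aggregate results keep only the monotonic-safe two.
fn sort_options_for(is_partition_by: bool) -> Vec<SortOptions> {
    all_sort_options()
        .iter()
        .copied()
        .filter(|&opts| is_partition_by || preserves_set_monotonicity(opts))
        .collect()
}
```

Under this reading, generating only the two monotonic-safe options for partition-by expressions loses valid alternatives but never admits an invalid one, which matches "a missed optimization, not a correctness problem".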

> Cases where this complex part can be skipped could be detected earlier (for example, when no ordering requirement comes from downstream)

That is simple.
However, it's likely to help with simple queries only. In other words, it will help with test queries, but more complex production workloads will still end up with exponential (multi-minute) planning.

We need an approach that's better than O(n²) (and the current O(2ⁿ) is obviously much worse).

From a query execution perspective, the minutes spent in planning are wasted if the query itself can be executed in seconds.

I have an idea: check whether the ordering is satisfied incrementally, one element at a time. For example:

partition_by_cols: [a,b,c]
ordering_req: [x,y,z]

Create a new ordering from the first N elements of the requirement (N starts at 1 and increases by one at each step):
compared_ordering: [x]

Compare it against the variations of the first partition-by column: [a INC NL], [a INC NF], [a DEC NL], [a DEC NF].

At most one of them can survive (in most cases none will, and the rest of the computation can be skipped).
Store the survivor, append the next ordering element and the variations of the next partition-by column, and continue comparing until either the ordering or the partition-by columns are exhausted, or no survivor remains.

I think this would reduce the complexity to O(n).
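A rough Rust sketch of the incremental idea above, under simplifying assumptions: sort expressions are plain column names with a locally defined `SortOptions` (mirroring arrow's two fields), and the `satisfies` closure is a hypothetical stand-in for DataFusion's equivalence-aware ordering check. `incremental_match` and `VARIANTS` are illustrative names only. Each partition-by column tries at most four variants against the next required prefix, so the number of checks grows linearly with the number of columns instead of enumerating every combination.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SortOptions {
    descending: bool,
    nulls_first: bool,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct SortExpr {
    column: &'static str,
    options: SortOptions,
}

/// INC NL, INC NF, DEC NL, DEC NF.
const VARIANTS: [SortOptions; 4] = [
    SortOptions { descending: false, nulls_first: false },
    SortOptions { descending: false, nulls_first: true },
    SortOptions { descending: true, nulls_first: false },
    SortOptions { descending: true, nulls_first: true },
];

/// Grows a candidate ordering one partition-by column at a time and returns the
/// longest matching prefix together with the number of `satisfies` calls made.
fn incremental_match<F>(
    partition_by: &[&'static str],
    required: &[SortExpr],
    satisfies: F,
) -> (Vec<SortExpr>, usize)
where
    F: Fn(&[SortExpr], &[SortExpr]) -> bool,
{
    let mut survivor: Vec<SortExpr> = Vec::new();
    let mut checks = 0;
    for &column in partition_by.iter().take(required.len()) {
        // Compare against the first N required elements (N = survivor length + 1).
        let target = &required[..survivor.len() + 1];
        let mut extended = false;
        for &options in VARIANTS.iter() {
            survivor.push(SortExpr { column, options });
            checks += 1;
            if satisfies(survivor.as_slice(), target) {
                // Keep the first surviving variant (per the proposal, at most one survives).
                extended = true;
                break;
            }
            survivor.pop();
        }
        if !extended {
            // No survivor: skip the remaining partition-by columns.
            break;
        }
    }
    (survivor, checks)
}
```

For example, with `partition_by = [a, b, c]` and a requirement `[a INC NL, b DEC NF, x INC NL]`, the sketch keeps `[a INC NL, b DEC NF]` after 9 checks and stops at `c`. The count only covers ordering checks; the cost of each check depends on how the real equivalence test is implemented.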

berkaysynnada added 11 commits February 13, 2025 16:01

case 0 dbg

a46499e

dbg case 4

ac68ef0

dbg case 9

850fd21

Update enforce_sorting.rs

7d585ed

dbg case 10-11

f304265

dbg case 19

443764c

dbg 24

349e24d

dbg 48

d13a384

final

cd7674c

Update enforce_sorting.rs

808b667

Merge branch 'apache_main' into follow-up/monotonic

d1c1b09

berkaysynnada marked this pull request as draft February 21, 2025 14:24

github-actions bot added physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate logical-expr Logical plan and expressions labels Feb 21, 2025

berkaysynnada commented Feb 21, 2025


berkaysynnada force-pushed the follow-up/monotonic branch from 54a53ed to ace28b6 February 21, 2025 16:01

github-actions bot removed the logical-expr Logical plan and expressions label Feb 21, 2025

clippy

9b55465

berkaysynnada force-pushed the follow-up/monotonic branch from ace28b6 to 9b55465 February 21, 2025 16:01

berkaysynnada added 2 commits February 23, 2025 19:14

fix the existing test

4bae63a

Update ordering.rs

f5e9789

berkaysynnada commented Feb 23, 2025


clean-up

38e057f

berkaysynnada marked this pull request as ready for review February 23, 2025 18:11

simplify partial constantness

0b18967

github-actions bot added catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation labels Feb 25, 2025

ozankabak force-pushed the follow-up/monotonic branch from e698fef to 8425c17 February 25, 2025 21:17

ozankabak added 2 commits February 26, 2025 00:18

Review

459fd78

Merge branch 'main' into follow-up/monotonic

154a535

ozankabak approved these changes Feb 25, 2025


ozankabak merged commit ea0686b into apache:main Feb 26, 2025
24 checks passed

alamb mentioned this pull request Mar 4, 2025

March 4, 2025: This week(s) in DataFusion #15005

Closed

berkaysynnada mentioned this pull request Mar 16, 2025

Blog for DataFusion 46.0.0 #15053

Closed

findepi mentioned this pull request Sep 4, 2025

Exponential planning time when window function is partitioned by multiple columns #17401

Closed

findepi reviewed Sep 4, 2025


This was referenced Sep 16, 2025

Prevent exponential planning time for Window functions #17563

Closed

Restore window sort optimizations without exponential planning time #17624

Closed

berkaysynnada mentioned this pull request Sep 20, 2025

Prevent exponential planning time for Window functions - v2 #17684

Merged

berkaysynnada commented Sep 9, 2025 (edited)


4 participants
