Prevent exponential planning time for Window functions - v2 #17684

berkaysynnada · 2025-09-20T11:43:21Z

Which issue does this PR close?

Closes Exponential planning time when window function is partitioned by multiple columns #17401.

Rationale for this change

Window ordering calculation currently has exponential complexity in the number of partition columns. Rather than computing all possible orderings and eliminating the failing ones, this PR refines the ordering incrementally, keeping only candidates that satisfy the requirements.

What changes are included in this PR?

This PR implements the algorithm mentioned in the issue. There is also one minor change: the sort_options_resolving_constant function now generates candidates according to the intended usage. This enables a minor optimization possibility.

Are these changes tested?

Are there any user-facing changes?

berkaysynnada · 2025-09-20T11:46:50Z

@findepi can you please review this? I've implemented something similar to what I had in mind, and I hope it will solve your problems without sacrificing anything. I don't know how to measure the improvement on my computer, though.

berkaysynnada · 2025-09-20T11:48:13Z

cc @alamb also, if you are interested in this and want to review

alamb · 2025-09-20T12:58:12Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fix/exponential-window-ordering-calc (43f7592) to 03b6789 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=physical_window_function_partition
BENCH_BRANCH_NAME=fix_exponential-window-ordering-calc
Results will be posted here when complete

berkaysynnada · 2025-09-20T13:01:05Z

@findepi do you want me to take bench and slt changes from #17563 ?

comphead

Thanks @berkaysynnada nice to see benches though

findepi · 2025-09-21T09:14:31Z

@findepi do you want me to take bench and slt changes from #17563 ?

yes, please

findepi

This doesn't solve the original problem.

I checked out this PR and ran cargo run --bin datafusion-cli

Partitioning over 4 columns

DataFusion CLI v50.0.0
> WITH source AS (
    SELECT
        1 AS n,
        '' AS a1, '' AS a2, '' AS a3, '' AS a4, '' AS a5, '' AS a6, '' AS a7, '' AS a8,
        '' AS a9, '' AS a10, '' AS a11, '' AS a12, '' AS a13, '' AS a14, '' AS a15, '' AS a16,
        '' AS a17, '' AS a18, '' AS a19, '' AS a20, '' AS a21, '' AS a22, '' AS a23, '' AS a24,
        '' AS a25, '' AS a26, '' AS a27, '' AS a28, '' AS a29, '' AS a30, '' AS a31, '' AS a32,
        '' AS a33, '' AS a34, '' AS a35, '' AS a36, '' AS a37, '' AS a38, '' AS a39, '' AS a40
)
SELECT
    sum(n) OVER (PARTITION BY
        a1, a2, a3, a4
    )
FROM source;
+----------------------------------------------------------------------------------------------------------------------------------+
| sum(source.n) PARTITION BY [source.a1, source.a2, source.a3, source.a4] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING |
+----------------------------------------------------------------------------------------------------------------------------------+
| 1                                                                                                                                |
+----------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.673 seconds.

Partitioning over 5 columns

> WITH source AS (
    SELECT
        1 AS n,
        '' AS a1, '' AS a2, '' AS a3, '' AS a4, '' AS a5, '' AS a6, '' AS a7, '' AS a8,
        '' AS a9, '' AS a10, '' AS a11, '' AS a12, '' AS a13, '' AS a14, '' AS a15, '' AS a16,
        '' AS a17, '' AS a18, '' AS a19, '' AS a20, '' AS a21, '' AS a22, '' AS a23, '' AS a24,
        '' AS a25, '' AS a26, '' AS a27, '' AS a28, '' AS a29, '' AS a30, '' AS a31, '' AS a32,
        '' AS a33, '' AS a34, '' AS a35, '' AS a36, '' AS a37, '' AS a38, '' AS a39, '' AS a40
)
SELECT
    sum(n) OVER (PARTITION BY
        a1, a2, a3, a4, a5
    )
FROM source;
+---------------------------------------------------------------------------------------------------------------------------------------------+
| sum(source.n) PARTITION BY [source.a1, source.a2, source.a3, source.a4, source.a5] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING |
+---------------------------------------------------------------------------------------------------------------------------------------------+
| 1                                                                                                                                           |
+---------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 10.065 seconds.

Notice the time!

findepi · 2025-09-21T09:18:02Z

datafusion/physical-plan/src/windows/mod.rs

-        {
-            if window_eq_properties.ordering_satisfy(lex.clone())? {
-                all_satisfied_lexs.push(lex);
+        if !no_partitioning {


This condition looks redundant.

This condition still looks redundant. IIUC, when partitioning_exprs.is_empty() the code inside the THEN arm will not do anything

findepi · 2025-09-21T09:18:51Z

datafusion/physical-plan/src/windows/mod.rs

                        .into_iter()
-                        .map(sort_options_resolving_constant)
+                        .map(|expr| sort_options_resolving_constant(expr, false))
                        .multi_cartesian_product();


this looks exponential still. when is this code path taken?

Yes, this part is for calculating orderings of function results, and similar approach can also be applied here. If you think the partitioning expr update reasonable now, I'll apply the same here as well, in addition to the valid parts of your PR

let's maybe file issue / follow-up PR, so that we don't need to re-review code unnecessarily

findepi · 2025-09-21T09:24:22Z

datafusion/physical-plan/src/windows/mod.rs

+                // For each current partial ordering, try extending with each sort option
+                for current in current_orderings.iter() {
+                    for sort_expr in sort_options.iter() {
+                        let mut extended = current.clone();
+                        extended.push(sort_expr.clone());


Does this look potentially expensive?

In #17563 we have this example query

# regression test for https://github.com/apache/datafusion/issues/17401
query I
WITH source AS (
    SELECT
        1 AS n,
        '' AS a1, '' AS a2, '' AS a3, '' AS a4, '' AS a5, '' AS a6, '' AS a7, '' AS a8,
        '' AS a9, '' AS a10, '' AS a11, '' AS a12, '' AS a13, '' AS a14, '' AS a15, '' AS a16,
        '' AS a17, '' AS a18, '' AS a19, '' AS a20, '' AS a21, '' AS a22, '' AS a23, '' AS a24,
        '' AS a25, '' AS a26, '' AS a27, '' AS a28, '' AS a29, '' AS a30, '' AS a31, '' AS a32,
        '' AS a33, '' AS a34, '' AS a35, '' AS a36, '' AS a37, '' AS a38, '' AS a39, '' AS a40
)
SELECT
    sum(n) OVER (PARTITION BY
        a1, a2, a3, a4, a5, a6, a7, a8, a9, a10,
        a11, a12, a13, a14, a15, a16, a17, a18, a19, a20,
        a21, a22, a23, a24, a25, a26, a27, a28, a29, a30,
        a31, a32, a33, a34, a35, a36, a37, a38, a39, a40
    )
FROM source;

with current PR code it never completes, but let's ignore this for a moment.

if i read the code correctly, it "prunes early" when next partitioning expression doesn't provide more benefits. So i thought that it won't prune in some cases. Perhaps when input is pre-partitioned?

Let's add a test query

WITH source AS (
    SELECT
        1 AS n,
        '' AS a1, '' AS a2, '' AS a3, '' AS a4, '' AS a5, '' AS a6, '' AS a7, '' AS a8,
        '' AS a9, '' AS a10, '' AS a11, '' AS a12, '' AS a13, '' AS a14, '' AS a15, '' AS a16,
        '' AS a17, '' AS a18, '' AS a19, '' AS a20, '' AS a21, '' AS a22, '' AS a23, '' AS a24,
        '' AS a25, '' AS a26, '' AS a27, '' AS a28, '' AS a29, '' AS a30, '' AS a31, '' AS a32,
        '' AS a33, '' AS a34, '' AS a35, '' AS a36, '' AS a37, '' AS a38, '' AS a39, '' AS a40
)
SELECT
    sum(n) OVER (PARTITION BY
        a1, a2, a3, a4, a5, a6, a7, a8, a9, a10,
        a11, a12, a13, a14, a15, a16, a17, a18, a19, a20,
        a21, a22, a23, a24, a25, a26, a27, a28, a29, a30,
        a31, a32, a33, a34, a35, a36, a37, a38, a39, a40
    )
FROM (
    SELECT * FROM source
    ORDER BY
        a1, a2, a3, a4, a5, a6, a7, a8, a9, a10,
        a11, a12, a13, a14, a15, a16, a17, a18, a19, a20,
        a21, a22, a23, a24, a25, a26, a27, a28, a29, a30,
        a31, a32, a33, a34, a35, a36, a37, a38, a39, a40
);

However, I am not sure if i understand correctly the pruning condition and what's the condition for not pruning. Please elaborate.
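To make the concern about the extension loop concrete, here is a minimal sketch of an extend-and-filter pass. It is not the PR's code: `SortExpr`, `SORT_OPTIONS`, and `frontier_sizes` are hypothetical names, four sort options per column are assumed, and the `satisfies` closure stands in for `window_eq_properties.ordering_satisfy`.

```rust
/// The four sort options assumed per column: (descending, nulls_first).
const SORT_OPTIONS: [(bool, bool); 4] =
    [(false, false), (false, true), (true, false), (true, true)];

/// Hypothetical stand-in for a sort expression: column index plus options.
type SortExpr = (usize, (bool, bool));

/// Extend every surviving partial ordering with every sort option of the
/// next column, keep only the extensions accepted by `satisfies`, and
/// record how many partial orderings survive after each column.
fn frontier_sizes(
    num_columns: usize,
    satisfies: impl Fn(&[SortExpr]) -> bool,
) -> Vec<usize> {
    let mut current: Vec<Vec<SortExpr>> = vec![vec![]];
    let mut sizes = Vec::new();
    for col in 0..num_columns {
        let mut next = Vec::new();
        for partial in &current {
            for opt in SORT_OPTIONS {
                let mut extended = partial.clone();
                extended.push((col, opt));
                if satisfies(&extended) {
                    next.push(extended);
                }
            }
        }
        current = next;
        sizes.push(current.len());
    }
    sizes
}
```

Under these assumptions, `frontier_sizes(5, |_| true)` returns `[4, 16, 64, 256, 1024]`: if the predicate accepts every candidate, nothing is pruned and the set of partial orderings still grows as 4^n. A predicate that accepts only one option per column keeps the frontier at size 1.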

alamb · 2025-09-21T12:28:32Z

This seems to actually have made planning time worse -- the benchmark run from yesterday is still running

Benchmarking physical_window_function_partition_by_7_on_values: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 30504.9s, or reduce sample count to 10.
physical_window_function_partition_by_7_on_values
                        time:   [303.87 s 304.07 s 304.28 s]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

Benchmarking physical_window_function_partition_by_8_on_values: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 696351.6s, or reduce sample count to 10.
Benchmarking physical_window_function_partition_by_8_on_values: Collecting 100 samples in estimated 696352 s (100 iterations)
...

(note I think 696351.6s means 8 days!)

berkaysynnada · 2025-09-21T18:33:18Z

DataFusion CLI v50.0.0
> WITH source AS (
    SELECT
        1 AS n,
        '' AS a1, '' AS a2, '' AS a3, '' AS a4, '' AS a5, '' AS a6, '' AS a7, '' AS a8,
        '' AS a9, '' AS a10, '' AS a11, '' AS a12, '' AS a13, '' AS a14, '' AS a15, '' AS a16,
        '' AS a17, '' AS a18, '' AS a19, '' AS a20, '' AS a21, '' AS a22, '' AS a23, '' AS a24,
        '' AS a25, '' AS a26, '' AS a27, '' AS a28, '' AS a29, '' AS a30, '' AS a31, '' AS a32,
        '' AS a33, '' AS a34, '' AS a35, '' AS a36, '' AS a37, '' AS a38, '' AS a39, '' AS a40
)
SELECT
    sum(n) OVER (PARTITION BY
        a1, a2, a3, a4
    )
FROM source;
+----------------------------------------------------------------------------------------------------------------------------------+
| sum(source.n) PARTITION BY [source.a1, source.a2, source.a3, source.a4] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING |
+----------------------------------------------------------------------------------------------------------------------------------+
| 1                                                                                                                                |
+----------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.039 seconds.

> WITH source AS (
    SELECT
        1 AS n,
        '' AS a1, '' AS a2, '' AS a3, '' AS a4, '' AS a5, '' AS a6, '' AS a7, '' AS a8,
        '' AS a9, '' AS a10, '' AS a11, '' AS a12, '' AS a13, '' AS a14, '' AS a15, '' AS a16,
        '' AS a17, '' AS a18, '' AS a19, '' AS a20, '' AS a21, '' AS a22, '' AS a23, '' AS a24,
        '' AS a25, '' AS a26, '' AS a27, '' AS a28, '' AS a29, '' AS a30, '' AS a31, '' AS a32,
        '' AS a33, '' AS a34, '' AS a35, '' AS a36, '' AS a37, '' AS a38, '' AS a39, '' AS a40
)
SELECT
    sum(n) OVER (PARTITION BY
        a1, a2, a3, a4, a5
    )
FROM source;
+---------------------------------------------------------------------------------------------------------------------------------------------+
| sum(source.n) PARTITION BY [source.a1, source.a2, source.a3, source.a4, source.a5] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING |
+---------------------------------------------------------------------------------------------------------------------------------------------+
| 1                                                                                                                                           |
+---------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.018 seconds.

I missed the case where there are no ordering constraints. In that scenario, ordering_satisfy() returns true for nearly all combinations, so we were still keeping O(4^n) orderings despite pruning. I've now switched to a greedy approach:

For each partition column, try 4 sort options and pick the first that works
Move to the next column only with the chosen option
Stop immediately if any column has no valid option
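The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in `windows/mod.rs`: `SortExpr`, `sort_options`, and `greedy_ordering` are hypothetical names, and the `satisfies` closure stands in for the `ordering_satisfy` check.

```rust
/// Hypothetical stand-in for a sort expression:
/// (column index, descending, nulls_first).
type SortExpr = (usize, bool, bool);

/// The four candidate sort options assumed for one column.
fn sort_options(col: usize) -> [SortExpr; 4] {
    [
        (col, false, false),
        (col, false, true),
        (col, true, false),
        (col, true, true),
    ]
}

/// Greedily build one ordering over `num_columns` partition columns.
/// Returns `None` as soon as some column has no acceptable option.
fn greedy_ordering(
    num_columns: usize,
    mut satisfies: impl FnMut(&[SortExpr]) -> bool,
) -> Option<Vec<SortExpr>> {
    let mut ordering = Vec::with_capacity(num_columns);
    for col in 0..num_columns {
        let mut found = false;
        for candidate in sort_options(col) {
            ordering.push(candidate);
            if satisfies(&ordering) {
                found = true;
                break; // keep the first option that works
            }
            ordering.pop(); // undo and try the next option
        }
        if !found {
            return None; // no valid option for this column
        }
    }
    Some(ordering)
}
```

With a predicate that accepts everything, this sketch calls `satisfies` once per column, so 40 partition columns cost 40 checks rather than 4^40 candidates. The trade-off is that only one valid ordering is produced instead of all of them.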

Can you check this updated version @findepi ?

apache#17684 (comment)

findepi · 2025-09-22T14:33:08Z

@findepi do you want me to take bench and slt changes from #17563 ?

The PR is currently light on tests. Can you please consider pulling synnada-ai#74 ?

findepi

the changed piece of code no longer looks like exhibiting exponential complexity

thank you!

please include tests (#17684 (comment))

findepi · 2025-09-22T14:34:52Z

datafusion/physical-plan/src/windows/mod.rs

-        {
-            if window_eq_properties.ordering_satisfy(lex.clone())? {
-                all_satisfied_lexs.push(lex);
+        if !no_partitioning {


This condition still looks redundant. IIUC, when partitioning_exprs.is_empty() the code inside the THEN arm will not do anything

findepi · 2025-09-22T14:46:39Z

datafusion/physical-plan/src/windows/mod.rs

+                    let mut candidate_ordering = ordering.clone();
+                    candidate_ordering.push(sort_expr.clone());


nit:

ordering.clone() is avoidable.
logically, it's an expression being pushed-tried-else-popped:

// Find a single valid ordering using a greedy approach
let mut candidate_ordering = vec![];
for partition_expr in partitioning_exprs.iter() {
    let sort_options =
        sort_options_resolving_constant(Arc::clone(partition_expr), true);

    // Try each sort option and pick the first one that works
    let mut found = false;
    for sort_expr in sort_options.iter() {
        candidate_ordering.push(sort_expr.clone());
        if let Some(lex) = LexOrdering::new(candidate_ordering.clone()) {
            if window_eq_properties.ordering_satisfy(lex)? {
                found = true;
                break;
            }
        }
        // This option did not work: undo the push before trying the next one
        candidate_ordering.pop();
    }

    // If no sort option works for this column, we can't build a valid ordering
    if !found {
        candidate_ordering.clear();
        break;
    }
}

// If we successfully built an ordering for all columns, use it
if candidate_ordering.len() == partitioning_exprs.len() {
    if let Some(lex) = LexOrdering::new(candidate_ordering) {
        all_satisfied_lexs.push(lex);
    }
}

findepi · 2025-09-22T14:47:23Z

datafusion/physical-plan/src/windows/mod.rs

+                    if let Some(lex) = LexOrdering::new(candidate_ordering.clone()) {
+                        if window_eq_properties.ordering_satisfy(lex)? {


With every new window sort expression iterated over, we seem to be processing all previous sort expressions. This feels quadratic.

Yes, it's quadratic, as we validate orderings of length 1, 2, 3, ..., n. But that's unavoidable since we need to check that each partial ordering stays valid as we extend it.

Still way better than before. Old version is O(4^n), the new one is O(n^2 * 4)

The quadratic cost is the price of correctness
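A rough way to see the gap between the two bounds is to count the work directly. The sketch below assumes four sort options per column and that checking a prefix of length k costs about k units; both function names are hypothetical and the cost model is a simplification, not a measurement.

```rust
/// Number of full orderings enumerated when every one of `n` columns
/// has four sort options.
fn exhaustive_candidates(n: u32) -> u128 {
    4u128.pow(n)
}

/// Worst-case work of the greedy pass under the assumed cost model:
/// for each prefix length k in 1..=n, all four options are tried and
/// each check inspects k sort expressions.
fn greedy_work(n: u32) -> u128 {
    (1..=u128::from(n)).map(|k| 4 * k).sum()
}
```

Under this model, 8 columns give 65,536 exhaustive candidates against 144 units of greedy work, and 40 columns give 2^80 candidates against 3,280 units.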

No doubt quadratic is so much better than exponential.

What could turn a valid partial ordering into an invalid one when it is extended with a new expression?
Is it only about duplicates?
IIRC, LexOrdering::new checks for dups, so something we could easily do O(n) overall. But I don't know yet what window_eq_properties.ordering_satisfy does.

findepi · 2025-09-22T14:48:13Z

datafusion/physical-plan/src/windows/mod.rs

                        .into_iter()
-                        .map(sort_options_resolving_constant)
+                        .map(|expr| sort_options_resolving_constant(expr, false))
                        .multi_cartesian_product();


let's maybe file issue / follow-up PR, so that we don't need to re-review code unnecessarily

alamb · 2025-09-22T14:51:17Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fix/exponential-window-ordering-calc (43a323f) to 03b6789 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=physical_window_function_partition
BENCH_BRANCH_NAME=fix_exponential-window-ordering-calc
Results will be posted here when complete

alamb · 2025-09-22T15:19:06Z

🤖: Benchmark completed

Details

group                                                fix_exponential-window-ordering-calc    main
-----                                                ------------------------------------    ----
physical_window_function_partition_by_4_on_values    1.00    606.4±5.29µs        ? ?/sec     2.19   1327.3±5.88µs        ? ?/sec
physical_window_function_partition_by_7_on_values    1.00    736.8±4.67µs        ? ?/sec     48.32    35.6±0.16ms        ? ?/sec
physical_window_function_partition_by_8_on_values    1.00    797.4±6.45µs        ? ?/sec     173.51   138.4±0.54ms        ? ?/sec

alamb · 2025-09-22T17:56:18Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fix/exponential-window-ordering-calc (43a323f) to 03b6789 diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=physical_window_function_partition
BENCH_BRANCH_NAME=fix_exponential-window-ordering-calc
Results will be posted here when complete

alamb · 2025-09-22T17:57:04Z

physical_window_function_partition_by_4_on_values    1.00    606.4±5.29µs        ? ?/sec     2.19   1327.3±5.88µs        ? ?/sec
physical_window_function_partition_by_7_on_values    1.00    736.8±4.67µs        ? ?/sec     48.32    35.6±0.16ms        ? ?/sec
physical_window_function_partition_by_8_on_values    1.00    797.4±6.45µs        ? ?/sec     173.51   138.4±0.54ms        ? ?/sec

🚀 -- it is not often we see a 173x speedup 🐈 🥳

berkaysynnada · 2025-09-22T18:15:06Z

the changed piece of code no longer looks like exhibiting exponential complexity

thank you!

please include tests (#17684 (comment))

I'll drive this to the finish line, probably tomorrow morning (all the stuff we've talked about). Thank you :)

alamb · 2025-09-22T18:20:48Z

🤖: Benchmark completed

Details

group                                                fix_exponential-window-ordering-calc    main
-----                                                ------------------------------------    ----
physical_window_function_partition_by_4_on_values    1.00    598.6±4.75µs        ? ?/sec     2.21  1323.9±14.77µs        ? ?/sec
physical_window_function_partition_by_7_on_values    1.00    751.3±5.72µs        ? ?/sec     47.12    35.4±0.15ms        ? ?/sec
physical_window_function_partition_by_8_on_values    1.00    798.7±3.28µs        ? ?/sec     173.17   138.3±0.48ms        ? ?/sec

findepi · 2025-09-22T19:13:46Z

🤖: Benchmark completed

thanks @alamb . these lgtm

…dering-calc tests copied from v1 pr

berkaysynnada · 2025-09-23T14:58:51Z

I think it's ready. @findepi feel free to commit directly if you'd like to make any changes

DataFusion CLI v50.0.0
> WITH source AS (
    SELECT
        1 AS n,
        '' AS a1, '' AS a2, '' AS a3, '' AS a4, '' AS a5, '' AS a6, '' AS a7, '' AS a8,
        '' AS a9, '' AS a10, '' AS a11, '' AS a12, '' AS a13, '' AS a14, '' AS a15, '' AS a16,
        '' AS a17, '' AS a18, '' AS a19, '' AS a20, '' AS a21, '' AS a22, '' AS a23, '' AS a24,
        '' AS a25, '' AS a26, '' AS a27, '' AS a28, '' AS a29, '' AS a30, '' AS a31, '' AS a32,
        '' AS a33, '' AS a34, '' AS a35, '' AS a36, '' AS a37, '' AS a38, '' AS a39, '' AS a40
)
SELECT
    sum(n) OVER (PARTITION BY
        a1, a2, a3, a4, a5
    )
FROM source;
+---------------------------------------------------------------------------------------------------------------------------------------------+
| sum(source.n) PARTITION BY [source.a1, source.a2, source.a3, source.a4, source.a5] ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING |
+---------------------------------------------------------------------------------------------------------------------------------------------+
| 1                                                                                                                                           |
+---------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.039 seconds.

alamb

Thank you @berkaysynnada -- I went through this PR carefully and I think it is a nice improvement over what is on main. Thank you for working on this and thank you to @findepi and @comphead for the reviews

alamb · 2025-09-23T16:45:39Z

datafusion/physical-plan/src/windows/mod.rs

+                            sort_options_resolving_constant(Arc::clone(expr), false);
+
+                        // Try each option and pick the first that works
+                        for sort_expr in sort_options.iter() {


Similarly here you could potentially use sort_options.into_iter() and save the clone below. Here is what I used

@@ -505,13 +505,14 @@ pub(crate) fn window_equivalence_properties(
                             sort_options_resolving_constant(Arc::clone(expr), false);
 
                         // Try each option and pick the first that works
-                        for sort_expr in sort_options.iter() {
-                            candidate_order.push(sort_expr.clone());
+                        for sort_expr in sort_options.into_iter() {
+                            let is_asc = !sort_expr.options.descending;
+                            candidate_order.push(sort_expr);
                             if let Some(lex) = LexOrdering::new(candidate_order.clone()) {
                                 if window_eq_properties.ordering_satisfy(lex)? {
                                     if idx == 0 {
-                                        asc = !sort_expr.options.descending;
+                                        asc = is_asc;
                                     }
                                     found = true;
                                     break;
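The pattern in this suggestion can be shown in isolation. The sketch below uses a hypothetical non-`Copy` `SortExpr` struct and a hypothetical `pick_first` helper, not DataFusion types: consuming the vector moves each element into the candidate list without cloning, which is why any field needed later has to be read before the move.

```rust
/// Hypothetical stand-in for a sort expression that is not `Copy`.
#[derive(Debug, Clone, PartialEq)]
struct SortExpr {
    column: String,
    descending: bool,
}

/// Moves candidates out of `options` one by one and returns the first
/// accepted ordering together with whether its option is ascending.
fn pick_first(
    options: Vec<SortExpr>,
    accept: impl Fn(&[SortExpr]) -> bool,
) -> Option<(Vec<SortExpr>, bool)> {
    let mut candidate = Vec::new();
    for sort_expr in options {
        // Read what is needed before the value is moved into the vector.
        let is_asc = !sort_expr.descending;
        candidate.push(sort_expr); // moved, no clone
        if accept(&candidate) {
            return Some((candidate, is_asc));
        }
        candidate.pop();
    }
    None
}
```

Iterating with `.iter()` would only hand out references, so pushing into an owned vector would need a `clone()` per element.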

alamb · 2025-09-23T16:46:18Z

datafusion/physical-plan/src/windows/mod.rs

-                            asc = !f.options.descending;
+                    // Find one valid ordering for aggregate arguments instead of
+                    // checking all combinations
+                    let aggregate_exprs = sliding_expr.get_aggregate_expr().expressions();


FWIW this seems very similar to the loop above, I wonder if there is some way (as a follow on PR) to factor it out to reduce the replication

alamb · 2025-09-23T16:48:49Z

datafusion/sqllogictest/test_files/window.slt

 0 3 NULL NULL 0 NULL NULL
 0 4 NULL NULL 0 NULL NULL
+
+# regression test for https://github.com/apache/datafusion/issues/17401


I ran this locally, and found

both main and This branch -- took 2 seconds

Main

(venv) andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ cargo test --profile=ci --test sqllogictests -- window.slt
    Finished `ci` profile [unoptimized + debuginfo] target(s) in 0.20s
     Running bin/sqllogictests.rs (target/ci/deps/sqllogictests-4dcb99f83e94c047)
Completed 2 test files in 2 seconds

This branch

(venv) andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ cargo test --profile=ci --test sqllogictests -- window.slt
    Finished `ci` profile [unoptimized + debuginfo] target(s) in 0.19s
     Running bin/sqllogictests.rs (target/ci/deps/sqllogictests-fbba93e4275b7826)
Completed 2 test files in 2 seconds

I ran this locally, and found

both main and This branch -- took 2 seconds

Main

Do you mean you ran window.slt unmodified as currently in main, or did you apply changes from this PR first?

I tried to reproduce the latter and for me execution "hangs" at

[00:00:01] #######################################- 442/454 "window.slt"

findepi · 2025-09-23T17:02:54Z

datafusion/physical-plan/src/windows/mod.rs

+        // If we successfully built an ordering for all columns, use it
+        // When there are no partition expressions, candidate_ordering will be empty and won't be added
+        if candidate_ordering.len() == partitioning_exprs.len()
+            && !candidate_ordering.is_empty()


&& !candidate_ordering.is_empty() is redundant.
in the empty case, the LexOrdering::new returns None

Suggested change

&& !candidate_ordering.is_empty()

findepi · 2025-09-23T17:05:08Z

datafusion/physical-plan/src/windows/mod.rs

+                            if let Some(lex) = LexOrdering::new(candidate_order.clone()) {
+                                if window_eq_properties.ordering_satisfy(lex)? {
+                                    if idx == 0 {
+                                        asc = !sort_expr.options.descending;


Why is the first one special? Worth a code comment.

(i know the logic is pre-existing and did not have a comment, but you seem to know what this is about and I do not)

findepi · 2025-09-23T17:06:16Z

datafusion/physical-plan/src/windows/mod.rs

-                        .expressions()
-                        .into_iter()
-                        .map(sort_options_resolving_constant)
-                        .multi_cartesian_product();


Thanks for solving this exponential case too!

I am not sure the regression tests cover this case.
Do we need some more test queries?

datafusion/datafusion/core/tests/physical_optimizer/enforce_sorting.rs

Line 2470 in 6ec14e9

async fn test_window_partial_constant_and_set_monotonicity() -> Result<()> {

covers the correctness side well, but for the performance concern a similar regression test should be added, like this:

SUM(c1 + c2 + c3 + c4 + c5 + c6 + c7 + c8 + c9 + c10 + ... + cN) OVER (ORDER BY c1, c2, ... cN and a causal window)
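A query of that shape could be generated for any N with a small helper. This is only a sketch: `wide_order_by_query` is a hypothetical name, the table name `t` is a placeholder, and the frame `ROWS BETWEEN 1 PRECEDING AND CURRENT ROW` is one assumed reading of "causal window" (a frame that ends at the current row).

```rust
/// Build a query of the shape sketched above for columns c1..cN.
/// The table name `t` and the frame clause are assumptions made for
/// illustration; `n` is expected to be at least 1.
fn wide_order_by_query(n: usize) -> String {
    let cols: Vec<String> = (1..=n).map(|i| format!("c{i}")).collect();
    format!(
        "SELECT SUM({}) OVER (ORDER BY {} ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) FROM t",
        cols.join(" + "),
        cols.join(", "),
    )
}
```

Generating the text this way would make it easy to time planning for growing N, which is where an exponential code path shows up.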

datafusion/physical-plan/src/windows/mod.rs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

….com/synnada-ai/datafusion-upstream into fix/exponential-window-ordering-calc

alamb · 2025-09-25T14:36:31Z

Are there any remaining issues preventing merging this PR?

alamb · 2025-09-25T18:34:21Z

Ok, this looks good to me, so I'll merge it in now. Thank you @berkaysynnada and @findepi

…7684)

* fix
* Update mod.rs
* Update mod.rs
* Update mod.rs
* tests copied from v1 pr
* test case from review comment apache#17684 (comment)
* one more test case
* Update mod.rs
* Update datafusion/physical-plan/src/windows/mod.rs
  Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Update datafusion/physical-plan/src/windows/mod.rs
  Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Update mod.rs
* Update mod.rs

---------

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

alamb · 2025-09-25T18:42:59Z

Backport PR

[branch-50] Backport Prevent exponential planning time for Window functions - v2 #17684 #17778

…17778)

* fix
* Update mod.rs
* Update mod.rs
* Update mod.rs
* tests copied from v1 pr
* test case from review comment #17684 (comment)
* one more test case
* Update mod.rs
* Update datafusion/physical-plan/src/windows/mod.rs
* Update datafusion/physical-plan/src/windows/mod.rs
* Update mod.rs
* Update mod.rs

---------

Co-authored-by: Berkay Şahin <124376117+berkaysynnada@users.noreply.github.com>
Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>

Co-authored-by: Oleks V <comphead@users.noreply.github.com> Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com> * Apply suggestions from code review Co-authored-by: Alex Huang <huangweijun1001@gmail.com> Co-authored-by: Yang Jiang <jiangyang381@163.com> Co-authored-by: Yongting You <2010youy01@gmail.com> * Apply suggestions from code review Co-authored-by: Yijie Shen <henry.yijieshen@gmail.com> * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: Brent Gardner <bgardner@squarelabs.net> Co-authored-by: Dmitrii Blaginin <github@blaginin.me> Co-authored-by: Jax Liu <liugs963@gmail.com> Co-authored-by: Ifeanyi Ubah <ify1992@yahoo.com> * Apply suggestions from code review Co-authored-by: Will Jones <willjones127@gmail.com> * Clarify what is updated in the script * Apply suggestions from code review Co-authored-by: Paddy Horan <5733408+paddyhoran@users.noreply.github.com> Co-authored-by: Dan Harris <1327726+thinkharderdev@users.noreply.github.com> * Update docs/source/contributor-guide/governance.md * Update docs/source/contributor-guide/governance.md Co-authored-by: Parth Chandra <parthc@apache.org> * Update docs/source/contributor-guide/governance.md * prettier --------- Co-authored-by: Wes McKinney <wesm@apache.org> Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com> Co-authored-by: Mustafa Akur <akurmustafa@gmail.com> Co-authored-by: Qi Zhu <821684824@qq.com> Co-authored-by: 张林伟 <lewiszlw520@gmail.com> Co-authored-by: xudong.w <wxd963996380@gmail.com> Co-authored-by: Oleks V <comphead@users.noreply.github.com> Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com> Co-authored-by: Alex Huang <huangweijun1001@gmail.com> Co-authored-by: Yang Jiang <jiangyang381@163.com> Co-authored-by: Yongting You <2010youy01@gmail.com> Co-authored-by: Yijie Shen <henry.yijieshen@gmail.com> Co-authored-by: Brent Gardner <bgardner@squarelabs.net> Co-authored-by: Dmitrii Blaginin <github@blaginin.me> Co-authored-by: Jax Liu 
<liugs963@gmail.com> Co-authored-by: Ifeanyi Ubah <ify1992@yahoo.com> Co-authored-by: Will Jones <willjones127@gmail.com> Co-authored-by: Paddy Horan <5733408+paddyhoran@users.noreply.github.com> Co-authored-by: Dan Harris <1327726+thinkharderdev@users.noreply.github.com> Co-authored-by: Ruihang Xia <waynestxia@gmail.com> Co-authored-by: Parth Chandra <parthc@apache.org> * fix: Ignore governance doc from typos (#17678) * Support Decimal32/64 types (#17501) * Support Decimal32/64 types * Fix bugs, tests, handle more aggregate functions and schema * Fill out more parts in expr,common and expr-common * Some stragglers and overlooked corners * Actually commit the avg_distinct support --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * minor: Improve hygiene for `datafusion-functions` macros (#17638) * feat(small): Display `NullEquality` in join executor's `EXPLAIN` output (#17664) * Clarify null-equal explain expectations * Format null equality display strings * fix test * review: more concise message * review: more concise message * Custom timestamp format for DuckDB (#17653) * feat(substrait): add time literal support (#17655) Adds support for `ScalarValue::Time64Microsecond` and `ScalarValue::Time64Nanosecond` to be converted to and from Substrait literals. This includes the `PrecisionTime` literal type and specific `TIME_64_TYPE_VARIATION_REF` for 6-digit (microseconds) and 9-digit (nanoseconds) precision. 
Co-authored-by: Bruno Volpato <bruno.volpato@datadoghq.com> * Support LargeList for array_sort (#17657) * Support FixedSizeList for array_except (#17658) * fix: null padding for `array_reverse` on `FixedSizeList` (#17673) * fix: array_reverse with null * update * update * chore: refactor array fn signatures & add more slt tests (#17672) * Support FixedSizeList for array_to_string (#17666) * fix: correct statistics for `NestedLoopJoinExec` (#17680) * fix: correct statistics for nestedloopexec * chore: update comment * minor: add SQLancer fuzzed SLT case for natural joins (#17683) * chore: Upgrade Rust version to 1.90.0 (#17677) * chore: bump workspace rust version to 1.90.0 * fix clippy errors * fix clippy errors * try using dedicate runner temp space * retrigger * inspect disk usage * split build/run * disable debug info in ci profile * revert ci changes * Support FixedSizeList for array_position (#17659) * chore(deps): bump the proto group with 2 updates (#16806) * chore(deps): bump the proto group with 2 updates Bumps the proto group with 2 updates: [pbjson-build](https://github.com/influxdata/pbjson) and [prost-build](https://github.com/tokio-rs/prost). Updates `pbjson-build` from 0.7.0 to 0.8.0 - [Commits](https://github.com/influxdata/pbjson/commits) Updates `prost-build` from 0.13.5 to 0.14.1 - [Release notes](https://github.com/tokio-rs/prost/releases) - [Changelog](https://github.com/tokio-rs/prost/blob/master/CHANGELOG.md) - [Commits](https://github.com/tokio-rs/prost/compare/v0.13.5...v0.14.1) --- updated-dependencies: - dependency-name: pbjson-build dependency-version: 0.8.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: proto - dependency-name: prost-build dependency-version: 0.14.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: proto ... 
Signed-off-by: dependabot[bot] <support@github.com> * Regen protos --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jefffrey <jeffrey.vo.australia@gmail.com> * feat(spark): implement Spark `make_interval` function (#17424) * feat(spark): implement Spark make_interval function * fix name length * add doc * add doc and change test, need more test * fmt * add test and doc, need to work in overflow * clippy * empty params * test ok IntervalMonthDayNano::new(0, 0, 0) in unit test * line blank * fix doc table select * dont panic * update test and not panic fmt * review * review fix test failure * review fix test failure format simple string * test uncomment and link * return test (empty) * changes review * all overflow null * all overflow null fix fmt * changes review * changes review clippy * refactor move * fix error doc date_sub * clean slt * no space device * chore: Update READMEs of crates to be more consistent (#17691) * chore: Update READMEs of crates to be more consistent * Add some more Apache project links * Minor formatting * Formatting * Update datafusion/pruning/README.md Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * suggestion * formatting * formatting --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * chore: update a bunch of dependencies (#17708) * chore: fix wasm-pack installation link in wasmtest README (#17704) * Support FixedSizeList for array_slice via coercion to List (#17667) * docs: Remove disclaimer that `datafusion` 50.0.0 is not released (#17695) * docs: Remove disclaimer that datafusion 50.0.0 is not released * Add section about 51.0.0 * chore(deps): bump taiki-e/install-action from 2.61.10 to 2.62.1 (#17710) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.61.10 to 2.62.1. 
- [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/0aa4f22591557b744fe31e55dbfcdfea74a073f7...d6912b47771be2c443ec90dbb3d28e023987e782) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * perf: Improve the performance of WINDOW functions with many partitions (#17528) * perf: Improve the performance of WINDOW functions with many partitions * Improve variable name in calculate_n_out_row * fix: Partial AggregateMode will generate duplicate field names which will fail DFSchema construct (#17706) * fix: Partial AggregateMode will generate duplicate field names which will fail DFSchema construct * Update datafusion/common/src/dfschema.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * fmt --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * feat: expose `udafs` and `udwfs` methods on `FunctionRegistry` (#17650) * expose udafs and udwfs method on `FunctionRegistry` * fix doc test * add default implementations not to trigger backward incompatible change for others * Support remaining substrait time literal variations (#17707) * Bump MSRV to 1.87.0 (#17724) * Bump MSRV to 1.87.0 * automatic code fixes * Add upgrading entry * Avoid redundant Schema clones (#17643) * Collocate variants of From DFSchema to Schema * Remove duplicated logic for obtaining Schema from DFSchema * Remove Arc clone in hash_nested_array * Avoid redundant Schema clones * Avoid some Field clones * make arc clones explicit * retract the new From * empty: roll the dice 🎲 * Use github link instead of relative link to optimizer_rule.rs in query-optimizer.md (#17723) * Move 
misplaced upgrading entry about MSRV (#17727) * Introduce `avg_distinct()` and `sum_distinct()` functions to DataFrame API (#17536) * Introduce `avg_distinct()` and `sum_distinct()` functions to DataFrame API * Add to roundtrip proto tests * Support `WHERE`, `ORDER BY`, `LIMIT`, `SELECT`, `EXTEND` pipe operators (#17278) * support WHERE pipe operator * support order by * support limit * select pipe * extend support * document supported pipe operators in user guide * fmt * fix where pipe before extend * don't rebind * remove clone * move docs into select.md * avoid confusion by removing `>` in examples --------- Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com> * doc: add missing examples for multiple math functions (#17018) * Update Scalar_functions.md * pretier fix * Updated files * Updated Scalar functions * Update datafusion/functions/src/math/log.rs Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com> * Update datafusion/functions/src/math/monotonicity.rs Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com> * Update datafusion/functions/src/math/monotonicity.rs Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com> * Update datafusion/functions/src/math/nans.rs Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com> * Update datafusion/functions/src/math/nanvl.rs Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com> * Fix tanh example to be tanh not trunc * Run update_function_docs.sh --------- Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com> * feat: support for null, date, and timestamp types in approx_distinct (#17618) * feat: let approx_distinct handle null, date and timestamp types Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * chore: update testing submodule Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * feat: supports time type and refactor NullHLLAccumulator Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * bump arrow-testing submodule --------- Signed-off-by: Dennis Zhuang 
<killme2008@gmail.com> Co-authored-by: Jefffrey <jeffrey.vo.australia@gmail.com> * fix(agg/corr): return NULL when variance is zero or samples < 2 (#17621) Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * chore(deps): bump taiki-e/install-action from 2.62.1 to 2.62.4 (#17739) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.1 to 2.62.4. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/d6912b47771be2c443ec90dbb3d28e023987e782...5597bc27da443ba8bf9a3bc4e5459ea59177de42) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.4 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump tempfile from 3.22.0 to 3.23.0 (#17741) Bumps [tempfile](https://github.com/Stebalien/tempfile) from 3.22.0 to 3.23.0. - [Changelog](https://github.com/Stebalien/tempfile/blob/master/CHANGELOG.md) - [Commits](https://github.com/Stebalien/tempfile/compare/v3.22.0...v3.23.0) --- updated-dependencies: - dependency-name: tempfile dependency-version: 3.23.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: make `LimitPushPastWindows` public (#17736) * fix: Remove parquet encryption feature from root deps (#17700) This fix relates to issue #16650 by completing #16649 . 
* fix: Remove datafusion-macros's dependency on datafusion-expr (#17688) * Remove datafusion-macros's dependency on datafusion-expr * Re-export * chore: remove homebrew publish instructions from release steps (#17735) * minor: create `OptimizerContext` with provided `ConfigOptions` (#17742) * Improve documentation for ordered set aggregate functions (#17744) * docs: fix sidebar overlapping table on configuration page on website (#17738) * solved bug * fix:modified css for table overlapping * Add support for calling async UDF as aggregation expression (#17620) * Add support for calling async UDF as aggregation expression Fixes https://github.com/apache/datafusion/issues/17619 * add explain plans * chore(deps): bump taiki-e/install-action from 2.62.4 to 2.62.5 (#17750) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.4 to 2.62.5. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/5597bc27da443ba8bf9a3bc4e5459ea59177de42...6f69ec9970ed0c500b1b76d648e05c4c7e0e5671) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.5 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * (fix): Lag function creates unwanted projection (#17630) (#17639) * fix: Not adding generated windown expr resulting column twice (#17630) * Making clippy happier * Support `LargeList` in `array_has` simplification to `InList` (#17732) * Support `LargeList` in `array_has` simplification to `InList` * refactoring * chore(deps): bump wasm-bindgen-test from 0.3.51 to 0.3.53 (#17642) * chore(deps): bump wasm-bindgen-test from 0.3.51 to 0.3.53 Bumps [wasm-bindgen-test](https://github.com/wasm-bindgen/wasm-bindgen) from 0.3.51 to 0.3.53. - [Release notes](https://github.com/wasm-bindgen/wasm-bindgen/releases) - [Changelog](https://github.com/wasm-bindgen/wasm-bindgen/blob/main/CHANGELOG.md) - [Commits](https://github.com/wasm-bindgen/wasm-bindgen/commits) --- updated-dependencies: - dependency-name: wasm-bindgen-test dependency-version: 0.3.53 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * testing setting WASM_BINDGEN_TEST_TIMEOUT * more testing * more testing * more testing * more testing * more testing * testing * testing * testing * testing * whoops * whoops * testing * testing * testing * testing * testing * testing * testing * testing * testing * testing * testing * testing * problem commit * please let this work * oops * test 0.3.53 * fix --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com> * feat: support `Utf8View` for more args of `regexp_replace` (#17195) * Stash changes. * Signature cleanup, more test scenarios. * Minor test renaming. * Simplify signature. * Update tests. * Signature change for binary input support. * Return type changes for binary. * Stash. * Stash. * Stash. * Stash. 
* Fix regx bench. * Clippy. * Fix bench regx. * Refactor signature. I need to remove the match arms that aren't used anymore, update the .slt test for string_view.slt, and understand why String(3) and String(4) are not equivalent to this. * Remove unnecessary match arms. * Update string_view slt test. * Reduce diff by returning to single function with a match arm instead of two. * Simplify template args. * Fix benchmark compilation. * Address PR feedback. * feat(spark): implement Spark `map` function `map_from_arrays` (#17456) * feat(spark): implement Spark `map` function `map_from_arrays` * chore: add test with nested `map_from_arrays` calls, refactor map_deduplicate_keys to remove unnecessary variables and array slices * fix: clippy warning * fix: null and different size input lists treatment, chore: move common map funcs to utils.rs, add more tests * fix: typo * fix: clippy docstring warning * chore: move more helpers needed for multiple map functions to utils * chore: add multi-row tests * fix: null values treatment * fix: docstring warnings * chore(deps): bump object_store from 0.12.3 to 0.12.4 (#17753) Bumps [object_store](https://github.com/apache/arrow-rs-object-store) from 0.12.3 to 0.12.4. - [Changelog](https://github.com/apache/arrow-rs-object-store/blob/main/CHANGELOG-old.md) - [Commits](https://github.com/apache/arrow-rs-object-store/compare/v0.12.3...v0.12.4) --- updated-dependencies: - dependency-name: object_store dependency-version: 0.12.4 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update `arrow` / `parquet` to 56.2.0 (#17631) * temp update to arrow 56.2.0 pin * Update to 56.2.0 * Use released arrow * Update cargo.lock * fix lock * chore(deps): bump taiki-e/install-action from 2.62.5 to 2.62.6 (#17766) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.5 to 2.62.6. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/6f69ec9970ed0c500b1b76d648e05c4c7e0e5671...4575ae687efd0e2c78240087f26013fb2484987f) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.6 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Keep aggregate udaf schema names unique when missing an order-by (#17731) * test: reproducer of bug * fix: make schema names unique for approx_percentile_cont * test: regression test is now resolved * feat : Display function alias in output column name (#17690) * display function's alias name in output column * Update function.rs * updated verbose name format * simplify alias logic and removing args clone * Support join cardinality estimation less conservatively (#17476) * Support join cardinality estimation if distinct_count is set Currently we require max and min to be set, as they might be used to estimate the distinct count. This is unnecessarily conservative if distinct_count has actually been provided, in which case max and min won't be used at all and the presence of max or min has no influence over how good of an estimate it is. 
* Update datafusion/physical-plan/src/joins/utils.rs Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com> * Update tests * Calculate cardinality even if distinct or min/max not provided --------- Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com> * chore(deps): bump libc from 0.2.175 to 0.2.176 (#17767) Bumps [libc](https://github.com/rust-lang/libc) from 0.2.175 to 0.2.176. - [Release notes](https://github.com/rust-lang/libc/releases) - [Changelog](https://github.com/rust-lang/libc/blob/0.2.176/CHANGELOG.md) - [Commits](https://github.com/rust-lang/libc/compare/0.2.175...0.2.176) --- updated-dependencies: - dependency-name: libc dependency-version: 0.2.176 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump postgres-types from 0.2.9 to 0.2.10 (#17768) Bumps [postgres-types](https://github.com/rust-postgres/rust-postgres) from 0.2.9 to 0.2.10. - [Release notes](https://github.com/rust-postgres/rust-postgres/releases) - [Commits](https://github.com/rust-postgres/rust-postgres/compare/postgres-types-v0.2.9...postgres-types-v0.2.10) --- updated-dependencies: - dependency-name: postgres-types dependency-version: 0.2.10 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Use `Expr::qualified_name()` and `Column::new()` to extract partition keys from window and aggregate operators (#17757) * Use `Expr::qualified_name()` and `Column::new()` to extract partition keys Using `Expr::schema_name()` and `Column::from_qualified_name()` could incorrectly parse the column name. 
* Use `Expr::qualified_name()` to extract group by keys * Retrain dataframe tests with filters and aggregates * Prevent exponential planning time for Window functions - v2 (#17684) * fix * Update mod.rs * Update mod.rs * Update mod.rs * tests copied from v1 pr * test case from review comment https://github.com/apache/datafusion/pull/17684#discussion_r2366146307 * one more test case * Update mod.rs * Update datafusion/physical-plan/src/windows/mod.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Update datafusion/physical-plan/src/windows/mod.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * Update mod.rs * Update mod.rs --------- Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * docs: add Ballista link to landing page (#17746) (#17775) * docs: add Ballista link to landing page (#17746) This adds a link and description for DataFusion Ballista to the landing page, as suggested in issue #17746. Ballista is a distributed compute platform built on top of DataFusion. Closes: #17746 * fix(docs): update Ballista link * updated theory part * chore(deps): bump taiki-e/install-action from 2.62.6 to 2.62.8 (#17781) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.6 to 2.62.8. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/4575ae687efd0e2c78240087f26013fb2484987f...ea0eda622640ac23a17ba349cf09e2709d58f5e1) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.8 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump wasm-bindgen-test from 0.3.53 to 0.3.54 (#17784) Bumps [wasm-bindgen-test](https://github.com/wasm-bindgen/wasm-bindgen) from 0.3.53 to 0.3.54. - [Release notes](https://github.com/wasm-bindgen/wasm-bindgen/releases) - [Changelog](https://github.com/wasm-bindgen/wasm-bindgen/blob/main/CHANGELOG.md) - [Commits](https://github.com/wasm-bindgen/wasm-bindgen/commits) --- updated-dependencies: - dependency-name: wasm-bindgen-test dependency-version: 0.3.54 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: Action some old TODOs in github actions (#17694) * chore: Action some old TODOs in github actions * Update Cargo.toml * testing * Revert changing cli test runner to use container * Remove sccache * dev: Add benchmark for compilation profiles (#17754) * Add benchmark for compilation profiles * add apache header * add apache header * chore(deps): bump tokio-postgres from 0.7.13 to 0.7.14 (#17785) Bumps [tokio-postgres](https://github.com/rust-postgres/rust-postgres) from 0.7.13 to 0.7.14. - [Release notes](https://github.com/rust-postgres/rust-postgres/releases) - [Commits](https://github.com/rust-postgres/rust-postgres/compare/tokio-postgres-v0.7.13...tokio-postgres-v0.7.14) --- updated-dependencies: - dependency-name: tokio-postgres dependency-version: 0.7.14 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump serde from 1.0.226 to 1.0.227 (#17783) Bumps [serde](https://github.com/serde-rs/serde) from 1.0.226 to 1.0.227. 

fix

b545b27

github-actions bot added the physical-plan Changes to the physical-plan crate label Sep 20, 2025

berkaysynnada added 2 commits September 20, 2025 15:11

Update mod.rs

f30af51

Update mod.rs

43f7592

comphead approved these changes Sep 20, 2025


findepi previously requested changes Sep 21, 2025


Update mod.rs

43a323f

tests copied from v1 pr

c4bbecb


alamb mentioned this pull request Sep 22, 2025

Release DataFusion 50.1.0 (minor) #17594

Closed

18 tasks

test case from review comment

e390836

apache#17684 (comment)

one more test case

bc56376

findepi approved these changes Sep 22, 2025


Merge pull request #74 from findepi/findepi/fix/exponential-window-ordering-calc

50a08f4

tests copied from v1 pr

github-actions bot added core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Sep 23, 2025

Update mod.rs

0e9e236

alamb approved these changes Sep 23, 2025


findepi reviewed Sep 23, 2025


alamb mentioned this pull request Sep 23, 2025

[branch-50] Prepare for 50.1.0 release #17748

Merged

berkaysynnada and others added 5 commits September 24, 2025 16:02

Update datafusion/physical-plan/src/windows/mod.rs

2b164d5

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Update datafusion/physical-plan/src/windows/mod.rs

b3543c0

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Update mod.rs

4579343

Merge branch 'fix/exponential-window-ordering-calc' of https://github.com/synnada-ai/datafusion-upstream into fix/exponential-window-ordering-calc

45cbdd9

Update mod.rs

e94a4b7

alamb mentioned this pull request Sep 25, 2025

Relax constraint that file sort order must only reference individual columns #17419

Merged

alamb added the performance Make DataFusion faster label Sep 25, 2025

alamb added this pull request to the merge queue Sep 25, 2025

Merged via the queue into apache:main with commit c1d6f34 Sep 25, 2025
28 checks passed

alamb mentioned this pull request Sep 25, 2025

[branch-50] Backport Prevent exponential planning time for Window functions - v2 #17684 #17778

Merged

		let mut candidate_ordering = ordering.clone();
		candidate_ordering.push(sort_expr.clone());

		if let Some(lex) = LexOrdering::new(candidate_ordering.clone()) {
			if window_eq_properties.ordering_satisfy(lex)? {
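
For readers without the full diff at hand, the excerpt above is the core of the incremental approach described in the PR: extend the ordering found so far by one candidate sort expression, and keep the candidate only if the extended ordering is still satisfied. Below is a minimal, self-contained sketch of that idea. It does not use DataFusion types: `SortOptions`, `SortExpr`, `candidate_options`, and `refine_ordering` are hypothetical stand-ins, the `satisfies` closure stands in for `ordering_satisfy`, and skipping a column with no satisfying candidate is an assumption of this sketch rather than confirmed behaviour of the merged code.

```rust
/// Simplified sort options for one column (hypothetical stand-in, not the arrow type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SortOptions {
    pub descending: bool,
    pub nulls_first: bool,
}

/// A column name plus the options it is sorted with (hypothetical stand-in).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SortExpr {
    pub column: String,
    pub options: SortOptions,
}

/// The four option combinations tried for each column, in a fixed order.
fn candidate_options() -> [SortOptions; 4] {
    [
        SortOptions { descending: false, nulls_first: false },
        SortOptions { descending: false, nulls_first: true },
        SortOptions { descending: true, nulls_first: false },
        SortOptions { descending: true, nulls_first: true },
    ]
}

/// Builds an ordering one column at a time.
///
/// For each column, every candidate is appended to the ordering found so far
/// and checked with `satisfies`; the first candidate that passes is kept.
/// Returns the ordering and the number of `satisfies` calls, which is at most
/// 4 * columns.len() instead of 4^columns.len() for full enumeration.
pub fn refine_ordering<F>(columns: &[&str], satisfies: F) -> (Vec<SortExpr>, usize)
where
    F: Fn(&[SortExpr]) -> bool,
{
    let mut ordering: Vec<SortExpr> = Vec::new();
    let mut checks = 0;
    for column in columns {
        for options in candidate_options() {
            let mut candidate = ordering.clone();
            candidate.push(SortExpr { column: column.to_string(), options });
            checks += 1;
            if satisfies(&candidate) {
                // Keep the extended ordering and move on to the next column.
                ordering = candidate;
                break;
            }
        }
        // Assumption of this sketch: a column with no satisfying candidate is skipped.
    }
    (ordering, checks)
}
```

With an input already sorted by `a ASC NULLS LAST, b DESC NULLS FIRST` and a `satisfies` closure that accepts any prefix of that ordering, calling `refine_ordering(&["a", "b", "c"], ...)` performs 1 + 4 + 4 = 9 checks, whereas enumerating every combination for three columns would need 4^3 = 64.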
