Skip to content

Conversation

@Dandandan
Copy link
Contributor

@Dandandan Dandandan commented Jul 1, 2025

Which issue does this PR close?

Rationale for this change

Speedup / reduce allocations for multi column sorting.

--------------------
Benchmark sort_tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ reuse_rows ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │ 190.63 ms │  188.89 ms │     no change │
│ Q2           │ 160.28 ms │  159.15 ms │     no change │
│ Q3           │ 763.61 ms │  769.22 ms │     no change │
│ Q4           │ 207.16 ms │  186.89 ms │ +1.11x faster │
│ Q5           │ 250.47 ms │  227.37 ms │ +1.10x faster │
│ Q6           │ 273.34 ms │  247.43 ms │ +1.10x faster │
│ Q7           │ 494.48 ms │  472.44 ms │     no change │
│ Q8           │ 404.28 ms │  390.38 ms │     no change │
│ Q9           │ 428.61 ms │  411.24 ms │     no change │
│ Q10          │ 618.35 ms │  606.47 ms │     no change │
│ Q11          │ 371.85 ms │  375.00 ms │     no change │
└──────────────┴───────────┴────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary         ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)         │ 4163.07ms │
│ Total Time (reuse_rows)   │ 4034.49ms │
│ Average Time (main)       │  378.46ms │
│ Average Time (reuse_rows) │  366.77ms │
│ Queries Faster            │         3 │
│ Queries Slower            │         0 │
│ Queries with No Change    │         8 │
│ Queries with Failure      │         0 │
└───────────────────────────┴───────────┘


What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Jul 1, 2025
@Dandandan Dandandan changed the title Reuse Rows in RowCursorStream Reuse Rows allocation in RowCursorStream Jul 1, 2025
@Dandandan
Copy link
Contributor Author

FYI @zhuqi-lucas @acking-you

Copy link
Contributor

@zhuqi-lucas zhuqi-lucas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @Dandandan for great work!

#[derive(Debug)]
pub struct RowValues {
rows: Rows,
rows: Arc<Rows>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense, thank you @Dandandan !


self.rows[stream_idx][1] = Some(Arc::clone(&rows));

// swap the curent with the previous one, so that the next poll can reuse the Rows from the previous poll
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Dandandan , this implementation is really clever:

Double‐buffer swap for smooth handoff
After appending the new data into the “cur” slot, swapping the two slots with std::mem::swap transparently rotates which buffer will be reused next. This means you always have one slot holding the “previous” data for downstream consumers and an idle slot ready for your next try_unwrap.

@acking-you
Copy link
Contributor

LGTM,I will try running the benchmark for datafusion/physical-plan/benches/sort_preserving_merge.rs to see if there is any improvement

@acking-you
Copy link
Contributor

This is the benchmark scenario where the test data has not been modified by default(multi large string):

Benchmarking bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 50.2s.
bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
                        time:   [5.0435 s 5.0615 s 5.0813 s]
Found 3 outliers among 10 measurements (30.00%)
  1 (10.00%) low mild
  2 (20.00%) high severe

Benchmarking bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.6s or enable flat sampling.
bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
                        time:   [157.82 ms 160.78 ms 163.05 ms]

➜  arrow-datafusion git:(main) git checkout reuse_rows                                                                                                                                                                                                            root@VM-250-221-tencentos arrow-datafusion #
branch 'reuse_rows' set up to track 'origin/reuse_rows'.
Switched to a new branch 'reuse_rows'
➜  arrow-datafusion git:(reuse_rows) cargo bench  --bench sort_preserving_merge -- --sample-size=10
Benchmarking bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 51.2s.
bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
                        time:   [5.0404 s 5.0613 s 5.0831 s]
                        change: [-0.5635% -0.0039% +0.5493%] (p = 0.99 > 0.05)
                        No change in performance detected.

Benchmarking bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.6s or enable flat sampling.
bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
                        time:   [155.99 ms 157.30 ms 159.18 ms]
                        change: [-3.1635% -1.4444% +0.3068%] (p = 0.15 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

The performance improvement in the test data above appears to be minimal. I suspect this might be due to the length of the string used for testing being too large, making the memory allocation overhead negligible in comparison.

So I tried to make the string smaller, and the test results are as follows:

bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
                        time:   [757.06 ms 760.87 ms 764.68 ms]

bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
                        time:   [209.89 ms 210.70 ms 211.52 ms]

➜  arrow-datafusion git:(main)  git checkout reuse_rows                                                                                                                                                                                                 
bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
                        time:   [755.94 ms 758.84 ms 762.58 ms]
                        change: [-0.9202% -0.2676% +0.4455%] (p = 0.47 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
                        time:   [209.22 ms 210.43 ms 212.07 ms]
                        change: [-0.8397% -0.1278% +0.7042%] (p = 0.78 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

The performance improvement compared to before is indeed more noticeable.

@Dandandan
Copy link
Contributor Author

Dandandan commented Jul 2, 2025

bench_merge_sorted_preserving

I had a look at this benchmark. It seems it only is testing a single 1M batch per partition/column?
If so, it wouldn't benefit from it as it doesn't reuse allocation for multiple batch inputs. Could you maybe take a look at it?

@acking-you
Copy link
Contributor

I had a look at this benchmark. It seems it only is testing a single 1M batch per partition/column?

You mean streaming back RecordBatch (e.g., batch of 8192 rows) instead of 1M all at once, right?

I'll try to tweak this aspect to make it meet expectations.

@Dandandan
Copy link
Contributor Author

I had a look at this benchmark. It seems it only is testing a single 1M batch per partition/column?

You mean streaming back RecordBatch (e.g., batch of 8192 rows) instead of 1M all at once, right?

I'll try to tweak this aspect to make it meet expectations.

Yes, otherwise the "optimization" will have no effect, as it can only re-use allocations across batches.

@acking-you
Copy link
Contributor

I made the following changes to allow the test to support returning 8192 rows each time:
Clipboard_Screenshot_1751455219

I found that reducing the length of the string at this point hardly improves performance, and keeping the original length still results in little change in performance:

bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
                        time:   [23.693 ms 24.160 ms 24.802 ms]

bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
                        time:   [2.9874 ms 2.9921 ms 2.9993 ms]
➜  arrow-datafusion git:(main) git checkout reuse_rows                                                                                                                                                                                                            root@VM-250-221-tencentos arrow-datafusion #
Switched to branch 'reuse_rows

bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
                        time:   [23.796 ms 24.339 ms 24.601 ms]
                        change: [-4.5659% -2.0603% +0.7308%] (p = 0.17 > 0.05)
                        No change in performance detected.

bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
                        time:   [2.9963 ms 3.0421 ms 3.0962 ms]
                        change: [-0.7491% +0.3715% +1.7710%] (p = 0.62 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

@Dandandan
Copy link
Contributor Author

I made the following changes to allow the test to support returning 8192 rows each time: Clipboard_Screenshot_1751455219

I found that reducing the length of the string at this point hardly improves performance, and keeping the original length still results in little change in performance:

Hmm... I meant changing the benchmark to create multiple batches, rather than a single larger or smaller one.
The SPM can only reuse allocations when there is conversion of previous batches to reuse the allocation from.

@Dandandan
Copy link
Contributor Author

Ah ok I see you changed MemoryStream to allow for this 🤔

@zhuqi-lucas
Copy link
Contributor

I believe we can increase the in-place memory for sorting benchmark here, here the default is 1MB.

The result will largely affected by the in place sort memory buffer from previous experience, details:
#15348 (comment)

@Dandandan
Copy link
Contributor Author

I believe we can increase the in-place memory for sorting benchmark here, here the default is 1MB.

The result will largely affected by the in place sort memory buffer from previous experience, details: #15348 (comment)

I would prefer to keep config default changes in a separate PR as they can be a mix in terms of performance improvements / regressions.

@Dandandan
Copy link
Contributor Author

@alamb may I request some benchmark run?

@alamb
Copy link
Contributor

alamb commented Jul 2, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing reuse_rows (f92137f) to 9bb309c diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Jul 2, 2025

@alamb may I request some benchmark run?

I really need to figure out how to script this automatically. I will see if I can get claude to do something for me

@alamb
Copy link
Contributor

alamb commented Jul 2, 2025

🤖: Benchmark completed

Details

Comparing HEAD and reuse_rows
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃  reuse_rows ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  1953.01 ms │  1893.11 ms │     no change │
│ QQuery 1     │   721.03 ms │   656.36 ms │ +1.10x faster │
│ QQuery 2     │  1383.51 ms │  1335.66 ms │     no change │
│ QQuery 3     │   664.26 ms │   661.78 ms │     no change │
│ QQuery 4     │  1377.92 ms │  1381.30 ms │     no change │
│ QQuery 5     │ 15072.82 ms │ 15197.38 ms │     no change │
│ QQuery 6     │  2078.05 ms │  2075.08 ms │     no change │
│ QQuery 7     │  1881.46 ms │  1983.89 ms │  1.05x slower │
└──────────────┴─────────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 25132.07ms │
│ Total Time (reuse_rows)   │ 25184.57ms │
│ Average Time (HEAD)       │  3141.51ms │
│ Average Time (reuse_rows) │  3148.07ms │
│ Queries Faster            │          1 │
│ Queries Slower            │          1 │
│ Queries with No Change    │          6 │
│ Queries with Failure      │          0 │
└───────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃  reuse_rows ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.74 ms │     2.38 ms │ +1.15x faster │
│ QQuery 1     │    34.24 ms │    33.16 ms │     no change │
│ QQuery 2     │    82.08 ms │    82.77 ms │     no change │
│ QQuery 3     │   100.60 ms │   102.93 ms │     no change │
│ QQuery 4     │   583.96 ms │   626.54 ms │  1.07x slower │
│ QQuery 5     │   836.00 ms │   869.13 ms │     no change │
│ QQuery 6     │     2.36 ms │     2.32 ms │     no change │
│ QQuery 7     │    37.51 ms │    38.29 ms │     no change │
│ QQuery 8     │   873.51 ms │   872.37 ms │     no change │
│ QQuery 9     │  1141.19 ms │  1185.72 ms │     no change │
│ QQuery 10    │   266.33 ms │   252.92 ms │ +1.05x faster │
│ QQuery 11    │   283.48 ms │   288.63 ms │     no change │
│ QQuery 12    │   866.20 ms │   872.84 ms │     no change │
│ QQuery 13    │  1241.58 ms │  1188.97 ms │     no change │
│ QQuery 14    │   799.05 ms │   817.18 ms │     no change │
│ QQuery 15    │   760.36 ms │   770.86 ms │     no change │
│ QQuery 16    │  1580.00 ms │  1624.08 ms │     no change │
│ QQuery 17    │  1572.43 ms │  1614.71 ms │     no change │
│ QQuery 18    │  2840.22 ms │  2889.03 ms │     no change │
│ QQuery 19    │    81.68 ms │    86.11 ms │  1.05x slower │
│ QQuery 20    │  1135.44 ms │  1192.00 ms │     no change │
│ QQuery 21    │  1287.21 ms │  1334.57 ms │     no change │
│ QQuery 22    │  2140.40 ms │  2188.99 ms │     no change │
│ QQuery 23    │  7476.72 ms │  7477.75 ms │     no change │
│ QQuery 24    │   441.44 ms │   446.75 ms │     no change │
│ QQuery 25    │   303.56 ms │   316.24 ms │     no change │
│ QQuery 26    │   436.34 ms │   450.19 ms │     no change │
│ QQuery 27    │  1534.35 ms │  1561.09 ms │     no change │
│ QQuery 28    │ 12621.81 ms │ 11936.16 ms │ +1.06x faster │
│ QQuery 29    │   542.03 ms │   522.25 ms │     no change │
│ QQuery 30    │   764.60 ms │   802.05 ms │     no change │
│ QQuery 31    │   799.43 ms │   809.38 ms │     no change │
│ QQuery 32    │  2415.37 ms │  2479.17 ms │     no change │
│ QQuery 33    │  3220.31 ms │  3219.38 ms │     no change │
│ QQuery 34    │  3331.66 ms │  3341.77 ms │     no change │
│ QQuery 35    │  1264.59 ms │  1294.53 ms │     no change │
│ QQuery 36    │   117.70 ms │   122.93 ms │     no change │
│ QQuery 37    │    52.42 ms │    53.12 ms │     no change │
│ QQuery 38    │   117.87 ms │   123.03 ms │     no change │
│ QQuery 39    │   192.93 ms │   195.32 ms │     no change │
│ QQuery 40    │    41.58 ms │    42.95 ms │     no change │
│ QQuery 41    │    37.21 ms │    40.35 ms │  1.08x slower │
│ QQuery 42    │    32.86 ms │    34.73 ms │  1.06x slower │
└──────────────┴─────────────┴─────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 54293.36ms │
│ Total Time (reuse_rows)   │ 54205.62ms │
│ Average Time (HEAD)       │  1262.64ms │
│ Average Time (reuse_rows) │  1260.60ms │
│ Queries Faster            │          3 │
│ Queries Slower            │          4 │
│ Queries with No Change    │         36 │
│ Queries with Failure      │          0 │
└───────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ reuse_rows ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 103.11 ms │   98.29 ms │     no change │
│ QQuery 2     │  21.54 ms │   21.26 ms │     no change │
│ QQuery 3     │  33.40 ms │   32.47 ms │     no change │
│ QQuery 4     │  19.31 ms │   18.65 ms │     no change │
│ QQuery 5     │  52.92 ms │   49.75 ms │ +1.06x faster │
│ QQuery 6     │  12.19 ms │   12.04 ms │     no change │
│ QQuery 7     │  93.10 ms │   87.08 ms │ +1.07x faster │
│ QQuery 8     │  26.38 ms │   24.84 ms │ +1.06x faster │
│ QQuery 9     │  55.52 ms │   53.95 ms │     no change │
│ QQuery 10    │  44.77 ms │   42.49 ms │ +1.05x faster │
│ QQuery 11    │  11.70 ms │   11.34 ms │     no change │
│ QQuery 12    │  35.33 ms │   34.71 ms │     no change │
│ QQuery 13    │  26.48 ms │   26.61 ms │     no change │
│ QQuery 14    │  10.06 ms │    9.64 ms │     no change │
│ QQuery 15    │  19.51 ms │   19.86 ms │     no change │
│ QQuery 16    │  18.27 ms │   17.78 ms │     no change │
│ QQuery 17    │  97.82 ms │   97.02 ms │     no change │
│ QQuery 18    │ 198.33 ms │  195.17 ms │     no change │
│ QQuery 19    │  25.43 ms │   25.38 ms │     no change │
│ QQuery 20    │  31.38 ms │   31.68 ms │     no change │
│ QQuery 21    │ 143.85 ms │  141.03 ms │     no change │
│ QQuery 22    │  14.69 ms │   14.63 ms │     no change │
└──────────────┴───────────┴────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary         ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 1095.07ms │
│ Total Time (reuse_rows)   │ 1065.67ms │
│ Average Time (HEAD)       │   49.78ms │
│ Average Time (reuse_rows) │   48.44ms │
│ Queries Faster            │         4 │
│ Queries Slower            │         0 │
│ Queries with No Change    │        18 │
│ Queries with Failure      │         0 │
└───────────────────────────┴───────────┘

@alamb
Copy link
Contributor

alamb commented Jul 2, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing reuse_rows (f92137f) to 9bb309c diff using: sort_tpch
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Jul 2, 2025

🤖: Benchmark completed

Details

Comparing HEAD and reuse_rows
--------------------
Benchmark sort_tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ reuse_rows ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │  328.47 ms │  320.83 ms │     no change │
│ Q2           │  309.36 ms │  279.16 ms │ +1.11x faster │
│ Q3           │ 1049.20 ms │ 1048.02 ms │     no change │
│ Q4           │  411.13 ms │  420.18 ms │     no change │
│ Q5           │  390.84 ms │  382.13 ms │     no change │
│ Q6           │  423.79 ms │  422.99 ms │     no change │
│ Q7           │  766.07 ms │  776.97 ms │     no change │
│ Q8           │  669.95 ms │  690.28 ms │     no change │
│ Q9           │  718.81 ms │  706.33 ms │     no change │
│ Q10          │ 1033.87 ms │ 1017.72 ms │     no change │
│ Q11          │  582.44 ms │  629.02 ms │  1.08x slower │
└──────────────┴────────────┴────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary         ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 6683.92ms │
│ Total Time (reuse_rows)   │ 6693.63ms │
│ Average Time (HEAD)       │  607.63ms │
│ Average Time (reuse_rows) │  608.51ms │
│ Queries Faster            │         1 │
│ Queries Slower            │         1 │
│ Queries with No Change    │         9 │
│ Queries with Failure      │         0 │
└───────────────────────────┴───────────┘

@Dandandan
Copy link
Contributor Author

Dandandan commented Jul 2, 2025

🤖: Benchmark completed

Details

Comparing HEAD and reuse_rows
--------------------
Benchmark sort_tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ reuse_rows ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │  328.47 ms │  320.83 ms │     no change │
│ Q2           │  309.36 ms │  279.16 ms │ +1.11x faster │
│ Q3           │ 1049.20 ms │ 1048.02 ms │     no change │
│ Q4           │  411.13 ms │  420.18 ms │     no change │
│ Q5           │  390.84 ms │  382.13 ms │     no change │
│ Q6           │  423.79 ms │  422.99 ms │     no change │
│ Q7           │  766.07 ms │  776.97 ms │     no change │
│ Q8           │  669.95 ms │  690.28 ms │     no change │
│ Q9           │  718.81 ms │  706.33 ms │     no change │
│ Q10          │ 1033.87 ms │ 1017.72 ms │     no change │
│ Q11          │  582.44 ms │  629.02 ms │  1.08x slower │
└──────────────┴────────────┴────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary         ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 6683.92ms │
│ Total Time (reuse_rows)   │ 6693.63ms │
│ Average Time (HEAD)       │  607.63ms │
│ Average Time (reuse_rows) │  608.51ms │
│ Queries Faster            │         1 │
│ Queries Slower            │         1 │
│ Queries with No Change    │         9 │
│ Queries with Failure      │         0 │
└───────────────────────────┴───────────┘

Hm interesting, less pronounced than in my case 🤔

@alamb
Copy link
Contributor

alamb commented Jul 2, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing reuse_rows (f92137f) to 9bb309c diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Jul 2, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing reuse_rows (f92137f) to 9bb309c diff using: sort_tpch
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Jul 2, 2025

🤖: Benchmark completed

Details

Comparing HEAD and reuse_rows
--------------------
Benchmark sort_tpch.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ reuse_rows ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │  321.55 ms │  326.80 ms │     no change │
│ Q2           │  311.89 ms │  275.89 ms │ +1.13x faster │
│ Q3           │ 1050.21 ms │ 1081.16 ms │     no change │
│ Q4           │  414.20 ms │  416.49 ms │     no change │
│ Q5           │  382.07 ms │  391.35 ms │     no change │
│ Q6           │  427.73 ms │  422.09 ms │     no change │
│ Q7           │  767.09 ms │  761.99 ms │     no change │
│ Q8           │  674.66 ms │  668.79 ms │     no change │
│ Q9           │  697.31 ms │  718.16 ms │     no change │
│ Q10          │ 1018.40 ms │ 1029.83 ms │     no change │
│ Q11          │  571.35 ms │  632.45 ms │  1.11x slower │
└──────────────┴────────────┴────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary         ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 6636.45ms │
│ Total Time (reuse_rows)   │ 6724.99ms │
│ Average Time (HEAD)       │  603.31ms │
│ Average Time (reuse_rows) │  611.36ms │
│ Queries Faster            │         1 │
│ Queries Slower            │         1 │
│ Queries with No Change    │         9 │
│ Queries with Failure      │         0 │
└───────────────────────────┴───────────┘

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite cool @Dandandan

My only real concern is that this code will be tricky to maintain and could easily get reverted / regressed as part of a follow on change

I suggest we try and encapsulate it into a struct (I left a suggestion)

Thank you (as always) for the reviews @zhuqi-lucas

cc @tustvold and @crepererum for your amusement

@Dandandan
Copy link
Contributor Author

This is quite cool @Dandandan

My only real concern is that this code will be tricky to maintain and could easily get reverted / regressed as part of a follow on change

I suggest we try and encapsulate it into a struct (I left a suggestion)

Thank you (as always) for the reviews @zhuqi-lucas

cc @tustvold and @crepererum for your amusement

Yeah - I agree. Another option might be to panic or return an error on not maintaining the condition of keeping max 1 (extra) batch buffered in the consumer. This way the tests would fail, but it would be a small breaking change for RowCursorStream. Now that I think about it, RowCursorStream::new( has also a minor breaking change. I also think probably not many will use RowCursorStream directly.

@alamb alamb added the api change Changes the API exposed to users of the crate label Jul 2, 2025
@zhuqi-lucas
Copy link
Contributor

Sometimes, i found sort_tpch10 will get the more accurate or good result when we optimize the merge part, because our in_mem sort buffer is 1MB, so the sort_tpch will have less count for merge compare count, i added the sort_tpch10 to bench.sh, hope it will be helpful:

#16671

@Dandandan
Copy link
Contributor Author

Sometimes, i found sort_tpch10 will get the more accurate or good result when we optimize the merge part, because our in_mem sort buffer is 1MB, so the sort_tpch will have less count for merge compare count, i added the sort_tpch10 to bench.sh, hope it will be helpful:

#16671

Ah, that makes sense, maybe that explains the difference in results. Let's compare sort_tpch10 as well.

@Dandandan
Copy link
Contributor Author

--------------------
Benchmark sort_tpch10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        main ┃  reuse_rows ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │  2461.77 ms │  2455.28 ms │     no change │
│ Q2           │  1946.39 ms │  1945.72 ms │     no change │
│ Q3           │  8610.43 ms │  8611.21 ms │     no change │
│ Q4           │  2445.42 ms │  2317.70 ms │ +1.06x faster │
│ Q5           │  3015.42 ms │  2780.20 ms │ +1.08x faster │
│ Q6           │  3238.44 ms │  2837.96 ms │ +1.14x faster │
│ Q7           │  6225.16 ms │  5591.73 ms │ +1.11x faster │
│ Q8           │  4348.99 ms │  4123.76 ms │ +1.05x faster │
│ Q9           │  6092.36 ms │  5238.98 ms │ +1.16x faster │
│ Q10          │ 17189.69 ms │ 17209.64 ms │     no change │
│ Q11          │  4950.33 ms │  4833.49 ms │     no change │
└──────────────┴─────────────┴─────────────┴───────────────┘

@alamb could you maybe run it on sort_tpch10. Perhaps the difference is different when using more cores (16 vs 10 I believe?).

@alamb
Copy link
Contributor

alamb commented Jul 3, 2025

@alamb could you maybe run it on sort_tpch10. Perhaps the difference is different when using more cores (16 vs 10 I believe?).

I have queued this up and it should run in a few minutes

@alamb
Copy link
Contributor

alamb commented Jul 3, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing reuse_rows (68bf187) to 3ca09a6 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Jul 3, 2025

🤖: Benchmark completed

Details

Comparing HEAD and reuse_rows
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃  reuse_rows ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │  1892.80 ms │  1995.13 ms │ 1.05x slower │
│ QQuery 1     │   721.33 ms │   751.38 ms │    no change │
│ QQuery 2     │  1423.80 ms │  1460.24 ms │    no change │
│ QQuery 3     │   690.12 ms │   685.43 ms │    no change │
│ QQuery 4     │  1342.02 ms │  1410.65 ms │ 1.05x slower │
│ QQuery 5     │ 14966.46 ms │ 15083.15 ms │    no change │
│ QQuery 6     │  2079.26 ms │  2079.94 ms │    no change │
│ QQuery 7     │  1919.50 ms │  1935.32 ms │    no change │
└──────────────┴─────────────┴─────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 25035.30ms │
│ Total Time (reuse_rows)   │ 25401.24ms │
│ Average Time (HEAD)       │  3129.41ms │
│ Average Time (reuse_rows) │  3175.15ms │
│ Queries Faster            │          0 │
│ Queries Slower            │          2 │
│ Queries with No Change    │          6 │
│ Queries with Failure      │          0 │
└───────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃  reuse_rows ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.20 ms │     2.33 ms │ 1.06x slower │
│ QQuery 1     │    35.24 ms │    34.20 ms │    no change │
│ QQuery 2     │    80.87 ms │    83.74 ms │    no change │
│ QQuery 3     │    99.38 ms │   101.62 ms │    no change │
│ QQuery 4     │   591.15 ms │   626.42 ms │ 1.06x slower │
│ QQuery 5     │   873.36 ms │   879.89 ms │    no change │
│ QQuery 6     │     2.28 ms │     2.32 ms │    no change │
│ QQuery 7     │    38.47 ms │    39.76 ms │    no change │
│ QQuery 8     │   858.65 ms │   862.47 ms │    no change │
│ QQuery 9     │  1144.11 ms │  1211.27 ms │ 1.06x slower │
│ QQuery 10    │   259.24 ms │   260.51 ms │    no change │
│ QQuery 11    │   287.65 ms │   286.36 ms │    no change │
│ QQuery 12    │   891.27 ms │   868.32 ms │    no change │
│ QQuery 13    │  1260.99 ms │  1282.20 ms │    no change │
│ QQuery 14    │   821.87 ms │   825.39 ms │    no change │
│ QQuery 15    │   778.05 ms │   779.16 ms │    no change │
│ QQuery 16    │  1611.08 ms │  1600.17 ms │    no change │
│ QQuery 17    │  1608.65 ms │  1616.60 ms │    no change │
│ QQuery 18    │  2849.91 ms │  2878.13 ms │    no change │
│ QQuery 19    │    87.22 ms │    90.79 ms │    no change │
│ QQuery 20    │  1166.63 ms │  1158.05 ms │    no change │
│ QQuery 21    │  1294.14 ms │  1311.97 ms │    no change │
│ QQuery 22    │  2179.42 ms │  2193.61 ms │    no change │
│ QQuery 23    │  7414.55 ms │  7525.59 ms │    no change │
│ QQuery 24    │   445.35 ms │   444.16 ms │    no change │
│ QQuery 25    │   310.13 ms │   313.14 ms │    no change │
│ QQuery 26    │   443.61 ms │   440.89 ms │    no change │
│ QQuery 27    │  1541.45 ms │  1589.32 ms │    no change │
│ QQuery 28    │ 12875.77 ms │ 12833.81 ms │    no change │
│ QQuery 29    │   523.96 ms │   511.63 ms │    no change │
│ QQuery 30    │   779.62 ms │   784.97 ms │    no change │
│ QQuery 31    │   800.17 ms │   803.30 ms │    no change │
│ QQuery 32    │  2406.40 ms │  2454.64 ms │    no change │
│ QQuery 33    │  3176.19 ms │  3236.93 ms │    no change │
│ QQuery 34    │  3273.65 ms │  3241.92 ms │    no change │
│ QQuery 35    │  1233.82 ms │  1228.87 ms │    no change │
│ QQuery 36    │   124.97 ms │   121.39 ms │    no change │
│ QQuery 37    │    52.30 ms │    52.01 ms │    no change │
│ QQuery 38    │   125.68 ms │   122.75 ms │    no change │
│ QQuery 39    │   201.44 ms │   195.43 ms │    no change │
│ QQuery 40    │    42.07 ms │    41.24 ms │    no change │
│ QQuery 41    │    38.20 ms │    39.22 ms │    no change │
│ QQuery 42    │    33.02 ms │    32.69 ms │    no change │
└──────────────┴─────────────┴─────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary         ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 54664.19ms │
│ Total Time (reuse_rows)   │ 55009.20ms │
│ Average Time (HEAD)       │  1271.26ms │
│ Average Time (reuse_rows) │  1279.28ms │
│ Queries Faster            │          0 │
│ Queries Slower            │          3 │
│ Queries with No Change    │         40 │
│ Queries with Failure      │          0 │
└───────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ reuse_rows ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  99.48 ms │   95.49 ms │     no change │
│ QQuery 2     │  21.22 ms │   21.21 ms │     no change │
│ QQuery 3     │  32.82 ms │   32.54 ms │     no change │
│ QQuery 4     │  18.62 ms │   18.53 ms │     no change │
│ QQuery 5     │  49.49 ms │   50.08 ms │     no change │
│ QQuery 6     │  11.99 ms │   11.98 ms │     no change │
│ QQuery 7     │  90.94 ms │   89.35 ms │     no change │
│ QQuery 8     │  23.52 ms │   24.76 ms │  1.05x slower │
│ QQuery 9     │  54.94 ms │   53.99 ms │     no change │
│ QQuery 10    │  43.44 ms │   42.49 ms │     no change │
│ QQuery 11    │  11.37 ms │   11.41 ms │     no change │
│ QQuery 12    │  34.74 ms │   34.90 ms │     no change │
│ QQuery 13    │  27.06 ms │   25.69 ms │ +1.05x faster │
│ QQuery 14    │   9.82 ms │    9.72 ms │     no change │
│ QQuery 15    │  19.86 ms │   18.68 ms │ +1.06x faster │
│ QQuery 16    │  18.29 ms │   18.11 ms │     no change │
│ QQuery 17    │  96.64 ms │   97.01 ms │     no change │
│ QQuery 18    │ 195.08 ms │  191.65 ms │     no change │
│ QQuery 19    │  25.14 ms │   24.82 ms │     no change │
│ QQuery 20    │  31.41 ms │   31.76 ms │     no change │
│ QQuery 21    │ 146.50 ms │  145.09 ms │     no change │
│ QQuery 22    │  14.83 ms │   14.24 ms │     no change │
└──────────────┴───────────┴────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary         ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 1077.18ms │
│ Total Time (reuse_rows)   │ 1063.49ms │
│ Average Time (HEAD)       │   48.96ms │
│ Average Time (reuse_rows) │   48.34ms │
│ Queries Faster            │         2 │
│ Queries Slower            │         1 │
│ Queries with No Change    │        19 │
│ Queries with Failure      │         0 │
└───────────────────────────┴───────────┘

@alamb
Copy link
Contributor

alamb commented Jul 3, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing reuse_rows (68bf187) to 3ca09a6 diff using: sort_tpch10
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Jul 3, 2025

🤖: Benchmark completed

Details

Comparing HEAD and reuse_rows
--------------------
Benchmark sort_tpch10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃ HEAD ┃ reuse_rows ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Q1           │ FAIL │       FAIL │ incomparable │
│ Q2           │ FAIL │       FAIL │ incomparable │
│ Q3           │ FAIL │       FAIL │ incomparable │
│ Q4           │ FAIL │       FAIL │ incomparable │
│ Q5           │ FAIL │       FAIL │ incomparable │
│ Q6           │ FAIL │       FAIL │ incomparable │
│ Q7           │ FAIL │       FAIL │ incomparable │
│ Q8           │ FAIL │       FAIL │ incomparable │
│ Q9           │ FAIL │       FAIL │ incomparable │
│ Q10          │ FAIL │       FAIL │ incomparable │
│ Q11          │ FAIL │       FAIL │ incomparable │
└──────────────┴──────┴────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark Summary         ┃        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Total Time (HEAD)         │ 0.00ms │
│ Total Time (reuse_rows)   │ 0.00ms │
│ Average Time (HEAD)       │ 0.00ms │
│ Average Time (reuse_rows) │ 0.00ms │
│ Queries Faster            │      0 │
│ Queries Slower            │      0 │
│ Queries with No Change    │      0 │
│ Queries with Failure      │     11 │
└───────────────────────────┴────────┘

@alamb
Copy link
Contributor

alamb commented Jul 3, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing reuse_rows (68bf187) to 3ca09a6 diff using: sort_tpch
Results will be posted here when complete

@alamb
Copy link
Contributor

alamb commented Jul 3, 2025

🤖: Benchmark completed

Details

Comparing HEAD and reuse_rows
--------------------
Benchmark sort_tpch1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ reuse_rows ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1           │  327.12 ms │  357.60 ms │  1.09x slower │
│ Q2           │  280.32 ms │  298.27 ms │  1.06x slower │
│ Q3           │ 1028.16 ms │ 1004.96 ms │     no change │
│ Q4           │  413.09 ms │  415.63 ms │     no change │
│ Q5           │  391.77 ms │  384.10 ms │     no change │
│ Q6           │  433.10 ms │  414.13 ms │     no change │
│ Q7           │  762.56 ms │  751.43 ms │     no change │
│ Q8           │  677.01 ms │  667.51 ms │     no change │
│ Q9           │  709.27 ms │  689.68 ms │     no change │
│ Q10          │ 1021.20 ms │ 1013.90 ms │     no change │
│ Q11          │  616.84 ms │  577.70 ms │ +1.07x faster │
└──────────────┴────────────┴────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary         ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)         │ 6660.45ms │
│ Total Time (reuse_rows)   │ 6574.90ms │
│ Average Time (HEAD)       │  605.50ms │
│ Average Time (reuse_rows) │  597.72ms │
│ Queries Faster            │         1 │
│ Queries Slower            │         2 │
│ Queries with No Change    │         8 │
│ Queries with Failure      │         0 │
└───────────────────────────┴───────────┘

@Dandandan
Copy link
Contributor Author

🤖: Benchmark completed

Details

Comparing HEAD and reuse_rows
--------------------
Benchmark sort_tpch10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃ HEAD ┃ reuse_rows ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Q1           │ FAIL │       FAIL │ incomparable │
│ Q2           │ FAIL │       FAIL │ incomparable │
│ Q3           │ FAIL │       FAIL │ incomparable │
│ Q4           │ FAIL │       FAIL │ incomparable │
│ Q5           │ FAIL │       FAIL │ incomparable │
│ Q6           │ FAIL │       FAIL │ incomparable │
│ Q7           │ FAIL │       FAIL │ incomparable │
│ Q8           │ FAIL │       FAIL │ incomparable │
│ Q9           │ FAIL │       FAIL │ incomparable │
│ Q10          │ FAIL │       FAIL │ incomparable │
│ Q11          │ FAIL │       FAIL │ incomparable │
└──────────────┴──────┴────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark Summary         ┃        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Total Time (HEAD)         │ 0.00ms │
│ Total Time (reuse_rows)   │ 0.00ms │
│ Average Time (HEAD)       │ 0.00ms │
│ Average Time (reuse_rows) │ 0.00ms │
│ Queries Faster            │      0 │
│ Queries Slower            │      0 │
│ Queries with No Change    │      0 │
│ Queries with Failure      │     11 │
└───────────────────────────┴────────┘

LOL

@zhuqi-lucas
Copy link
Contributor

🤖: Benchmark completed
Details

Comparing HEAD and reuse_rows
--------------------
Benchmark sort_tpch10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃ HEAD ┃ reuse_rows ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Q1           │ FAIL │       FAIL │ incomparable │
│ Q2           │ FAIL │       FAIL │ incomparable │
│ Q3           │ FAIL │       FAIL │ incomparable │
│ Q4           │ FAIL │       FAIL │ incomparable │
│ Q5           │ FAIL │       FAIL │ incomparable │
│ Q6           │ FAIL │       FAIL │ incomparable │
│ Q7           │ FAIL │       FAIL │ incomparable │
│ Q8           │ FAIL │       FAIL │ incomparable │
│ Q9           │ FAIL │       FAIL │ incomparable │
│ Q10          │ FAIL │       FAIL │ incomparable │
│ Q11          │ FAIL │       FAIL │ incomparable │
└──────────────┴──────┴────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark Summary         ┃        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Total Time (HEAD)         │ 0.00ms │
│ Total Time (reuse_rows)   │ 0.00ms │
│ Average Time (HEAD)       │ 0.00ms │
│ Average Time (reuse_rows) │ 0.00ms │
│ Queries Faster            │      0 │
│ Queries Slower            │      0 │
│ Queries with No Change    │      0 │
│ Queries with Failure      │     11 │
└───────────────────────────┴────────┘

LOL

😺 Is it due to no updating for sort_tpch10 for the benchmark script?

Dandandan and others added 4 commits July 3, 2025 18:11
@Dandandan Dandandan merged commit 8c5d06d into apache:main Jul 4, 2025
27 checks passed
Standing-Man pushed a commit to Standing-Man/datafusion that referenced this pull request Jul 4, 2025
* Reuse Rows in RowCursorStream

* WIP

* Fmt

* Add comment, make it backwards compatible

* Add comment, make it backwards compatible

* Add comment, make it backwards compatible

* Clippy

* Clippy

* Return error on non-unique reference

* Comment

* Update datafusion/physical-plan/src/sorts/stream.rs

Co-authored-by: Oleks V <comphead@users.noreply.github.com>

* Fix

* Extract logic

* Doc fix

---------

Co-authored-by: Oleks V <comphead@users.noreply.github.com>

/// A pair of `Arc<Rows>` that can be reused
#[derive(Debug)]
struct ReusableRows {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb
Copy link
Contributor

alamb commented Jul 7, 2025

As a random aside, this is the kind of micro optimization that I would have been terrified of making in C/C++ land without extreme care to avoid concurrent access.

With Rust I trust the compiler 🦾

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Reuse Rows allocation in SortPreservingMergeStream / RowCursorStream

5 participants