Concatenate inside hash repartition #16223

Closed
wants to merge 7 commits into from

Dandandan
Contributor

@Dandandan Dandandan commented Jun 1, 2025

Which issue does this PR close?

--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ concat_in_repartition ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  63.26ms │               63.39ms │     no change │
│ QQuery 2     │  13.65ms │               12.75ms │ +1.07x faster │
│ QQuery 3     │  22.04ms │               22.57ms │     no change │
│ QQuery 4     │  13.17ms │               11.66ms │ +1.13x faster │
│ QQuery 5     │  34.82ms │               35.57ms │     no change │
│ QQuery 6     │  10.93ms │               10.39ms │     no change │
│ QQuery 7     │  70.36ms │               69.08ms │     no change │
│ QQuery 8     │  17.39ms │               17.54ms │     no change │
│ QQuery 9     │  39.17ms │               37.51ms │     no change │
│ QQuery 10    │  37.11ms │               36.13ms │     no change │
│ QQuery 11    │   5.79ms │                5.80ms │     no change │
│ QQuery 12    │  32.15ms │               32.14ms │     no change │
│ QQuery 13    │  19.50ms │               18.44ms │ +1.06x faster │
│ QQuery 14    │   5.36ms │                5.54ms │     no change │
│ QQuery 15    │  12.10ms │               12.09ms │     no change │
│ QQuery 16    │  14.19ms │               14.93ms │  1.05x slower │
│ QQuery 17    │  59.43ms │               55.86ms │ +1.06x faster │
│ QQuery 18    │ 136.18ms │              128.38ms │ +1.06x faster │
│ QQuery 19    │  21.94ms │               19.10ms │ +1.15x faster │
│ QQuery 20    │  21.38ms │               20.65ms │     no change │
│ QQuery 21    │  92.81ms │               93.23ms │     no change │
│ QQuery 22    │  12.41ms │               12.78ms │     no change │
└──────────────┴──────────┴───────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                    ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (main)                    │ 755.16ms │
│ Total Time (concat_in_repartition)   │ 735.53ms │
│ Average Time (main)                  │  34.33ms │
│ Average Time (concat_in_repartition) │  33.43ms │
│ Queries Faster                       │        6 │
│ Queries Slower                       │        1 │
│ Queries with No Change               │       15 │
└──────────────────────────────────────┴──────────┘
--------------------
Benchmark tpch_mem_sf10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ concat_in_repartition ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  580.08ms │              577.68ms │     no change │
│ QQuery 2     │  112.00ms │              111.95ms │     no change │
│ QQuery 3     │  233.29ms │              239.08ms │     no change │
│ QQuery 4     │  123.81ms │              113.98ms │ +1.09x faster │
│ QQuery 5     │  462.88ms │              481.75ms │     no change │
│ QQuery 6     │   86.76ms │               91.73ms │  1.06x slower │
│ QQuery 7     │  976.26ms │              991.59ms │     no change │
│ QQuery 8     │  318.50ms │              349.00ms │  1.10x slower │
│ QQuery 9     │  775.75ms │              814.27ms │     no change │
│ QQuery 10    │  401.89ms │              395.42ms │     no change │
│ QQuery 11    │   81.35ms │               77.24ms │ +1.05x faster │
│ QQuery 12    │  315.57ms │              307.22ms │     no change │
│ QQuery 13    │  280.05ms │              246.05ms │ +1.14x faster │
│ QQuery 14    │   45.80ms │               48.39ms │  1.06x slower │
│ QQuery 15    │  114.20ms │              115.17ms │     no change │
│ QQuery 16    │   87.50ms │               86.10ms │     no change │
│ QQuery 17    │  856.14ms │              894.89ms │     no change │
│ QQuery 18    │ 2645.17ms │             2316.60ms │ +1.14x faster │
│ QQuery 19    │  161.27ms │              164.10ms │     no change │
│ QQuery 20    │  214.24ms │              218.33ms │     no change │
│ QQuery 21    │ 1473.00ms │             1386.08ms │ +1.06x faster │
│ QQuery 22    │   98.78ms │               94.21ms │     no change │
└──────────────┴───────────┴───────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                    │ 10444.29ms │
│ Total Time (concat_in_repartition)   │ 10120.81ms │
│ Average Time (main)                  │   474.74ms │
│ Average Time (concat_in_repartition) │   460.04ms │
│ Queries Faster                       │          5 │
│ Queries Slower                       │          3 │
│ Queries with No Change               │         14 │
└──────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ concat_in_repartition ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 102.62ms │              104.12ms │     no change │
│ QQuery 2     │  47.89ms │               48.16ms │     no change │
│ QQuery 3     │  55.33ms │               52.82ms │     no change │
│ QQuery 4     │  43.64ms │               40.18ms │ +1.09x faster │
│ QQuery 5     │  78.05ms │               74.44ms │     no change │
│ QQuery 6     │  26.82ms │               26.47ms │     no change │
│ QQuery 7     │  88.89ms │               88.91ms │     no change │
│ QQuery 8     │  70.95ms │               72.93ms │     no change │
│ QQuery 9     │  99.19ms │               97.91ms │     no change │
│ QQuery 10    │ 100.09ms │              102.30ms │     no change │
│ QQuery 11    │  37.46ms │               39.16ms │     no change │
│ QQuery 12    │  58.27ms │               57.89ms │     no change │
│ QQuery 13    │ 131.15ms │              128.73ms │     no change │
│ QQuery 14    │  36.52ms │               37.55ms │     no change │
│ QQuery 15    │  44.09ms │               44.30ms │     no change │
│ QQuery 16    │  29.26ms │               28.05ms │     no change │
│ QQuery 17    │ 112.79ms │              112.40ms │     no change │
│ QQuery 18    │ 152.36ms │              146.60ms │     no change │
│ QQuery 19    │  61.73ms │               62.32ms │     no change │
│ QQuery 20    │  56.72ms │               55.37ms │     no change │
│ QQuery 21    │ 117.40ms │              114.14ms │     no change │
│ QQuery 22    │  31.38ms │               29.80ms │ +1.05x faster │
└──────────────┴──────────┴───────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)                    │ 1582.60ms │
│ Total Time (concat_in_repartition)   │ 1564.58ms │
│ Average Time (main)                  │   71.94ms │
│ Average Time (concat_in_repartition) │   71.12ms │
│ Queries Faster                       │         2 │
│ Queries Slower                       │         0 │
│ Queries with No Change               │        20 │
└──────────────────────────────────────┴───────────┘
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ concat_in_repartition ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  809.46ms │              806.25ms │     no change │
│ QQuery 2     │  165.80ms │              159.27ms │     no change │
│ QQuery 3     │  432.56ms │              403.77ms │ +1.07x faster │
│ QQuery 4     │  459.88ms │              442.66ms │     no change │
│ QQuery 5     │  671.13ms │              619.00ms │ +1.08x faster │
│ QQuery 6     │  181.36ms │              189.14ms │     no change │
│ QQuery 7     │  959.96ms │              894.05ms │ +1.07x faster │
│ QQuery 8     │  672.39ms │              658.85ms │     no change │
│ QQuery 9     │ 1101.98ms │             1138.28ms │     no change │
│ QQuery 10    │  638.41ms │              650.97ms │     no change │
│ QQuery 11    │  126.22ms │              118.40ms │ +1.07x faster │
│ QQuery 12    │  358.25ms │              354.98ms │     no change │
│ QQuery 13    │  720.99ms │              722.20ms │     no change │
│ QQuery 14    │  247.23ms │              244.06ms │     no change │
│ QQuery 15    │  395.58ms │              383.78ms │     no change │
│ QQuery 16    │  110.37ms │              101.86ms │ +1.08x faster │
│ QQuery 17    │ 1193.78ms │             1189.51ms │     no change │
│ QQuery 18    │ 1846.58ms │             1572.72ms │ +1.17x faster │
│ QQuery 19    │  412.76ms │              404.37ms │     no change │
│ QQuery 20    │  421.08ms │              398.91ms │ +1.06x faster │
│ QQuery 21    │ 1363.39ms │             1247.48ms │ +1.09x faster │
│ QQuery 22    │  149.67ms │              141.16ms │ +1.06x faster │
└──────────────┴───────────┴───────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                    │ 13438.84ms │
│ Total Time (concat_in_repartition)   │ 12841.67ms │
│ Average Time (main)                  │   610.86ms │
│ Average Time (concat_in_repartition) │   583.71ms │
│ Queries Faster                       │          9 │
│ Queries Slower                       │          0 │
│ Queries with No Change               │         13 │
└──────────────────────────────────────┴────────────┘

Rationale for this change

Recently, I found interleave_batches to be faster than the existing code.
That has nothing to do with interleave itself being faster (in fact, it is slower); the gain comes from not sending num_partition batches per input batch to the output channels.
The existing code takes each individual batch and sends it to the output channels, which directly blocks progress, because the batches have already been sent and the channels may all quickly become "non-empty".

We can fix this by concatenating the input arrays inside RepartitionExec.
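
A minimal sketch of the idea, not the code in this PR: a plain `Vec<u64>` stands in for an Arrow record batch, a modulo stands in for the row hash, and `partition_of`, `split_per_batch`, and `split_concatenated` are hypothetical names. It contrasts splitting every input batch on its own (up to num_partition small pieces per input batch) with partitioning several input batches together and concatenating the rows per output partition.

```rust
/// Hypothetical stand-in for hashing a row to an output partition.
/// Assumes `num_partitions > 0`.
fn partition_of(value: u64, num_partitions: usize) -> usize {
    (value % num_partitions as u64) as usize
}

/// Splits each input batch separately. Every input batch yields up to
/// `num_partitions` small pieces, and each piece is one channel send.
fn split_per_batch(inputs: &[Vec<u64>], num_partitions: usize) -> Vec<Vec<Vec<u64>>> {
    let mut outputs: Vec<Vec<Vec<u64>>> = vec![Vec::new(); num_partitions];
    for batch in inputs {
        let mut pieces: Vec<Vec<u64>> = vec![Vec::new(); num_partitions];
        for &v in batch {
            pieces[partition_of(v, num_partitions)].push(v);
        }
        for (p, piece) in pieces.into_iter().enumerate() {
            if !piece.is_empty() {
                outputs[p].push(piece); // one send per non-empty piece
            }
        }
    }
    outputs
}

/// Partitions all input batches together and concatenates the rows per
/// output partition, so each partition receives at most one larger piece.
fn split_concatenated(inputs: &[Vec<u64>], num_partitions: usize) -> Vec<Vec<u64>> {
    let mut outputs = vec![Vec::new(); num_partitions];
    for batch in inputs {
        for &v in batch {
            outputs[partition_of(v, num_partitions)].push(v);
        }
    }
    outputs
}
```

With 4 input batches of 8 consecutive integers and 4 output partitions, `split_per_batch` produces 16 pieces of 2 rows each, while `split_concatenated` produces 4 pieces of 8 rows each. The sketch only counts pieces; it says nothing about the cost of the extra copy.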

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Jun 1, 2025
@Dandandan Dandandan closed this Jun 1, 2025
@Dandandan Dandandan reopened this Jun 1, 2025
@Dandandan Dandandan closed this Jun 1, 2025
@Dandandan Dandandan reopened this Jun 1, 2025
@Dandandan Dandandan marked this pull request as ready for review June 1, 2025 16:26
@Dandandan
Contributor Author

FYI @alamb this relates to your quest to remove CoalesceBatches (this doesn't yet remove concat but it shows the potential for optimization).

@alamb
Contributor

alamb commented Jun 1, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing concat_in_repartition (dc7df1a) to 6844e56 diff
Benchmarks: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb
Contributor

alamb commented Jun 1, 2025

🤖: Benchmark completed

Comparing HEAD and concat_in_repartition
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ concat_in_repartition ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │  1913.26ms │             1919.02ms │    no change │
│ QQuery 1     │   722.29ms │              700.68ms │    no change │
│ QQuery 2     │  1500.38ms │             1467.99ms │    no change │
│ QQuery 3     │   696.92ms │              696.76ms │    no change │
│ QQuery 4     │  1495.37ms │             1489.28ms │    no change │
│ QQuery 5     │ 15779.55ms │            16668.91ms │ 1.06x slower │
│ QQuery 6     │  2116.95ms │             2044.90ms │    no change │
│ QQuery 7     │  2125.37ms │             2089.67ms │    no change │
│ QQuery 8     │   863.22ms │              867.06ms │    no change │
└──────────────┴────────────┴───────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                    │ 27213.31ms │
│ Total Time (concat_in_repartition)   │ 27944.28ms │
│ Average Time (HEAD)                  │  3023.70ms │
│ Average Time (concat_in_repartition) │  3104.92ms │
│ Queries Faster                       │          0 │
│ Queries Slower                       │          1 │
│ Queries with No Change               │          8 │
└──────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ concat_in_repartition ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │    15.86ms │               14.85ms │ +1.07x faster │
│ QQuery 1     │    33.79ms │               34.30ms │     no change │
│ QQuery 2     │    81.29ms │               82.32ms │     no change │
│ QQuery 3     │    97.67ms │               97.49ms │     no change │
│ QQuery 4     │   593.72ms │              576.81ms │     no change │
│ QQuery 5     │   872.11ms │              831.23ms │     no change │
│ QQuery 6     │    23.55ms │               21.66ms │ +1.09x faster │
│ QQuery 7     │    39.27ms │               36.84ms │ +1.07x faster │
│ QQuery 8     │   917.74ms │              907.24ms │     no change │
│ QQuery 9     │  1201.79ms │             1202.36ms │     no change │
│ QQuery 10    │   270.42ms │              262.42ms │     no change │
│ QQuery 11    │   303.32ms │              292.44ms │     no change │
│ QQuery 12    │   927.99ms │              923.22ms │     no change │
│ QQuery 13    │  1297.08ms │             1246.41ms │     no change │
│ QQuery 14    │   857.34ms │              870.62ms │     no change │
│ QQuery 15    │   833.84ms │              814.85ms │     no change │
│ QQuery 16    │  1782.59ms │             1747.95ms │     no change │
│ QQuery 17    │  1653.21ms │             1631.09ms │     no change │
│ QQuery 18    │  3134.19ms │             3105.45ms │     no change │
│ QQuery 19    │    84.39ms │               88.66ms │  1.05x slower │
│ QQuery 20    │  1190.98ms │             1141.01ms │     no change │
│ QQuery 21    │  1395.64ms │             1329.52ms │     no change │
│ QQuery 22    │  2330.04ms │             2243.38ms │     no change │
│ QQuery 23    │  8406.43ms │             8188.72ms │     no change │
│ QQuery 24    │   489.13ms │              479.48ms │     no change │
│ QQuery 25    │   429.30ms │              395.13ms │ +1.09x faster │
│ QQuery 26    │   559.01ms │              538.71ms │     no change │
│ QQuery 27    │  1690.54ms │             1647.99ms │     no change │
│ QQuery 28    │ 12583.01ms │            13495.15ms │  1.07x slower │
│ QQuery 29    │   516.65ms │              543.13ms │  1.05x slower │
│ QQuery 30    │   816.74ms │              816.88ms │     no change │
│ QQuery 31    │   860.30ms │              844.47ms │     no change │
│ QQuery 32    │  2704.29ms │             2735.08ms │     no change │
│ QQuery 33    │  3406.90ms │             3388.81ms │     no change │
│ QQuery 34    │  3420.93ms │             3478.97ms │     no change │
│ QQuery 35    │  1292.91ms │             1342.04ms │     no change │
│ QQuery 36    │   124.76ms │              124.59ms │     no change │
│ QQuery 37    │    57.87ms │               54.63ms │ +1.06x faster │
│ QQuery 38    │   123.75ms │              124.85ms │     no change │
│ QQuery 39    │   198.89ms │              202.61ms │     no change │
│ QQuery 40    │    48.95ms │               47.96ms │     no change │
│ QQuery 41    │    45.21ms │               46.70ms │     no change │
│ QQuery 42    │    37.69ms │               38.21ms │     no change │
└──────────────┴────────────┴───────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                    │ 57751.10ms │
│ Total Time (concat_in_repartition)   │ 58036.22ms │
│ Average Time (HEAD)                  │  1343.05ms │
│ Average Time (concat_in_repartition) │  1349.68ms │
│ Queries Faster                       │          5 │
│ Queries Slower                       │          3 │
│ Queries with No Change               │         35 │
└──────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃     HEAD ┃ concat_in_repartition ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 115.45ms │              121.28ms │ 1.05x slower │
│ QQuery 2     │  22.05ms │               23.17ms │ 1.05x slower │
│ QQuery 3     │  34.20ms │               41.50ms │ 1.21x slower │
│ QQuery 4     │  19.74ms │               21.28ms │ 1.08x slower │
│ QQuery 5     │  53.14ms │               60.16ms │ 1.13x slower │
│ QQuery 6     │  12.19ms │               12.16ms │    no change │
│ QQuery 7     │  95.32ms │              112.42ms │ 1.18x slower │
│ QQuery 8     │  25.53ms │               28.15ms │ 1.10x slower │
│ QQuery 9     │  60.59ms │               66.57ms │ 1.10x slower │
│ QQuery 10    │  58.20ms │               62.97ms │ 1.08x slower │
│ QQuery 11    │  11.57ms │               11.84ms │    no change │
│ QQuery 12    │  41.92ms │               44.55ms │ 1.06x slower │
│ QQuery 13    │  28.11ms │               29.19ms │    no change │
│ QQuery 14    │   9.74ms │               10.44ms │ 1.07x slower │
│ QQuery 15    │  22.92ms │               23.21ms │    no change │
│ QQuery 16    │  22.36ms │               21.96ms │    no change │
│ QQuery 17    │  95.73ms │               96.48ms │    no change │
│ QQuery 18    │ 207.56ms │              216.17ms │    no change │
│ QQuery 19    │  26.01ms │               25.88ms │    no change │
│ QQuery 20    │  33.88ms │               36.51ms │ 1.08x slower │
│ QQuery 21    │ 159.01ms │              168.68ms │ 1.06x slower │
│ QQuery 22    │  16.62ms │               16.87ms │    no change │
└──────────────┴──────────┴───────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                    ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                    │ 1171.81ms │
│ Total Time (concat_in_repartition)   │ 1251.45ms │
│ Average Time (HEAD)                  │   53.26ms │
│ Average Time (concat_in_repartition) │   56.88ms │
│ Queries Faster                       │         0 │
│ Queries Slower                       │        13 │
│ Queries with No Change               │         9 │
└──────────────────────────────────────┴───────────┘

@Dandandan
Contributor Author

Hmm, interesting. This shows something different.

@Dandandan
Contributor Author

One commit was missing, but I'm not sure that explains the difference between my results and these.

@Dandandan
Contributor Author

Let me try another approach later: buffering inputs for each output partition until the buffer reaches the target batch size (just like CoalesceBatches). Perhaps the extra copy for smaller batches, or the increased batch size, is hurting in some cases.
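
A rough sketch of that buffering idea under simplifying assumptions, not the eventual implementation: `PartitionBuffers`, `push_batch`, and `finish` are hypothetical names, a `Vec<u64>` stands in for a record batch, and a modulo stands in for the row hash. Rows accumulate per output partition and are emitted only once a buffer reaches the target batch size, with a final flush when the input is exhausted.

```rust
/// Hypothetical per-output-partition buffer; `Vec<u64>` stands in for
/// a record batch. Assumes at least one output partition.
struct PartitionBuffers {
    target_batch_size: usize,
    buffers: Vec<Vec<u64>>,
}

impl PartitionBuffers {
    fn new(num_partitions: usize, target_batch_size: usize) -> Self {
        Self {
            target_batch_size,
            buffers: vec![Vec::new(); num_partitions],
        }
    }

    /// Routes every row of `batch` to its partition's buffer and returns
    /// `(partition, rows)` for each buffer that reached the target size.
    fn push_batch(&mut self, batch: &[u64]) -> Vec<(usize, Vec<u64>)> {
        let n = self.buffers.len();
        let mut ready = Vec::new();
        for &v in batch {
            let p = (v % n as u64) as usize; // stand-in for the real row hash
            self.buffers[p].push(v);
            if self.buffers[p].len() >= self.target_batch_size {
                // Emit the full buffer and leave an empty one in its place.
                ready.push((p, std::mem::take(&mut self.buffers[p])));
            }
        }
        ready
    }

    /// Flushes whatever remains once the input is exhausted.
    fn finish(&mut self) -> Vec<(usize, Vec<u64>)> {
        self.buffers
            .iter_mut()
            .enumerate()
            .filter(|(_, b)| !b.is_empty())
            .map(|(p, b)| (p, std::mem::take(b)))
            .collect()
    }
}
```

In this sketch nothing is sent to an output until a full-sized batch is available, so downstream operators would see fewer, larger batches at the cost of holding rows in memory a little longer.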

@Dandandan Dandandan closed this Jun 1, 2025
@Dandandan
Contributor Author

Dandandan commented Jun 2, 2025

I got some amazing results (5-20% on total average on benchmarks) on the latter approach yesterday (buffer inside repartition). Will clean it up later this week (currently ill).

@alamb
Contributor

alamb commented Jun 2, 2025

I got some amazing results (5-20% on total average on benchmarks) on the latter approach yesterday (buffer inside repartition). Will clean it up later this week (currently ill).

I hope you feel better !
