Conversation

@Dandandan
Contributor

@Dandandan Dandandan commented Dec 4, 2025

Which issue does this PR close?

  • Closes #NNN.

Rationale for this change

filter: primitive, 8192, nulls: 0, selectivity: 0.001
                        time:   [20.430 ms 20.678 ms 21.105 ms]
                        change: [−65.000% −64.516% −63.806%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

filter: primitive, 8192, nulls: 0, selectivity: 0.01
                        time:   [3.3275 ms 3.3451 ms 3.3665 ms]
                        change: [−49.062% −48.663% −48.260%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  5 (5.00%) high severe

Benchmarking filter: primitive, 8192, nulls: 0, selectivity: 0.1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.5s, enable flat sampling, or reduce sample count to 50.
filter: primitive, 8192, nulls: 0, selectivity: 0.1
                        time:   [1.4759 ms 1.4887 ms 1.5105 ms]
                        change: [−26.613% −23.553% −15.842%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  6 (6.00%) high severe

Benchmarking filter: primitive, 8192, nulls: 0, selectivity: 0.8: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, enable flat sampling, or reduce sample count to 60.
filter: primitive, 8192, nulls: 0, selectivity: 0.8
                        time:   [1.3569 ms 1.3626 ms 1.3702 ms]
                        change: [−47.225% −46.850% −46.451%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

filter: primitive, 8192, nulls: 0.1, selectivity: 0.001
                        time:   [23.231 ms 23.295 ms 23.376 ms]
                        change: [−69.694% −69.516% −69.351%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

filter: primitive, 8192, nulls: 0.1, selectivity: 0.01
                        time:   [5.4033 ms 5.4201 ms 5.4424 ms]
                        change: [−49.860% −49.590% −49.325%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

filter: primitive, 8192, nulls: 0.1, selectivity: 0.1
                        time:   [3.6111 ms 3.6270 ms 3.6475 ms]
                        change: [−27.778% −26.284% −25.286%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

filter: primitive, 8192, nulls: 0.1, selectivity: 0.8
                        time:   [3.6298 ms 3.7206 ms 3.8600 ms]
                        change: [−26.637% −24.714% −21.997%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
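
To relate the Criterion `change` percentages above to an "Nx faster" figure, here is a minimal sketch of the arithmetic. It assumes the reported change is the relative change in run time (negative means faster), so the speedup factor is old time divided by new time. The function name `speedup_from_change` is hypothetical, and the inputs are the median estimates copied from the output above. Under that assumption, the two `selectivity: 0.001` cases work out to roughly 2.8x and 3.3x.

```rust
/// Convert a Criterion "change" percentage (negative means faster)
/// into a speedup factor, i.e. old_time / new_time.
fn speedup_from_change(change_percent: f64) -> f64 {
    1.0 / (1.0 + change_percent / 100.0)
}

fn main() {
    // Median "change" estimates copied from the benchmark output above.
    let cases = [
        ("nulls: 0, selectivity: 0.001", -64.516),
        ("nulls: 0, selectivity: 0.8", -46.850),
        ("nulls: 0.1, selectivity: 0.001", -69.516),
        ("nulls: 0.1, selectivity: 0.8", -24.714),
    ];
    for (name, change) in cases {
        println!("{name}: {:.2}x", speedup_from_change(change));
    }
}
```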

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Make filtered coalescing faster for primitive
@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 4, 2025
@Dandandan changed the title from "Make filtered coalescing faster for primitive types" to "Make push_batch_with_filter faster for primitive types" on Dec 4, 2025
@Dandandan changed the title from "Make push_batch_with_filter faster for primitive types" to "Make push_batch_with_filter faster for primitive types: up to 10x faster" on Dec 4, 2025
@Dandandan changed the title from "Make push_batch_with_filter faster for primitive types: up to 10x faster" to "Make push_batch_with_filter up to 10x faster for primitive types" on Dec 4, 2025
@Dandandan
Contributor Author

@alamb you are probably interested in this

@alamb
Contributor

alamb commented Dec 4, 2025

YAAAAASSS -- this is exactly the type of thing I was hoping for with BatchCoalescer. I will check this out shortly

let filtered_batch = filter_record_batch(&batch, filter)?;
self.push_batch(filtered_batch)
// We only support primitive types for now; fall back to filter_record_batch for other types.
// Also skip the optimization when the filter is not very selective.
Contributor Author

Not sure whether it is always better to take biggest_coalesce_batch_size into account.
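
The fallback rule described in the code comments above can be sketched as a small decision function. This is a std-only illustration, not the PR's code: `ColumnKind`, `Path`, `choose_path`, and the `SELECTIVITY_THRESHOLD` value are all hypothetical, and the sketch assumes "not very selective" means the filter keeps a large fraction of rows.

```rust
/// Hypothetical stand-in for a column's data type (not arrow's `DataType`).
#[derive(Clone, Copy, PartialEq, Debug)]
enum ColumnKind {
    Primitive,
    Utf8,
    Dictionary,
}

#[derive(PartialEq, Debug)]
enum Path {
    /// The primitive-specialized path (details not shown in this sketch).
    FastPath,
    /// Filter the whole batch first, then push the result.
    Fallback,
}

/// Assumed cutoff for illustration only; not a value taken from the PR.
const SELECTIVITY_THRESHOLD: f64 = 0.5;

fn choose_path(columns: &[ColumnKind], filter: &[bool]) -> Path {
    let all_primitive = columns.iter().all(|c| *c == ColumnKind::Primitive);
    let kept = filter.iter().filter(|&&keep| keep).count();
    // Fraction of rows kept; an empty filter keeps nothing.
    let selectivity = if filter.is_empty() {
        0.0
    } else {
        kept as f64 / filter.len() as f64
    };
    if all_primitive && selectivity < SELECTIVITY_THRESHOLD {
        Path::FastPath
    } else {
        Path::Fallback
    }
}

fn main() {
    let filter = [true, false, false, false]; // keeps 1 of 4 rows
    let primitive_only = [ColumnKind::Primitive, ColumnKind::Primitive];
    let mixed = [ColumnKind::Primitive, ColumnKind::Utf8, ColumnKind::Dictionary];
    println!("{:?}", choose_path(&primitive_only, &filter));
    println!("{:?}", choose_path(&mixed, &filter));
}
```

Where exactly the cutoff should sit is the open question raised in this thread, so the constant is the part most likely to change.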

@alamb
Contributor

alamb commented Dec 4, 2025

run benchmark filter_kernels

@alamb
Contributor

alamb commented Dec 4, 2025

show benchmark queue

@alamb-ghbot

🤖 Hi @alamb, you asked to view the benchmark queue (#8951 (comment)).

Job User Benchmarks Comment
arrow-8933-3613162300.sh alamb default https://github.com/apache/arrow-rs/pull/8933#issuecomment-3613162300
arrow-8933-3613131981.sh alamb filter_kernels https://github.com/apache/arrow-rs/pull/8933#issuecomment-3613131981
arrow-8951-3613212415.sh alamb filter_kernels https://github.com/apache/arrow-rs/pull/8951#issuecomment-3613212415

@Dandandan
Contributor Author

Dandandan commented Dec 4, 2025

Hm, it seems this contains a bug, which probably throws the benchmark results off as well (will take a look tomorrow).

@Dandandan Dandandan marked this pull request as draft December 4, 2025 17:08
@alamb-ghbot

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing coalesce_batches_filter (0872a9b) to ed9efe7 diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench filter_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=coalesce_batches_filter
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                         coalesce_batches_filter                main
-----                                                                         -----------------------                ----
filter context decimal128 (kept 1/2)                                          1.36     57.5±5.45µs        ? ?/sec    1.00     42.1±1.93µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.00     55.7±4.51µs        ? ?/sec    1.09     60.5±0.29µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.00    242.4±0.35ns        ? ?/sec    1.06    256.0±1.60ns        ? ?/sec
filter context f32 (kept 1/2)                                                 1.00     77.7±1.20µs        ? ?/sec    1.00     78.0±2.52µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.00      9.9±0.32µs        ? ?/sec    1.01     10.1±0.30µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.00    444.2±7.59ns        ? ?/sec    1.06   469.4±13.36ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.00     60.7±1.16µs        ? ?/sec    1.00     60.7±0.37µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.00     60.7±0.36µs        ? ?/sec    1.00     60.7±0.56µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.00     60.6±0.26µs        ? ?/sec    1.00     60.8±1.05µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.00     60.8±1.45µs        ? ?/sec    1.00     60.7±1.02µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.00     60.7±0.71µs        ? ?/sec    1.00     60.8±1.22µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.01     61.2±3.05µs        ? ?/sec    1.00     60.8±0.90µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.00     60.7±0.46µs        ? ?/sec    1.00     60.8±0.46µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.00     61.0±2.06µs        ? ?/sec    1.00     60.7±0.55µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.00     60.8±1.25µs        ? ?/sec    1.00     60.8±1.00µs        ? ?/sec
filter context i32 (kept 1/2)                                                 1.01     16.6±0.28µs        ? ?/sec    1.00     16.5±0.30µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.04      6.5±0.20µs        ? ?/sec    1.00      6.2±0.17µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.00    236.0±5.78ns        ? ?/sec    1.05    246.9±1.45ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         1.00     77.8±2.17µs        ? ?/sec    1.00     77.9±0.80µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.00     10.1±0.52µs        ? ?/sec    1.04     10.5±0.18µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.00    446.9±4.94ns        ? ?/sec    1.06    471.6±6.49ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   1.00    109.0±3.21µs        ? ?/sec    1.11    120.7±3.20µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.00     53.9±2.45µs        ? ?/sec    1.03     55.3±2.41µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.00   654.9±19.57ns        ? ?/sec    1.04   677.9±18.99ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.00    104.2±1.47µs        ? ?/sec    1.08    112.2±3.44µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.02     55.5±1.25µs        ? ?/sec    1.00     54.5±0.23µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.00    464.2±2.70ns        ? ?/sec    1.06    491.4±7.75ns        ? ?/sec
filter context string (kept 1/2)                                              1.03   599.4±17.30µs        ? ?/sec    1.00    582.1±5.14µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.00     17.0±0.13µs        ? ?/sec    1.02     17.3±0.27µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.00      7.0±0.34µs        ? ?/sec    1.02      7.2±0.27µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.02    847.3±9.58ns        ? ?/sec    1.00    829.8±3.84ns        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.00     78.8±1.05µs        ? ?/sec    1.00     78.9±2.34µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.00     10.7±0.41µs        ? ?/sec    1.01     10.8±0.35µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.01  1076.9±14.42ns        ? ?/sec    1.00  1067.4±30.14ns        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.00   703.0±13.80µs        ? ?/sec    1.00   703.8±19.93µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.00  1016.7±52.17ns        ? ?/sec    1.02  1036.2±34.58ns        ? ?/sec
filter context u8 (kept 1/2)                                                  1.00     14.9±0.05µs        ? ?/sec    1.00     15.0±0.14µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.00  1829.3±23.69ns        ? ?/sec    1.11      2.0±0.01µs        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.00    231.0±5.30ns        ? ?/sec    1.03    238.8±0.83ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          1.00     75.9±0.20µs        ? ?/sec    1.00     76.1±0.78µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.00      5.1±0.08µs        ? ?/sec    1.05      5.4±0.06µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.00   441.3±12.39ns        ? ?/sec    1.06    467.4±2.32ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.00     49.5±0.83µs        ? ?/sec    1.18     58.6±2.81µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.17     61.3±2.70µs        ? ?/sec    1.00     52.6±1.25µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.00      2.9±0.09µs        ? ?/sec    1.13      3.2±0.08µs        ? ?/sec
filter f32 (kept 1/2)                                                         1.07    166.6±7.99µs        ? ?/sec    1.00    156.4±2.84µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.12    141.6±1.19µs        ? ?/sec    1.00    126.0±3.73µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.11     76.6±1.07µs        ? ?/sec    1.00     68.7±1.04µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.00      2.7±0.09µs        ? ?/sec    1.29      3.5±0.10µs        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.17    141.8±2.30µs        ? ?/sec    1.00    121.1±0.87µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.00     10.8±0.16µs        ? ?/sec    1.05     11.3±0.33µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.00      2.6±0.08µs        ? ?/sec    1.28      3.3±0.02µs        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.05    189.3±7.05µs        ? ?/sec    1.00    181.1±9.22µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.00    255.5±8.77µs        ? ?/sec    1.03    264.3±6.26µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.00      2.6±0.03µs        ? ?/sec    1.27      3.3±0.10µs        ? ?/sec
filter i32 (kept 1/2)                                                         1.25     53.8±0.68µs        ? ?/sec    1.00     43.2±0.31µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.05      8.9±0.48µs        ? ?/sec    1.00      8.4±0.32µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.24      2.9±0.06µs        ? ?/sec    1.00      2.4±0.03µs        ? ?/sec
filter optimize (kept 1/2)                                                    1.01     54.8±2.99µs        ? ?/sec    1.00     54.5±1.51µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.18      3.1±0.14µs        ? ?/sec    1.00      2.6±0.01µs        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.00      2.7±0.00µs        ? ?/sec    1.00      2.7±0.02µs        ? ?/sec
filter run array (kept 1/2)                                                   1.03   436.4±17.42µs        ? ?/sec    1.00    422.5±4.27µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.01    452.6±7.45µs        ? ?/sec    1.00   449.3±12.94µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.01   336.4±10.57µs        ? ?/sec    1.00    334.5±2.82µs        ? ?/sec
filter single record batch                                                    1.23     54.3±2.92µs        ? ?/sec    1.00     44.2±0.07µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.00     45.5±0.99µs        ? ?/sec    1.00     45.7±0.44µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.05      4.0±0.11µs        ? ?/sec    1.00      3.8±0.04µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.00      3.0±0.05µs        ? ?/sec    1.12      3.3±0.11µs        ? ?/sec

@Dandandan
Contributor Author

run benchmark coalesce_kernels

@alamb-ghbot

🤖 Hi @Dandandan, thanks for the request (#8951 (comment)).

scrape_comments.py only supports whitelisted benchmarks.

  • Standard: (none)
  • Criterion: arrow_reader, concatenate_kernels, filter_kernels

Please choose one or more of these with run benchmark <name> or run benchmark <name1> <name2>...
Unsupported benchmarks: coalesce_kernels.

@Dandandan changed the title from "Make push_batch_with_filter up to 10x faster for primitive types" to "Make push_batch_with_filter up to 2x faster for primitive types" on Dec 4, 2025
@Dandandan changed the title from "Make push_batch_with_filter up to 2x faster for primitive types" to "Make push_batch_with_filter up to 3x faster for primitive types" on Dec 4, 2025
@Dandandan
Contributor Author

@alamb I think it's ok now - I called AI (Opus 4.5) for some help on the find_nth_set_bit_position function.

It mainly needs some polish, and we should see whether we can improve the filter: primitive, 8192, nulls: 0.1, selectivity: 0.8 case.
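
For readers unfamiliar with the idea behind a function like find_nth_set_bit_position, here is a minimal std-only sketch of locating the n-th set bit in a bitmap. It is not the PR's implementation, whose signature is not shown in this thread. The name `nth_set_bit`, the `&[u64]` representation, and the LSB-first bit order are assumptions made for illustration.

```rust
/// Return the index of the `n`-th set bit (0-based) in `words`, where
/// bit `i` lives in `words[i / 64]` at position `i % 64` (LSB first).
/// Returns `None` if fewer than `n + 1` bits are set.
fn nth_set_bit(words: &[u64], mut n: usize) -> Option<usize> {
    for (word_idx, &word) in words.iter().enumerate() {
        let ones = word.count_ones() as usize;
        if n >= ones {
            // Target is not in this word: skip all of its set bits at once.
            n -= ones;
            continue;
        }
        // Target is in this word: clear the lowest set bit `n` times,
        // so the bit we want becomes the lowest one remaining.
        let mut w = word;
        for _ in 0..n {
            w &= w - 1;
        }
        return Some(word_idx * 64 + w.trailing_zeros() as usize);
    }
    None
}

fn main() {
    // Bits 0, 1 and 3 are set in the first word, bit 5 in the second.
    let bitmap = [0b1011u64, 1u64 << 5];
    for n in 0..5 {
        println!("n = {n}: {:?}", nth_set_bit(&bitmap, n));
    }
}
```

Skipping whole words via `count_ones` keeps the scan cheap when the target bit is far into the bitmap, which is the case that matters when a filter selects few rows.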

@alamb
Contributor

alamb commented Dec 5, 2025

run benchmark coalesce_kernels

I added this to the allowed benchmarks

@alamb
Contributor

alamb commented Dec 5, 2025

run benchmark coalesce_kernels

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing coalesce_batches_filter (dcf4864) to ed9efe7 diff
BENCH_NAME=coalesce_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=coalesce_batches_filter
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                                                coalesce_batches_filter                main
-----                                                                                -----------------------                ----
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001                               1.01    261.9±3.23ms        ? ?/sec    1.00    259.4±2.06ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01                                1.00      8.6±0.14ms        ? ?/sec    1.01      8.7±0.10ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1                                 1.00      4.1±0.06ms        ? ?/sec    1.01      4.1±0.09ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8                                 1.00      3.5±0.01ms        ? ?/sec    1.02      3.5±0.02ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001                             1.00    245.6±2.39ms        ? ?/sec    1.27    312.5±3.08ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01                              1.01      9.4±0.09ms        ? ?/sec    1.00      9.4±0.07ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1                               1.00      4.5±0.08ms        ? ?/sec    1.02      4.6±0.08ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8                               1.00      4.6±0.03ms        ? ?/sec    1.01      4.6±0.02ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001                               1.01     59.6±1.58ms        ? ?/sec    1.00     59.2±0.34ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01                                1.00     11.6±0.18ms        ? ?/sec    1.00     11.6±0.18ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1                                 1.01      9.3±0.18ms        ? ?/sec    1.00      9.2±0.09ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8                                 1.00      8.2±0.22ms        ? ?/sec    1.28     10.4±0.24ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001                             1.01     70.3±0.25ms        ? ?/sec    1.00     69.9±0.25ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01                              1.01     12.9±0.14ms        ? ?/sec    1.00     12.8±0.06ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1                               1.00      9.8±0.05ms        ? ?/sec    1.06     10.4±0.16ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8                               1.00     10.0±0.25ms        ? ?/sec    1.02     10.1±0.20ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001      1.05     50.7±0.30ms        ? ?/sec    1.00     48.1±0.17ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01       1.03      6.2±0.06ms        ? ?/sec    1.00      6.0±0.05ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.1        1.00      4.5±0.11ms        ? ?/sec    1.00      4.5±0.15ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.8        1.02      3.1±0.03ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.001    1.04     60.3±0.24ms        ? ?/sec    1.00     58.1±0.25ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01     1.03      8.2±0.03ms        ? ?/sec    1.00      7.9±0.03ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1      1.00      5.6±0.13ms        ? ?/sec    1.07      6.0±0.11ms        ? ?/sec
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8      1.00      3.9±0.02ms        ? ?/sec    1.01      3.9±0.01ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001       1.03     43.5±0.56ms        ? ?/sec    1.00     42.5±0.09ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01        1.05      4.9±0.22ms        ? ?/sec    1.00      4.7±0.01ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1         1.03      2.4±0.05ms        ? ?/sec    1.00      2.3±0.04ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8         1.00  1466.4±10.31µs        ? ?/sec    1.05   1537.3±9.34µs        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.001     1.02     53.3±0.16ms        ? ?/sec    1.00     52.1±0.13ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.01      1.01      7.2±0.03ms        ? ?/sec    1.00      7.1±0.02ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.1       1.00      3.7±0.03ms        ? ?/sec    1.06      3.9±0.07ms        ? ?/sec
filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0.1, selectivity: 0.8       1.01      3.9±0.02ms        ? ?/sec    1.00      3.9±0.01ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.001                                1.00     54.1±1.62ms        ? ?/sec    1.80     97.2±0.21ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.01                                 1.00      5.9±0.03ms        ? ?/sec    1.57      9.3±0.02ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.1                                  1.00      3.2±0.09ms        ? ?/sec    1.17      3.7±0.05ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.8                                  1.00      2.7±0.01ms        ? ?/sec    1.14      3.1±0.02ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.001                              1.00     60.9±0.09ms        ? ?/sec    2.06    125.4±0.26ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.01                               1.00     10.9±0.04ms        ? ?/sec    1.38     15.1±0.06ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.1                                1.16      8.6±0.20ms        ? ?/sec    1.00      7.4±0.36ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.8                                1.62     14.7±0.04ms        ? ?/sec    1.00      9.1±0.04ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.001                          1.04     68.2±0.48ms        ? ?/sec    1.00     65.7±1.26ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.01                           1.09      7.9±0.04ms        ? ?/sec    1.00      7.3±0.02ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.1                            1.00      3.6±0.17ms        ? ?/sec    1.08      3.9±0.21ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8                            1.00   1400.0±6.27µs        ? ?/sec    1.02   1421.7±6.47µs        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001                        1.10     92.0±0.23ms        ? ?/sec    1.00     83.6±0.13ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01                         1.06     11.6±0.05ms        ? ?/sec    1.00     11.0±0.05ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1                          1.00      5.1±0.08ms        ? ?/sec    1.10      5.7±0.33ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8                          1.00      3.8±0.01ms        ? ?/sec    1.01      3.8±0.01ms        ? ?/sec

@Dandandan
Contributor Author

filter: primitive, 8192, nulls: 0, selectivity: 0.001                                1.00     54.1±1.62ms        ? ?/sec    1.80     97.2±0.21ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.01                                 1.00      5.9±0.03ms        ? ?/sec    1.57      9.3±0.02ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.1                                  1.00      3.2±0.09ms        ? ?/sec    1.17      3.7±0.05ms        ? ?/sec
filter: primitive, 8192, nulls: 0, selectivity: 0.8                                  1.00      2.7±0.01ms        ? ?/sec    1.14      3.1±0.02ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.001                              1.00     60.9±0.09ms        ? ?/sec    2.06    125.4±0.26ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.01                               1.00     10.9±0.04ms        ? ?/sec    1.38     15.1±0.06ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.1                                1.16      8.6±0.20ms        ? ?/sec    1.00      7.4±0.36ms        ? ?/sec
filter: primitive, 8192, nulls: 0.1, selectivity: 0.8                                1.62     14.7±0.04ms        ? ?/sec    1.00      9.1±0.04ms        ? ?/sec

Pretty good I would say... probably have to look a bit more at the null-handling speed
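
As a reading aid for the rows quoted above: the ratio columns appear to be each time divided by the faster of the two, so 1.00 marks the winner. The following small sketch reproduces that arithmetic from the reported times. The helper name `relative` is hypothetical and the inputs are copied from the table.

```rust
/// Ratio of each time to the faster of the two, as in the ratio columns.
fn relative(branch_ms: f64, main_ms: f64) -> (f64, f64) {
    let best = branch_ms.min(main_ms);
    (branch_ms / best, main_ms / best)
}

fn main() {
    // (case, branch time in ms, main time in ms), copied from the rows above.
    let rows = [
        ("nulls: 0, selectivity: 0.001", 54.1, 97.2),
        ("nulls: 0.1, selectivity: 0.001", 60.9, 125.4),
        ("nulls: 0.1, selectivity: 0.8", 14.7, 9.1),
    ];
    for (case, branch_ms, main_ms) in rows {
        let (b, m) = relative(branch_ms, main_ms);
        println!("{case}: branch {b:.2}, main {m:.2}");
    }
}
```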

@alamb
Contributor

alamb commented Dec 5, 2025

> Pretty good I would say... probably have to look a bit more at the null-handling speed

I feel there is a bunch of null handling performance to be had via work in

I'll try and review this PR more carefully later today

@Dandandan
Contributor Author

run benchmark coalesce_kernels filter_kernels

@Dandandan
Contributor Author

@alamb it seems we can tune the filter threshold value: a value >= 0.9 will probably give nice speedups, and we might go even further based on benchmark results.

@Dandandan
Contributor Author

run benchmark coalesce_kernels

@Dandandan
Contributor Author

It is now faster in all cases on my machine 🚀
