@rluvaton
Member

@rluvaton rluvaton commented Oct 19, 2025

Waiting for the PRs below to be merged first:

This PR includes the following other PRs (unless already merged) to make the review easier, so please review them first:

Which issue does this PR close?

N/A

Rationale for this change

Make zip much faster when both inputs are scalars.

This is useful for IF <expr> THEN <literal> ELSE <literal> END

What changes are included in this PR?

Created a couple of implementations for zipping scalars: one for primitives, one for bytes, and a fallback.
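
To illustrate why a dedicated bytes path can help, here is a minimal sketch, not the PR's actual code. It assumes a plain `&[bool]` mask and two non-null byte scalars, and ignores nulls entirely. The function name `zip_bytes_scalars` and the `(offsets, values)` return shape are hypothetical and chosen only for this example. Because both inputs are scalars, the total output size can be computed up front from the number of true entries, so the values buffer is allocated once.

```rust
/// Hypothetical helper (not part of arrow-rs): builds the offsets and values
/// buffers of a variable-length byte array from two byte scalars and a mask.
/// Nulls are ignored and i32 offset overflow is not checked in this sketch.
fn zip_bytes_scalars(mask: &[bool], then_val: &[u8], else_val: &[u8]) -> (Vec<i32>, Vec<u8>) {
    // With scalar inputs the output size depends only on the mask counts.
    let true_count = mask.iter().filter(|&&b| b).count();
    let false_count = mask.len() - true_count;
    let total = true_count * then_val.len() + false_count * else_val.len();

    let mut offsets = Vec::with_capacity(mask.len() + 1);
    let mut values = Vec::with_capacity(total);
    offsets.push(0i32);

    for &b in mask {
        // Append the selected scalar and record where the next value starts.
        values.extend_from_slice(if b { then_val } else { else_val });
        offsets.push(values.len() as i32);
    }
    (offsets, values)
}
```

For a mask of `[true, false, true]` with scalars `b"ab"` and `b"xyz"`, the values buffer holds `abxyzab` and the offsets mark the boundaries of the three entries.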

Are these changes tested?

existing tests

Are there any user-facing changes?

new struct ScalarZipper

TODO:

  • Need to add comments if missing
  • Add tests for decimal and timestamp to make sure the type is kept

This is useful for `IF <expr> THEN <scalar> ELSE <scalar> END`

TODO:
- [ ] Need to add comments if missing
- [ ] Add benchmark
@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 19, 2025
Comment on lines 355 to 358
let scalars: Vec<T::Native> = predicate
    .iter()
    .map(|b| if b { then_val } else { else_val })
    .collect();
Member Author

@rluvaton rluvaton Oct 19, 2025

This will probably compile to a conditional move
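
A standalone version of the same selection pattern, as a minimal sketch: it uses a plain `&[bool]` slice in place of the predicate buffer, and the name `zip_primitive_scalars` is hypothetical. Whether the compiler actually emits a conditional move (or vectorizes the loop) depends on the target and optimization level, so this only shows the shape of the code, not a guaranteed codegen result.

```rust
/// Hypothetical helper (not part of arrow-rs): picks `then_val` where the
/// mask is true and `else_val` where it is false.
fn zip_primitive_scalars<T: Copy>(mask: &[bool], then_val: T, else_val: T) -> Vec<T> {
    mask.iter()
        // Both arms are plain copies of already-loaded values, which is the
        // kind of branch a compiler can lower to a select instead of a jump.
        .map(|&b| if b { then_val } else { else_val })
        .collect()
}
```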

Comment on lines +534 to +543
fn combine_nulls_and_false(predicate: &BooleanArray) -> BooleanBuffer {
    if let Some(nulls) = predicate.nulls().filter(|n| n.null_count() > 0) {
        predicate.values().bitand(
            // nulls are represented as 0 (false) in the values buffer
            nulls.inner(),
        )
    } else {
        predicate.values().clone()
    }
}
Member Author

I'm pretty sure there is already a helper function in arrow for this
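
For reference, the idea behind the function above can be sketched on raw bitmap words, without any arrow types. This is an illustration under assumptions: the bitmaps are modelled as `u64` words, and the signature is hypothetical rather than an existing arrow helper. A validity bit of 0 means null, so ANDing the value bits with the validity bits turns every null slot into false.

```rust
/// Hypothetical stand-in for the review snippet: `values` holds the packed
/// boolean bits and `validity` (if present) holds 1 for valid, 0 for null.
fn combine_nulls_and_false(values: &[u64], validity: Option<&[u64]>) -> Vec<u64> {
    match validity {
        Some(valid) => {
            assert_eq!(values.len(), valid.len(), "bitmaps must have equal length");
            // A null slot has validity bit 0, so the AND clears it to false.
            values.iter().zip(valid).map(|(v, n)| v & n).collect()
        }
        // Without a validity bitmap every slot is valid: keep the values as-is.
        None => values.to_vec(),
    }
}
```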

alamb pushed a commit that referenced this pull request Oct 19, 2025
# Which issue does this PR close?

N/A

# Rationale for this change

I have a PR to improve zip perf for scalar but I don't see any
benchmarks for it:
- #8653 

# What changes are included in this PR?

Created zip benchmarks for scalar and non-scalar inputs with different masks

# Are these changes tested?
N/A

# Are there any user-facing changes?

Nope
@rluvaton
Member Author

@alamb If you want to run the benchmarks for zip: there are no more optimizations left for this PR, only cleanups, tests and comments.

For scalars I saw major improvements, while the array-and-scalar cases showed a regression for some reason (maybe the extra check, even though it is a simple comparison?). I ran it on bare metal to reduce noise as much as possible.

I tested it on:

$ neofetch
ubuntu@ip-
-----------------------
OS: Ubuntu 24.04.3 LTS x86_64
Host: c5.metal 1.0
Kernel: 6.14.0-1011-aws
Uptime: 3 hours, 46 mins
Packages: 921 (dpkg), 5 (snap)
Shell: bash 5.2.21
Terminal: /dev/pts/0
CPU: Intel Xeon Platinum 8275CL (96) @ 3.900GHz
Memory: 2144MiB / 193025MiB

@alamb
Contributor

alamb commented Oct 20, 2025

Thank you @rluvaton -- I have scheduled benchmarks for this PR and reviewed the dependent ones. Exciting stuff

@alamb
Contributor

alamb commented Oct 20, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-zip-for-scalars (dabbf55) to 9d75f87 diff
BENCH_NAME=zip_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench zip_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-zip-for-scalars
Results will be posted here when complete

@alamb
Contributor

alamb commented Oct 20, 2025

The zip benchmarks are still running... Maybe we should trim them back a bit

@alamb
Contributor

alamb commented Oct 20, 2025

🤖: Benchmark completed

Details

group                                                                               improve-zip-for-scalars                main
-----                                                                               -----------------------                ----
zip_8192_from_i32/array_vs_array/10pct_true                                         1.02     35.5±0.18µs        ? ?/sec    1.00     34.7±0.05µs        ? ?/sec
zip_8192_from_i32/array_vs_array/1pct_true                                          1.00      5.1±0.01µs        ? ?/sec    1.00      5.1±0.02µs        ? ?/sec
zip_8192_from_i32/array_vs_array/50pct_nulls                                        1.02     75.3±0.17µs        ? ?/sec    1.00     74.0±0.16µs        ? ?/sec
zip_8192_from_i32/array_vs_array/50pct_true                                         1.01    102.6±0.22µs        ? ?/sec    1.00    101.7±0.17µs        ? ?/sec
zip_8192_from_i32/array_vs_array/90pct_true                                         1.01     36.5±0.07µs        ? ?/sec    1.00     36.1±0.08µs        ? ?/sec
zip_8192_from_i32/array_vs_array/99pct_true                                         1.00      5.9±0.03µs        ? ?/sec    1.01      6.0±0.02µs        ? ?/sec
zip_8192_from_i32/array_vs_array/all_false                                          1.01      2.5±0.09µs        ? ?/sec    1.00      2.5±0.11µs        ? ?/sec
zip_8192_from_i32/array_vs_array/all_true                                           1.03      2.5±0.11µs        ? ?/sec    1.00      2.5±0.09µs        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/10pct_true                               1.01     32.5±0.10ns        ? ?/sec    1.00     32.2±0.15ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/1pct_true                                1.05     33.8±0.09ns        ? ?/sec    1.00     32.2±0.17ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/50pct_nulls                              1.05     33.8±0.10ns        ? ?/sec    1.00     32.2±0.22ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/50pct_true                               1.01     32.6±0.08ns        ? ?/sec    1.00     32.2±0.07ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/90pct_true                               1.03     33.1±0.10ns        ? ?/sec    1.00     32.2±0.26ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/99pct_true                               1.01     32.6±0.17ns        ? ?/sec    1.00     32.2±0.09ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/all_false                                1.01     32.5±0.06ns        ? ?/sec    1.00     32.2±0.16ns        ? ?/sec
zip_8192_from_i32/array_vs_non_null_scalar/all_true                                 1.01     32.5±0.05ns        ? ?/sec    1.00     32.2±0.05ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/10pct_true                               1.01     32.5±0.07ns        ? ?/sec    1.00     32.2±0.09ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/1pct_true                                1.05     33.8±0.07ns        ? ?/sec    1.00     32.2±0.12ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/50pct_nulls                              1.05     33.8±0.43ns        ? ?/sec    1.00     32.2±0.21ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/50pct_true                               1.01     32.6±0.05ns        ? ?/sec    1.00     32.1±0.08ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/90pct_true                               1.03     33.1±0.07ns        ? ?/sec    1.00     32.2±0.14ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/99pct_true                               1.01     32.6±0.08ns        ? ?/sec    1.00     32.2±0.08ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/all_false                                1.01     32.6±0.07ns        ? ?/sec    1.00     32.2±0.09ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_array/all_true                                 1.01     32.6±0.10ns        ? ?/sec    1.00     32.2±0.08ns        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/10pct_true                         1.00  1185.4±15.73ns        ? ?/sec    116.20   137.7±0.26µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/1pct_true                          1.00   1187.9±3.93ns        ? ?/sec    111.72   132.7±0.21µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/50pct_nulls                        1.00   1277.8±9.95ns        ? ?/sec    112.95   144.3±0.29µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/50pct_true                         1.00  1163.2±16.93ns        ? ?/sec    130.07   151.3±0.43µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/90pct_true                         1.00  1150.1±12.10ns        ? ?/sec    89.00   102.4±0.43µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/99pct_true                         1.00   1185.2±4.06ns        ? ?/sec    74.24    88.0±0.43µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/all_false                          1.00   1165.8±4.12ns        ? ?/sec    115.27   134.4±5.33µs        ? ?/sec
zip_8192_from_i32/non_null_scalar_vs_null_scalar/all_true                           1.00   1168.4±9.30ns        ? ?/sec    73.62    86.0±0.19µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/10pct_true                                      1.00      9.1±0.01µs        ? ?/sec    7.43     67.7±0.37µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/1pct_true                                       1.00      9.0±0.05µs        ? ?/sec    6.54     59.1±0.18µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/50pct_nulls                                     1.00      9.1±0.03µs        ? ?/sec    8.83     80.6±0.13µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/50pct_true                                      1.00      9.1±0.01µs        ? ?/sec    10.68    96.8±0.32µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/90pct_true                                      1.00      9.1±0.07µs        ? ?/sec    7.61     69.6±0.15µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/99pct_true                                      1.00      9.1±0.07µs        ? ?/sec    6.55     59.8±0.13µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/all_false                                       1.00      9.1±0.03µs        ? ?/sec    6.57     59.9±0.33µs        ? ?/sec
zip_8192_from_i32/non_nulls_scalars/all_true                                        1.00      9.1±0.07µs        ? ?/sec    6.40     58.1±0.75µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/10pct_true                                1.00   1190.3±4.64ns        ? ?/sec    84.69   100.8±0.14µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/1pct_true                                 1.00   1350.6±5.18ns        ? ?/sec    64.97    87.7±0.16µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/50pct_nulls                               1.00  1425.8±13.40ns        ? ?/sec    86.43   123.2±0.21µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/50pct_true                                1.00  1269.4±13.18ns        ? ?/sec    119.66   151.9±0.30µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/90pct_true                                1.00   1339.9±7.01ns        ? ?/sec    104.26   139.7±0.16µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/99pct_true                                1.00   1277.2±3.57ns        ? ?/sec    104.62   133.6±0.26µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/all_false                                 1.00   1331.3±7.25ns        ? ?/sec    64.88    86.4±0.19µs        ? ?/sec
zip_8192_from_i32/null_vs_non_null_scalar/all_true                                  1.00   1243.0±2.26ns        ? ?/sec    106.56   132.5±0.35µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/10pct_true                       1.00   319.4±10.44µs        ? ?/sec    1.04   332.7±12.15µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/1pct_true                        1.00    288.4±4.29µs        ? ?/sec    1.00    288.1±5.35µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/50pct_nulls                      1.00   388.7±14.23µs        ? ?/sec    1.00    386.9±4.64µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/50pct_true                       1.00    426.8±9.92µs        ? ?/sec    1.00   426.6±11.04µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/90pct_true                       1.00    327.8±8.85µs        ? ?/sec    1.01    331.4±5.05µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/99pct_true                       1.04    279.8±6.41µs        ? ?/sec    1.00    269.2±4.80µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/all_false                        1.05    118.5±4.92µs        ? ?/sec    1.00    112.7±5.53µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_array/all_true                         1.00    117.6±2.97µs        ? ?/sec    1.00    117.8±4.67µs        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/10pct_true             1.01     32.8±0.24ns        ? ?/sec    1.00     32.6±0.04ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/1pct_true              1.01     32.8±0.04ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/50pct_nulls            1.01     32.8±0.05ns        ? ?/sec    1.00     32.5±0.34ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/50pct_true             1.01     32.8±0.27ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/90pct_true             1.01     32.8±0.28ns        ? ?/sec    1.00     32.5±0.05ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/99pct_true             1.00     32.8±0.06ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/all_false              1.00     32.8±0.05ns        ? ?/sec    1.00     32.7±0.38ns        ? ?/sec
zip_8192_from_long bytes (100..400)/array_vs_non_null_scalar/all_true               1.01     32.8±0.15ns        ? ?/sec    1.00     32.6±0.09ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/10pct_true             1.00     32.8±0.04ns        ? ?/sec    1.00     32.7±0.31ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/1pct_true              1.01     32.8±0.06ns        ? ?/sec    1.00     32.6±0.15ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/50pct_nulls            1.01     32.8±0.05ns        ? ?/sec    1.00     32.5±0.19ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/50pct_true             1.01     32.8±0.05ns        ? ?/sec    1.00     32.6±0.10ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/90pct_true             1.01     32.8±0.09ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/99pct_true             1.01     32.8±0.07ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/all_false              1.00     32.8±0.05ns        ? ?/sec    1.00     32.6±0.19ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_array/all_true               1.01     32.8±0.07ns        ? ?/sec    1.00     32.6±0.10ns        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/10pct_true       1.00     20.7±0.54µs        ? ?/sec    9.96    206.0±1.98µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/1pct_true        1.00     10.9±0.11µs        ? ?/sec    17.43   189.2±0.43µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/50pct_nulls      1.00     38.8±1.07µs        ? ?/sec    5.98    231.9±0.91µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/50pct_true       1.00     66.4±1.40µs        ? ?/sec    4.58    303.7±2.12µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/90pct_true       1.00     83.5±1.32µs        ? ?/sec    4.64    387.8±4.60µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/99pct_true       1.00     88.4±1.11µs        ? ?/sec    4.32    382.2±4.20µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/all_false        1.00  968.9±108.55ns        ? ?/sec    194.29   188.3±0.41µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_null_scalar_vs_null_scalar/all_true         1.00     89.4±1.74µs        ? ?/sec    4.27    381.5±5.05µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/10pct_true                    1.00     75.7±1.62µs        ? ?/sec    3.33    252.0±1.89µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/1pct_true                     1.00     57.5±0.55µs        ? ?/sec    4.31    247.9±2.23µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/50pct_nulls                   1.00     91.1±0.71µs        ? ?/sec    3.09    281.7±1.28µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/50pct_true                    1.00    103.3±1.11µs        ? ?/sec    3.91    404.1±2.26µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/90pct_true                    1.00     99.4±1.60µs        ? ?/sec    3.61    358.6±4.68µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/99pct_true                    1.00     87.7±1.20µs        ? ?/sec    4.03    353.3±2.34µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/all_false                     1.00     45.1±0.69µs        ? ?/sec    5.46    246.1±2.47µs        ? ?/sec
zip_8192_from_long bytes (100..400)/non_nulls_scalars/all_true                      1.00     79.1±0.58µs        ? ?/sec    4.47    353.2±2.26µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/10pct_true              1.00     83.8±1.22µs        ? ?/sec    4.60    385.3±4.00µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/1pct_true               1.00     88.2±0.74µs        ? ?/sec    4.35    383.1±4.88µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/50pct_nulls             1.00     74.8±1.70µs        ? ?/sec    5.21    389.8±2.27µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/50pct_true              1.00     66.2±1.09µs        ? ?/sec    4.61    305.4±2.39µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/90pct_true              1.00     21.1±0.52µs        ? ?/sec    9.81    207.1±2.43µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/99pct_true              1.00     11.4±0.04µs        ? ?/sec    16.75   190.5±0.31µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/all_false               1.00     89.1±0.83µs        ? ?/sec    4.26    379.9±3.87µs        ? ?/sec
zip_8192_from_long bytes (100..400)/null_vs_non_null_scalar/all_true                1.00  1143.5±51.10ns        ? ?/sec    163.04   186.4±0.89µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/10pct_true                     1.06   333.4±10.41µs        ? ?/sec    1.00    314.0±4.35µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/1pct_true                      1.04    301.5±6.53µs        ? ?/sec    1.00    290.5±5.29µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/50pct_nulls                    1.02    390.4±7.82µs        ? ?/sec    1.00    383.4±8.58µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/50pct_true                     1.03   433.6±15.06µs        ? ?/sec    1.00    421.5±4.80µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/90pct_true                     1.06    339.4±9.25µs        ? ?/sec    1.00    321.4±6.00µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/99pct_true                     1.03    265.6±6.99µs        ? ?/sec    1.00    259.1±2.96µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/all_false                      1.07    117.0±4.54µs        ? ?/sec    1.00    109.6±2.65µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_array/all_true                       1.00    116.0±4.46µs        ? ?/sec    1.01    117.0±5.47µs        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/10pct_true           1.00     32.7±0.05ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/1pct_true            1.00     32.8±0.06ns        ? ?/sec    1.00     32.7±0.27ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/50pct_nulls          1.01     32.8±0.13ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/50pct_true           1.00     32.7±0.06ns        ? ?/sec    1.00     32.6±0.10ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/90pct_true           1.00     32.8±0.05ns        ? ?/sec    1.00     32.6±0.05ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/99pct_true           1.00     32.7±0.04ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/all_false            1.01     32.7±0.05ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long strings (100..400)/array_vs_non_null_scalar/all_true             1.00     32.8±0.06ns        ? ?/sec    1.00     32.6±0.05ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/10pct_true           1.01     32.8±0.14ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/1pct_true            1.01     32.8±0.16ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/50pct_nulls          1.01     32.8±0.14ns        ? ?/sec    1.00     32.6±0.05ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/50pct_true           1.01     32.8±0.16ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/90pct_true           1.01     32.8±0.14ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/99pct_true           1.01     32.8±0.06ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/all_false            1.01     32.8±0.13ns        ? ?/sec    1.00     32.6±0.05ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_array/all_true             1.01     32.8±0.08ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/10pct_true     1.00     20.3±0.68µs        ? ?/sec    10.14   205.9±0.76µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/1pct_true      1.00     10.8±0.04µs        ? ?/sec    17.48   189.2±0.42µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/50pct_nulls    1.00     38.4±1.00µs        ? ?/sec    6.06    232.7±0.70µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/50pct_true     1.00     66.1±1.11µs        ? ?/sec    4.56    301.4±1.40µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/90pct_true     1.00     79.3±0.73µs        ? ?/sec    4.78    378.7±1.96µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/99pct_true     1.00     85.1±0.52µs        ? ?/sec    4.33    368.0±2.16µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/all_false      1.00   964.8±82.65ns        ? ?/sec    194.95   188.1±0.37µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_null_scalar_vs_null_scalar/all_true       1.00     87.3±0.78µs        ? ?/sec    4.22    368.4±1.56µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/10pct_true                  1.00    126.7±3.44µs        ? ?/sec    3.22    408.3±3.80µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/1pct_true                   1.00    145.2±2.32µs        ? ?/sec    2.80    406.0±3.63µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/50pct_nulls                 1.00    126.7±2.43µs        ? ?/sec    3.24    411.1±3.04µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/50pct_true                  1.00    125.8±2.96µs        ? ?/sec    3.34    420.3±3.56µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/90pct_true                  1.00    104.6±2.45µs        ? ?/sec    3.32    347.5±1.90µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/99pct_true                  1.00     89.3±1.55µs        ? ?/sec    3.78    337.8±2.64µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/all_false                   1.00    139.0±1.49µs        ? ?/sec    2.91    403.8±3.31µs        ? ?/sec
zip_8192_from_long strings (100..400)/non_nulls_scalars/all_true                    1.00     77.5±1.36µs        ? ?/sec    4.32    335.3±2.33µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/10pct_true            1.00     81.0±1.02µs        ? ?/sec    4.63    374.9±1.73µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/1pct_true             1.00     86.4±0.63µs        ? ?/sec    4.31    372.1±5.53µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/50pct_nulls           1.00     73.8±1.47µs        ? ?/sec    5.22    385.7±1.84µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/50pct_true            1.00     66.0±1.08µs        ? ?/sec    4.59    302.7±1.36µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/90pct_true            1.00     20.9±0.59µs        ? ?/sec    9.90    206.6±0.61µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/99pct_true            1.00     11.3±0.05µs        ? ?/sec    16.79   190.0±0.33µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/all_false             1.00     87.2±0.87µs        ? ?/sec    4.22    368.3±2.04µs        ? ?/sec
zip_8192_from_long strings (100..400)/null_vs_non_null_scalar/all_true              1.00  1179.7±53.33ns        ? ?/sec    157.91   186.3±0.22µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/10pct_true                         1.00     63.1±0.17µs        ? ?/sec    1.00     63.4±0.32µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/1pct_true                          1.00     22.4±0.11µs        ? ?/sec    1.00     22.4±0.15µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/50pct_nulls                        1.00    120.9±0.27µs        ? ?/sec    1.00    121.0±0.28µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/50pct_true                         1.00    158.6±0.51µs        ? ?/sec    1.00    159.0±0.34µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/90pct_true                         1.00     64.7±0.47µs        ? ?/sec    1.01     65.4±0.31µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/99pct_true                         1.00     23.2±0.20µs        ? ?/sec    1.00     23.3±0.16µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/all_false                          1.00     17.9±0.16µs        ? ?/sec    1.00     17.9±0.13µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_array/all_true                           1.00     17.7±0.19µs        ? ?/sec    1.01     17.9±0.16µs        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/10pct_true               1.01     32.8±0.18ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/1pct_true                1.01     32.8±0.08ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/50pct_nulls              1.01     32.8±0.06ns        ? ?/sec    1.00     32.6±0.09ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/50pct_true               1.01     32.8±0.05ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/90pct_true               1.00     32.7±0.05ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/99pct_true               1.00     32.8±0.05ns        ? ?/sec    1.00     32.6±0.10ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/all_false                1.01     32.8±0.08ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_short bytes (3..10)/array_vs_non_null_scalar/all_true                 1.01     32.7±0.05ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/10pct_true               1.00     32.8±0.07ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/1pct_true                1.00     32.7±0.05ns        ? ?/sec    1.00     32.7±0.19ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/50pct_nulls              1.00     32.8±0.07ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/50pct_true               1.04     34.0±0.05ns        ? ?/sec    1.00     32.6±0.11ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/90pct_true               1.01     32.8±0.18ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/99pct_true               1.01     32.8±0.07ns        ? ?/sec    1.00     32.6±0.10ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/all_false                1.01     32.8±0.26ns        ? ?/sec    1.00     32.6±0.18ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_array/all_true                 1.00     32.8±0.04ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/10pct_true         1.00     15.7±0.05µs        ? ?/sec    12.42   194.7±0.46µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/1pct_true          1.00     10.4±0.02µs        ? ?/sec    18.16   188.5±0.33µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/50pct_nulls        1.00     26.1±0.30µs        ? ?/sec    7.73    201.9±0.58µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/50pct_true         1.00     38.5±0.26µs        ? ?/sec    5.47    210.6±0.50µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/90pct_true         1.00     18.6±0.09µs        ? ?/sec    9.06    168.7±0.26µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/99pct_true         1.00     13.3±0.09µs        ? ?/sec    11.65   155.1±0.38µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/all_false          1.00   912.4±65.08ns        ? ?/sec    205.07   187.1±0.35µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_null_scalar_vs_null_scalar/all_true           1.00     12.7±0.10µs        ? ?/sec    11.98   152.3±0.30µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/10pct_true                      1.00     34.4±0.06µs        ? ?/sec    3.81    131.0±0.28µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/1pct_true                       1.00     15.1±0.03µs        ? ?/sec    8.16    123.1±0.33µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/50pct_nulls                     1.00     57.9±0.24µs        ? ?/sec    2.48    143.5±0.48µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/50pct_true                      1.00     68.9±0.16µs        ? ?/sec    2.36    163.0±0.31µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/90pct_true                      1.00     33.2±0.05µs        ? ?/sec    4.00    132.8±0.23µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/99pct_true                      1.00     16.0±0.03µs        ? ?/sec    7.64    122.3±0.34µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/all_false                       1.00      2.6±0.03µs        ? ?/sec    47.25   122.4±0.40µs        ? ?/sec
zip_8192_from_short bytes (3..10)/non_nulls_scalars/all_true                        1.00      3.1±0.08µs        ? ?/sec    38.23   119.7±0.31µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/10pct_true                1.00     18.3±0.09µs        ? ?/sec    9.11    166.9±0.32µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/1pct_true                 1.00     13.3±0.10µs        ? ?/sec    11.61   154.7±0.35µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/50pct_nulls               1.00     27.5±0.35µs        ? ?/sec    6.78    186.2±2.63µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/50pct_true                1.00     38.6±0.27µs        ? ?/sec    5.45    210.3±0.39µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/90pct_true                1.00     16.2±0.03µs        ? ?/sec    12.00   194.9±0.41µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/99pct_true                1.00     10.8±0.04µs        ? ?/sec    17.43   187.9±0.51µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/all_false                 1.00     12.8±0.11µs        ? ?/sec    11.90   152.9±0.42µs        ? ?/sec
zip_8192_from_short bytes (3..10)/null_vs_non_null_scalar/all_true                  1.00  1249.4±82.65ns        ? ?/sec    149.12   186.3±0.49µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/10pct_true                       1.00     62.4±0.34µs        ? ?/sec    1.01     62.8±0.19µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/1pct_true                        1.01     22.4±0.14µs        ? ?/sec    1.00     22.2±0.18µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/50pct_nulls                      1.00    120.9±0.34µs        ? ?/sec    1.00    121.2±0.36µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/50pct_true                       1.00    158.5±0.60µs        ? ?/sec    1.00    159.2±0.34µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/90pct_true                       1.00     64.4±0.47µs        ? ?/sec    1.01     64.7±0.38µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/99pct_true                       1.00     23.1±0.17µs        ? ?/sec    1.00     23.1±0.12µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/all_false                        1.00     17.7±0.13µs        ? ?/sec    1.01     17.9±0.20µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_array/all_true                         1.00     17.8±0.18µs        ? ?/sec    1.00     17.8±0.14µs        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/10pct_true             1.04     34.1±0.14ns        ? ?/sec    1.00     32.6±0.09ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/1pct_true              1.01     32.8±0.10ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/50pct_nulls            1.04     34.0±0.06ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/50pct_true             1.00     32.7±0.07ns        ? ?/sec    1.00     32.8±0.56ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/90pct_true             1.01     32.8±0.04ns        ? ?/sec    1.00     32.6±0.05ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/99pct_true             1.04     34.0±0.04ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/all_false              1.04     34.1±0.07ns        ? ?/sec    1.00     32.6±0.05ns        ? ?/sec
zip_8192_from_short strings (3..10)/array_vs_non_null_scalar/all_true               1.00     32.8±0.06ns        ? ?/sec    1.00     32.6±0.08ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/10pct_true             1.00     32.8±0.07ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/1pct_true              1.04     34.0±0.05ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/50pct_nulls            1.04     34.0±0.12ns        ? ?/sec    1.00     32.6±0.07ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/50pct_true             1.04     34.1±0.06ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/90pct_true             1.01     32.8±0.10ns        ? ?/sec    1.00     32.6±0.09ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/99pct_true             1.04     34.1±0.20ns        ? ?/sec    1.00     32.7±0.11ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/all_false              1.00     32.8±0.06ns        ? ?/sec    1.00     32.6±0.06ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_array/all_true               1.04     34.0±0.05ns        ? ?/sec    1.00     32.6±0.09ns        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/10pct_true       1.00     15.7±0.03µs        ? ?/sec    12.39   194.7±1.10µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/1pct_true        1.00     10.4±0.02µs        ? ?/sec    18.07   188.1±1.14µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/50pct_nulls      1.00     26.1±0.28µs        ? ?/sec    7.73    201.6±1.15µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/50pct_true       1.00     38.4±0.24µs        ? ?/sec    5.48    210.5±1.38µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/90pct_true       1.00     18.7±0.09µs        ? ?/sec    9.01    168.5±1.24µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/99pct_true       1.00     13.3±0.09µs        ? ?/sec    11.59   154.7±0.29µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/all_false        1.00   902.9±71.95ns        ? ?/sec    208.31   188.1±1.01µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_null_scalar_vs_null_scalar/all_true         1.00     12.7±0.08µs        ? ?/sec    11.99   151.8±0.44µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/10pct_true                    1.00     35.2±0.08µs        ? ?/sec    3.65    128.6±0.48µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/1pct_true                     1.00     15.3±0.02µs        ? ?/sec    7.85    120.5±0.53µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/50pct_nulls                   1.00     57.9±0.13µs        ? ?/sec    2.44    141.5±0.73µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/50pct_true                    1.00     68.5±0.11µs        ? ?/sec    2.32    159.3±0.24µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/90pct_true                    1.00     33.4±0.07µs        ? ?/sec    3.92    130.7±0.36µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/99pct_true                    1.00     16.1±0.05µs        ? ?/sec    7.51    121.0±0.30µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/all_false                     1.00      2.8±0.04µs        ? ?/sec    41.47   117.7±0.28µs        ? ?/sec
zip_8192_from_short strings (3..10)/non_nulls_scalars/all_true                      1.00      3.1±0.04µs        ? ?/sec    38.75   118.6±0.35µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/10pct_true              1.00     18.6±0.19µs        ? ?/sec    8.93    166.2±0.42µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/1pct_true               1.00     13.3±0.08µs        ? ?/sec    11.58   154.1±0.26µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/50pct_nulls             1.00     27.3±0.37µs        ? ?/sec    6.77    185.2±1.16µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/50pct_true              1.00     38.6±0.24µs        ? ?/sec    5.44    210.0±0.44µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/90pct_true              1.00     16.2±0.03µs        ? ?/sec    12.06   195.2±0.44µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/99pct_true              1.00     10.7±0.04µs        ? ?/sec    17.62   188.7±0.57µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/all_false               1.00     12.8±0.10µs        ? ?/sec    11.86   151.8±0.92µs        ? ?/sec
zip_8192_from_short strings (3..10)/null_vs_non_null_scalar/all_true                1.00  1026.7±65.52ns        ? ?/sec    181.01   185.9±0.50µs        ? ?/sec

@rluvaton
Member Author

Good, it looks like we have massive speedups

alamb pushed a commit that referenced this pull request Oct 20, 2025
# Which issue does this PR close?

N/A

# Rationale for this change

Calling `OffsetBuffer::from_lengths(std::iter::repeat_n(size, value.len()))`
does not utilize SIMD (I can explain further if you want).
See [GodBolt Link](https://godbolt.org/z/PTsfvfjqx)

Extracted from:
- #8653

After this PR and the one below are merged, I will improve the DataFusion
scalar-to-array conversion to use them and make it really fast:
- #8658

# What changes are included in this PR?

added new function
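
To illustrate the rationale above, here is a minimal std-only sketch. It is not the code added in this commit, and the name `offsets_for_repeated_length` is hypothetical. When every value has the same length, offset `i` is simply `i * length`, so each offset can be computed independently rather than through a running sum, which is the kind of loop a compiler can usually vectorize.

```rust
/// Hypothetical helper: build the `n + 1` offsets for `n` values that all
/// have the same `length`, without accumulating a running sum.
/// Returns `None` if the offsets would not fit in `i32`.
fn offsets_for_repeated_length(length: usize, n: usize) -> Option<Vec<i32>> {
    // Validate up front so the multiplication in the loop cannot overflow.
    let total = length.checked_mul(n)?;
    i32::try_from(total).ok()?;
    let n = i32::try_from(n).ok()?;
    let length = length as i32;
    // Each offset depends only on its index, so iterations are independent.
    Some((0..=n).map(|i| i * length).collect())
}
```

For example, three values of length 4 give the offsets `[0, 4, 8, 12]`.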

# Are these changes tested?

yes

# Are there any user-facing changes?

yes
@alamb
Contributor

alamb commented Oct 20, 2025

> Good, it looks like we have massive speedups

yes, nice work!

alamb pushed a commit that referenced this pull request Oct 21, 2025
# Which issue does this PR close?

N/A

# Rationale for this change

I want to repeat the same value multiple times in a very fast way
which will be used in:
- #8653

After this PR and the one below are merged, I will improve the DataFusion
scalar-to-array conversion to use them and make it really fast:
- #8656 

# What changes are included in this PR?

Created a function in `MutableBuffer` that repeats a slice a given number
of times using a logarithmic number of copies, to reduce memcpy calls
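
As a rough illustration of the logarithmic idea, here is a std-only sketch on a plain `Vec<u8>`. It is an assumption about the general technique, not the `MutableBuffer` code itself, and `repeat_slice_doubling` is a hypothetical name. One copy of the slice is written first, and then the region written so far is copied onto the end, doubling its size each round.

```rust
/// Hypothetical std-only sketch: append `slice` to `out` `n` times by
/// doubling the region that has already been written.
fn repeat_slice_doubling(out: &mut Vec<u8>, slice: &[u8], n: usize) {
    if slice.is_empty() || n == 0 {
        return;
    }
    let start = out.len();
    let total = slice.len() * n;
    out.reserve(total);
    // Seed with a single copy of the slice.
    out.extend_from_slice(slice);
    let mut written = slice.len();
    while written < total {
        // Copy everything written so far, or only what is still missing
        // on the final (possibly partial) round.
        let to_copy = written.min(total - written);
        out.extend_from_within(start..start + to_copy);
        written += to_copy;
    }
}
```

With this scheme 8192 repetitions need about 14 copy calls instead of 8192, which is consistent with the gap growing with `n` in the benchmark table.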

# Are these changes tested?

Yes

# Are there any user-facing changes?

Yes, and added docs

-------

Extracted from:
- #8653

Benchmark results on local machine

| Slice Length | Repetitions (n) | repeat_slice_n_times | extend_from_slice loop | Speedup |
|--------------|-----------------|----------------------|------------------------|---------|
| 3 | 3 | 47.092 ns | 41.910 ns | 0.89x |
| 3 | 64 | 63.548 ns | 222.29 ns | 3.50x |
| 3 | 1024 | 105.57 ns | 3.031 µs | 28.7x |
| 3 | 8192 | 405.71 ns | 24.170 µs | 59.6x |
| 20 | 3 | 48.437 ns | 46.437 ns | 0.96x |
| 20 | 64 | 74.993 ns | 319.04 ns | 4.25x |
| 20 | 1024 | 350.94 ns | 4.437 µs | 12.6x |
| 20 | 8192 | 2.440 µs | 35.524 µs | 14.6x |
| 100 | 3 | 50.369 ns | 47.568 ns | 0.94x |
| 100 | 64 | 119.70 ns | 165.37 ns | 1.38x |
| 100 | 1024 | 1.734 µs | 2.623 µs | 1.51x |
| 100 | 8192 | 10.615 µs | 19.750 µs | 1.86x |

These are the full results:

<details>
<summary>Result</summary>


```
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=3 n=3
                        time:   [46.719 ns 47.092 ns 47.453 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=3 n=3
                        time:   [41.833 ns 41.910 ns 41.996 ns]
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=3 n=64
                        time:   [62.935 ns 63.548 ns 64.183 ns]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=3 n=64
                        time:   [221.75 ns 222.29 ns 222.86 ns]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=3 n=1024
                        time:   [105.15 ns 105.57 ns 106.01 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=3 n=1024
                        time:   [3.0240 µs 3.0308 µs 3.0395 µs]
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=3 n=8192
                        time:   [401.57 ns 405.71 ns 409.94 ns]
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=3 n=8192
                        time:   [24.124 µs 24.170 µs 24.222 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=20 n=3
                        time:   [48.287 ns 48.437 ns 48.606 ns]
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=20 n=3
                        time:   [46.289 ns 46.437 ns 46.611 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=20 n=64
                        time:   [74.625 ns 74.993 ns 75.395 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=20 n=64
                        time:   [318.20 ns 319.04 ns 319.98 ns]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=20 n=1024
                        time:   [346.66 ns 350.94 ns 355.17 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=20 n=1024
                        time:   [4.4251 µs 4.4369 µs 4.4506 µs]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=20 n=8192
                        time:   [2.4336 µs 2.4401 µs 2.4465 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=20 n=8192
                        time:   [35.466 µs 35.524 µs 35.589 µs]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=100 n=3
                        time:   [50.209 ns 50.369 ns 50.530 ns]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=100 n=3
                        time:   [47.439 ns 47.568 ns 47.701 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=100 n=64
                        time:   [117.77 ns 119.70 ns 122.00 ns]
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=100 n=64
                        time:   [164.88 ns 165.37 ns 166.07 ns]
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=100 n=1024
                        time:   [1.7278 µs 1.7335 µs 1.7398 µs]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=100 n=1024
                        time:   [2.6176 µs 2.6232 µs 2.6305 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=100 n=8192
                        time:   [10.583 µs 10.615 µs 10.649 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=100 n=8192
                        time:   [19.471 µs 19.750 µs 20.185 µs]
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe
```

</details>
# Conflicts:
#	arrow-buffer/src/buffer/mutable.rs
@rluvaton rluvaton marked this pull request as ready for review October 21, 2025 18:25
@alamb
Contributor

alamb commented Oct 22, 2025

I am struggling to find enough contiguous focus time to review these PRs. They are on my radar, I just can't review them as fast as I want to

Hopefully other people will be able to help review too

Contributor

@alamb alamb left a comment

Thank you @rluvaton -- this is (also) great 🚀

I think the only thing really needed is additional test coverage for the fallback impl and the special cases in BytesScalarImpl

Adding an implementation for ByteView types (Utf8View and BinaryView) will likely also improve performance a lot, but we can file a follow on ticket to track doing so -- this is better than what is currently on main

/// - either Datum is not a scalar (or has more than 1 element)
///
pub fn try_new(truthy: &dyn Datum, falsy: &dyn Datum) -> Result<Self, ArrowError> {
let (truthy, truthy_is_scalar) = truthy.get();
Contributor

You could potentially avoid the redundant calls to `truthy.get()` and `falsy.get()` by returning `Result<Option<Self>, ArrowError>` (returning `None` if either argument is not a scalar)
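
A minimal sketch of that API shape, using hypothetical stand-in types (`Input`, `ScalarPair`, and a `String` error) rather than Arrow's `Datum` and `ArrowError`: `Ok(None)` signals "not applicable, use the general path", while `Err` stays reserved for genuinely invalid input.

```rust
/// Hypothetical stand-in for a `Datum`; for illustration only.
struct Input {
    len: usize,
    is_scalar: bool,
}

/// Hypothetical stand-in for the scalar fast path.
#[derive(Debug, PartialEq)]
struct ScalarPair {
    truthy_len: usize,
    falsy_len: usize,
}

impl ScalarPair {
    /// `Ok(None)`: at least one input is not a scalar, so the caller should
    /// fall back to the general implementation without inspecting the inputs again.
    fn try_new(truthy: &Input, falsy: &Input) -> Result<Option<Self>, String> {
        if !truthy.is_scalar || !falsy.is_scalar {
            return Ok(None);
        }
        if truthy.len != 1 || falsy.len != 1 {
            return Err("scalar inputs must have exactly one element".to_string());
        }
        Ok(Some(Self {
            truthy_len: truthy.len,
            falsy_len: falsy.len,
        }))
    }
}
```

The caller can then match once on the result instead of checking scalar-ness first and constructing second.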

Comment on lines +534 to +543
fn combine_nulls_and_false(predicate: &BooleanArray) -> BooleanBuffer {
    if let Some(nulls) = predicate.nulls().filter(|n| n.null_count() > 0) {
        predicate.values().bitand(
            // nulls are represented as 0 (false) in the values buffer
            nulls.inner(),
        )
    } else {
        predicate.values().clone()
    }
}
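
The same idea on plain `u64` words, as a std-only sketch. The packed-word representation and the `Option<&[u64]>` parameter are assumptions for illustration, not Arrow's `BooleanBuffer` API. Since a validity bit of 1 means "not null", AND-ing values with validity leaves a 1 only where the predicate is both valid and true.

```rust
/// Hypothetical sketch: a boolean column stored as two bitmaps packed into
/// `u64` words. `values` holds the booleans; `validity` has a 1 bit for
/// every non-null slot, or is `None` when there are no nulls.
fn combine_nulls_and_false(values: &[u64], validity: Option<&[u64]>) -> Vec<u64> {
    match validity {
        // A slot counts as true only if it is both valid and true.
        Some(validity) => values.iter().zip(validity).map(|(v, n)| v & n).collect(),
        // No nulls: the values bitmap can be used as-is.
        None => values.to_vec(),
    }
}
```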
Contributor


let zip_impl = downcast_primitive! {
truthy.data_type() => (primitive_size_helper),
DataType::Utf8 => {
Contributor

A natural extension of this work would be to add a special case for DataType::Utf8View and DataType::BinaryView (as a follow on PR)

That would likely be super fast for many cases as it could simply copy views around and pre-compute the value buffer.

I'll file a follow on ticket

}

impl<T: ArrowPrimitiveType> PrimitiveScalarImpl<T> {
fn get_scalar_and_null_buffer_for_single_non_nullable(
Contributor

I was confused about this naming for a while -- eventually I see it means something like

Suggested change
fn get_scalar_and_null_buffer_for_single_non_nullable(
/// return an output array that has
/// `value` in all locations where predicate is true
/// `null` otherwise
fn get_scalar_and_null_buffer_for_single_non_nullable(
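
To make the described behavior concrete, here is a small sketch using plain `Vec`s instead of Arrow buffers. The name `scalar_or_null` and the `Vec<bool>` validity mask are hypothetical, and whether the PR fills the values exactly this way is an assumption. Every slot stores `value`, and the predicate doubles as the validity mask, so the slots where the predicate is false read as null.

```rust
/// Hypothetical sketch: returns the values plus a validity mask
/// (`true` = valid, `false` = null).
fn scalar_or_null<T: Copy>(predicate: &[bool], value: T) -> (Vec<T>, Vec<bool>) {
    // Fill every slot with `value`; what sits under a null slot does not matter.
    let values = vec![value; predicate.len()];
    // The predicate itself says which slots are valid.
    let validity = predicate.to_vec();
    (values, validity)
}
```

This avoids a per-element branch: the only data-dependent part of the output is the mask, which already exists.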

predicate: BooleanBuffer,
value: T::Native,
) -> (Vec<T::Native>, Option<NullBuffer>) {
let result_len = predicate.len();
Contributor

I noticed a special case for all nulls in the bytes impl

        let number_of_true = predicate.count_set_bits();

Is there a reason you didn't include the same case in the primitive builder?

Member Author

@rluvaton rluvaton Oct 24, 2025

Yes, it might not be worth it.

For bytes we need to know the number of set and unset bits so we can preallocate the values buffer for all the cases
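
A rough sketch of why the counts matter for bytes, on plain `Vec`s (the function name and the slice-based signature are hypothetical, not the PR's code). With the number of true and false slots known, the values buffer can be sized exactly once before anything is written.

```rust
/// Hypothetical sketch: zip two non-null byte scalars according to
/// `predicate`, sizing the output buffers before writing anything.
fn zip_bytes_scalars(predicate: &[bool], truthy: &[u8], falsy: &[u8]) -> (Vec<u8>, Vec<i32>) {
    let true_count = predicate.iter().filter(|b| **b).count();
    let false_count = predicate.len() - true_count;
    // Exact capacity: no reallocation happens while filling.
    let mut values = Vec::with_capacity(true_count * truthy.len() + false_count * falsy.len());
    let mut offsets = Vec::with_capacity(predicate.len() + 1);
    offsets.push(0i32);
    for &p in predicate {
        values.extend_from_slice(if p { truthy } else { falsy });
        offsets.push(values.len() as i32);
    }
    (values, offsets)
}
```

For primitives the output size is always `len * size_of::<T>()`, so there is nothing comparable to compute up front.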


let true_repeat_count = end - start;
// fill with truthy values
mutable.repeat_slice_n_times(truthy_val, true_repeat_count);
Contributor

Having to copy the same value so many times is quite unfortunate and is what Utf8View is designed to avoid -- you can have a single copy of the string and then copy views of it around

This is explained in blog form here if you are not familiar with them: https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/
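
A simplified sketch of the view idea, restricted to strings of at most 12 bytes, which the view layout stores inline after a 4-byte length. The helper names are hypothetical and longer strings, which reference a shared data buffer, are left out. Zipping two scalars then only copies fixed-size 16-byte views, never the string bytes themselves.

```rust
/// Hypothetical sketch of a 16-byte view for short strings only:
/// a 4-byte little-endian length followed by up to 12 inline bytes.
fn inline_view(s: &[u8]) -> Option<u128> {
    if s.len() > 12 {
        return None; // long strings need a separate data buffer (not shown)
    }
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(s.len() as u32).to_le_bytes());
    bytes[4..4 + s.len()].copy_from_slice(s);
    Some(u128::from_le_bytes(bytes))
}

/// Pick one of two precomputed views per row; each row is a fixed-size copy.
fn zip_views(predicate: &[bool], truthy: u128, falsy: u128) -> Vec<u128> {
    predicate
        .iter()
        .map(|&p| if p { truthy } else { falsy })
        .collect()
}
```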

}

fn get_bytes_and_offset_for_all_same_value(
predicate: &BooleanBuffer,
Contributor

I found it a little confusing at first that predicate was passed in but only its length is used. Maybe passing in the len would make it clearer that the callsite doesn't need to negate the predicate as is needed in get_scalar_and_null_buffer_for_single_non_nullable

}
}

impl<T: ByteArrayType> BytesScalarImpl<T> {
Contributor

Is there a reason this is in its own impl block (and not in the same one as above)?

fn create_output(&self, input: &BooleanArray) -> Result<ArrayRef, ArrowError>;
}

#[derive(Debug, PartialEq)]
Contributor

There appears to be no test coverage for the FallbackImpl
I tested using

cargo llvm-cov --html test -p arrow-select

Contributor

Full report:
report.zip

}
}

fn get_scalar_and_null_buffer_for_single_non_nullable(
Contributor

These codepaths also appear to be untested (see the comment above about how I ran coverage)



Labels

arrow Changes to the arrow crate performance
