@alamb
Contributor

@alamb alamb commented Oct 30, 2025

This is my attempt to show a performance improvement for

To justify the additional code / complexity

Supersedes

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 30, 2025
bit_mask::set_bits(
self.buffer.as_slice_mut(),
to_set,
self.buffer.bitwise_binary_op(
Contributor Author

This is the only actual change in this PR.
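
For readers unfamiliar with why this kind of swap can matter, here is a minimal, self-contained sketch of appending packed bits one bit at a time versus many bits per step. It does not reproduce `bit_mask::set_bits` or `bitwise_binary_op`, whose implementations are not shown in this thread; both function names below are hypothetical, and the sketch assumes LSB-first bit packing and that any bits in `dst` past `dst_len` are zero.

```rust
/// Hypothetical baseline (not arrow-rs code): append the first `len` bits of
/// `src` (LSB-first within each byte) to `dst`, which holds `dst_len` valid bits.
/// Assumes any bits in `dst` past `dst_len` are zero.
fn append_bits_naive(dst: &mut Vec<u8>, dst_len: usize, src: &[u8], len: usize) {
    dst.resize((dst_len + len + 7) / 8, 0);
    for i in 0..len {
        if src[i / 8] & (1u8 << (i % 8)) != 0 {
            let j = dst_len + i;
            dst[j / 8] |= 1u8 << (j % 8);
        }
    }
}

/// Hypothetical chunked variant: same result, but moves up to 56 source bits
/// per iteration by loading them into a u64, shifting once, and OR-ing bytes.
fn append_bits_chunked(dst: &mut Vec<u8>, dst_len: usize, src: &[u8], len: usize) {
    dst.resize((dst_len + len + 7) / 8, 0);
    // Every chunk starts on a source byte boundary, so the shift is constant.
    let shift = dst_len % 8;
    let mut done = 0;
    while done < len {
        // 56 bits plus a shift of at most 7 still fits in a u64.
        let take = (len - done).min(56);
        let src_byte = done / 8;
        let nbytes = (take + 7) / 8;
        let mut word = 0u64;
        for (k, b) in src[src_byte..src_byte + nbytes].iter().enumerate() {
            word |= (*b as u64) << (8 * k);
        }
        // Drop any source bits past `take` in the last loaded byte.
        word &= (1u64 << take) - 1;
        let shifted = word << shift;
        let dst_byte = (dst_len + done) / 8;
        for k in 0..8 {
            let byte = (shifted >> (8 * k)) as u8;
            // Non-zero bytes always land inside the resized `dst`.
            if byte != 0 {
                dst[dst_byte + k] |= byte;
            }
        }
        done += take;
    }
}
```

The chunked version does a fixed amount of work per 56 bits rather than per bit, which is the general idea behind word-level bitwise operations; real implementations typically read and write whole words rather than assembling them byte by byte.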

@alamb
Contributor Author

alamb commented Oct 30, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/test_append2 (29f063f) to 2eabb59 diff
BENCH_NAME=boolean_append_packed
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench boolean_append_packed
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_test_append2
Results will be posted here when complete

@alamb
Contributor Author

alamb commented Oct 30, 2025

🤖: Benchmark completed

group                    alamb_test_append2                     main
-----                    ------------------                     ----
boolean_append_packed    1.00      5.6±0.01µs        ? ?/sec    3.56     19.8±0.04µs        ? ?/sec

@alamb
Contributor Author

alamb commented Oct 30, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/test_append2 (29f063f) to 2eabb59 diff
BENCH_NAME=concatenate_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench concatenate_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_test_append2
Results will be posted here when complete

@alamb
Contributor Author

alamb commented Oct 30, 2025

🤖: Benchmark completed

group                                                          alamb_test_append2                     main
-----                                                          ------------------                     ----
concat 1024 arrays boolean 4                                   1.00     22.3±0.07µs        ? ?/sec    1.28     28.6±0.14µs        ? ?/sec
concat 1024 arrays i32 4                                       1.00     13.7±0.04µs        ? ?/sec    1.00     13.6±0.02µs        ? ?/sec
concat 1024 arrays str 4                                       1.00     36.3±0.25µs        ? ?/sec    1.03     37.4±0.28µs        ? ?/sec
concat boolean 1024                                            1.00    331.9±0.57ns        ? ?/sec    1.31    435.3±0.44ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.00      5.1±0.02µs        ? ?/sec    10.02    51.1±0.11µs        ? ?/sec
concat boolean nulls 1024                                      1.00    563.4±0.91ns        ? ?/sec    1.38    775.8±1.44ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00     18.1±0.06µs        ? ?/sec    6.07    109.8±0.35µs        ? ?/sec
concat fixed size lists                                        1.08  960.0±198.88µs        ? ?/sec    1.00  885.5±119.63µs        ? ?/sec
concat i32 1024                                                1.00    382.9±0.66ns        ? ?/sec    1.01    386.0±0.92ns        ? ?/sec
concat i32 8192 over 100 arrays                                1.03   232.2±11.04µs        ? ?/sec    1.00    224.5±8.45µs        ? ?/sec
concat i32 nulls 1024                                          1.00    591.8±1.27ns        ? ?/sec    1.21    716.1±1.50ns        ? ?/sec
concat i32 nulls 8192 over 100 arrays                          1.00    276.3±9.19µs        ? ?/sec    1.15   316.4±17.83µs        ? ?/sec
concat str 1024                                                1.00     13.3±0.96µs        ? ?/sec    1.00     13.4±0.86µs        ? ?/sec
concat str 8192 over 100 arrays                                1.00    110.3±1.20ms        ? ?/sec    1.01    111.9±1.56ms        ? ?/sec
concat str nulls 1024                                          1.00      5.9±0.46µs        ? ?/sec    1.07      6.3±0.45µs        ? ?/sec
concat str nulls 8192 over 100 arrays                          1.00     55.2±0.57ms        ? ?/sec    1.00     55.1±0.66ms        ? ?/sec
concat str_dict 1024                                           1.02      2.9±0.02µs        ? ?/sec    1.00      2.8±0.01µs        ? ?/sec
concat str_dict_sparse 1024                                    1.00      7.0±0.03µs        ? ?/sec    1.00      7.0±0.03µs        ? ?/sec
concat struct with int32 and dicts size=1024 count=2           1.00      7.1±0.11µs        ? ?/sec    1.02      7.2±0.08µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0               1.03     79.1±0.82µs        ? ?/sec    1.00     77.0±0.35µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0.2             1.00     79.8±0.64µs        ? ?/sec    1.06     84.3±0.48µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0                1.00     77.9±0.51µs        ? ?/sec    1.16     90.3±0.83µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.00     79.3±0.42µs        ? ?/sec    1.22     96.9±0.39µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.00     47.1±2.73µs        ? ?/sec    1.00     47.3±2.53µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.00     51.9±3.41µs        ? ?/sec    1.04     54.0±2.36µs        ? ?/sec

@alamb
Contributor Author

alamb commented Oct 31, 2025

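The improvements in the concat boolean benchmarks are quite impressive (up to roughly 10x without nulls and 6x with nulls on the 8192-over-100-arrays cases):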

concat boolean 1024                                            1.00    331.9±0.57ns        ? ?/sec    1.31    435.3±0.44ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.00      5.1±0.02µs        ? ?/sec    10.02    51.1±0.11µs        ? ?/sec
concat boolean nulls 1024                                      1.00    563.4±0.91ns        ? ?/sec    1.38    775.8±1.44ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00     18.1±0.06µs        ? ?/sec    6.07    109.8±0.35µs        ? ?/sec
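
To make the ratio columns easier to read, here is a small sketch that recomputes them from the reported means. It assumes each ratio is that branch's time divided by the fastest time in the row, which is consistent with the rows quoted above; the function name is hypothetical, and because the displayed means are rounded, recomputed ratios only match approximately.

```rust
/// Hypothetical helper: divide each mean time (all in the same unit) by the
/// fastest one, so the fastest entry becomes 1.0 and the rest read as slowdowns.
fn relative_to_fastest(times: &[f64]) -> Vec<f64> {
    let fastest = times.iter().cloned().fold(f64::INFINITY, f64::min);
    times.iter().map(|t| t / fastest).collect()
}
```

For example, passing the "concat boolean 8192 over 100 arrays" means `[5.1, 51.1]` (this branch, then main, both in µs) gives a second entry close to the 10.02 shown in the table.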

@alamb
Contributor Author

alamb commented Oct 31, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/test_append2 (29f063f) to 2eabb59 diff
BENCH_NAME=boolean_append_packed
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench boolean_append_packed
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_test_append2
Results will be posted here when complete

@alamb
Contributor Author

alamb commented Oct 31, 2025

🤖: Benchmark completed

group                    alamb_test_append2                     main
-----                    ------------------                     ----
boolean_append_packed    1.00      5.3±0.01µs        ? ?/sec    2.81     15.0±0.04µs        ? ?/sec

@alamb
Contributor Author

alamb commented Oct 31, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/test_append2 (29f063f) to 2eabb59 diff
BENCH_NAME=concatenate_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench concatenate_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_test_append2
Results will be posted here when complete

@alamb
Contributor Author

alamb commented Oct 31, 2025

🤖: Benchmark completed

group                                                          alamb_test_append2                     main
-----                                                          ------------------                     ----
concat 1024 arrays boolean 4                                   1.00     22.2±0.03µs        ? ?/sec    1.26     28.1±0.04µs        ? ?/sec
concat 1024 arrays i32 4                                       1.00     13.4±0.03µs        ? ?/sec    1.03     13.8±0.03µs        ? ?/sec
concat 1024 arrays str 4                                       1.00     36.2±0.28µs        ? ?/sec    1.00     36.2±0.25µs        ? ?/sec
concat boolean 1024                                            1.00    329.0±0.36ns        ? ?/sec    1.25    411.0±5.48ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.00      5.1±0.02µs        ? ?/sec    10.07    51.0±0.12µs        ? ?/sec
concat boolean nulls 1024                                      1.00    559.5±1.68ns        ? ?/sec    1.33    742.3±3.51ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00     18.0±0.05µs        ? ?/sec    6.10    109.9±0.38µs        ? ?/sec
concat fixed size lists                                        1.00   787.5±25.18µs        ? ?/sec    1.07   845.3±28.46µs        ? ?/sec
concat i32 1024                                                1.00    385.8±0.56ns        ? ?/sec    1.03    396.2±0.82ns        ? ?/sec
concat i32 8192 over 100 arrays                                1.00    210.7±4.32µs        ? ?/sec    1.01    212.5±4.14µs        ? ?/sec
concat i32 nulls 1024                                          1.00    598.4±1.44ns        ? ?/sec    1.21    723.9±2.93ns        ? ?/sec
concat i32 nulls 8192 over 100 arrays                          1.00    240.7±4.29µs        ? ?/sec    1.20    288.6±2.17µs        ? ?/sec
concat str 1024                                                1.00     12.4±0.95µs        ? ?/sec    1.05     13.1±0.84µs        ? ?/sec
concat str 8192 over 100 arrays                                1.00    106.4±0.93ms        ? ?/sec    1.01    107.8±0.80ms        ? ?/sec
concat str nulls 1024                                          1.00      5.7±0.48µs        ? ?/sec    1.12      6.4±0.92µs        ? ?/sec
concat str nulls 8192 over 100 arrays                          1.00     54.0±0.42ms        ? ?/sec    1.01     54.6±0.40ms        ? ?/sec
concat str_dict 1024                                           1.00      2.8±0.01µs        ? ?/sec    1.04      2.9±0.01µs        ? ?/sec
concat str_dict_sparse 1024                                    1.00      6.9±0.04µs        ? ?/sec    1.02      7.0±0.02µs        ? ?/sec
concat struct with int32 and dicts size=1024 count=2           1.01      7.0±0.03µs        ? ?/sec    1.00      6.9±0.11µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0               1.00     78.3±0.36µs        ? ?/sec    1.04     81.8±1.21µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0.2             1.00     79.5±0.50µs        ? ?/sec    1.07     84.7±0.47µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0                1.00     77.6±0.32µs        ? ?/sec    1.14     88.4±0.48µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.00     79.4±0.28µs        ? ?/sec    1.20     94.9±0.39µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.00     46.0±3.79µs        ? ?/sec    1.01     46.5±2.86µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.00     48.2±3.57µs        ? ?/sec    1.12     53.9±3.48µs        ? ?/sec
