Speedup character_length #15931

Merged 2 commits on May 3, 2025

@Dandandan Dandandan commented May 2, 2025

Which issue does this PR close?

Rationale for this change

character_length benchmark

character_length_StringArray_ascii_str_len_8
                        time:   [9.2558 µs 9.2759 µs 9.2971 µs]
                        change: [-69.094% -69.031% -68.961%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

character_length_StringArray_utf8_str_len_8
                        time:   [51.115 µs 51.980 µs 52.962 µs]
                        change: [-50.042% -49.011% -47.983%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  15 (15.00%) high mild

character_length_StringViewArray_ascii_str_len_8
                        time:   [21.520 µs 21.669 µs 21.848 µs]
                        change: [-49.637% -49.231% -48.906%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

character_length_StringViewArray_utf8_str_len_8
                        time:   [63.238 µs 64.204 µs 65.335 µs]
                        change: [-49.047% -47.810% -46.388%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

character_length_StringArray_ascii_str_len_32
                        time:   [16.314 µs 16.388 µs 16.471 µs]
                        change: [-55.729% -55.606% -55.480%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

character_length_StringArray_utf8_str_len_32
                        time:   [85.483 µs 86.380 µs 87.239 µs]
                        change: [-27.927% -27.184% -26.451%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

character_length_StringViewArray_ascii_str_len_32
                        time:   [30.754 µs 30.849 µs 30.958 µs]
                        change: [-40.686% -40.498% -40.322%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

character_length_StringViewArray_utf8_str_len_32
                        time:   [120.58 µs 121.98 µs 123.22 µs]
                        change: [-22.085% -21.127% -20.191%] (p = 0.00 < 0.05)
                        Performance has improved.

character_length_StringArray_ascii_str_len_128
                        time:   [44.115 µs 44.194 µs 44.347 µs]
                        change: [-31.392% -31.296% -31.166%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

character_length_StringArray_utf8_str_len_128
                        time:   [138.57 µs 138.89 µs 139.12 µs]
                        change: [-13.267% -12.873% -12.475%] (p = 0.00 < 0.05)
                        Performance has improved.

character_length_StringViewArray_ascii_str_len_128
                        time:   [62.667 µs 62.775 µs 62.959 µs]
                        change: [-26.719% -26.554% -26.404%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  6 (6.00%) high severe

character_length_StringViewArray_utf8_str_len_128
                        time:   [166.17 µs 166.76 µs 167.41 µs]
                        change: [-14.128% -13.824% -13.512%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  12 (12.00%) high mild
  8 (8.00%) high severe

Benchmarking character_length_StringArray_ascii_str_len_4096: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, enable flat sampling, or reduce sample count to 60.
character_length_StringArray_ascii_str_len_4096
                        time:   [1.2724 ms 1.2755 ms 1.2791 ms]
                        change: [-1.6305% -1.2835% -0.9450%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) low severe
  4 (4.00%) low mild
  5 (5.00%) high severe

character_length_StringArray_utf8_str_len_4096
                        time:   [2.1568 ms 2.1614 ms 2.1654 ms]
                        change: [-0.2057% +0.0679% +0.3527%] (p = 0.65 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low severe
  3 (3.00%) low mild

Benchmarking character_length_StringViewArray_ascii_str_len_4096: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.4s, enable flat sampling, or reduce sample count to 50.
character_length_StringViewArray_ascii_str_len_4096
                        time:   [1.4566 ms 1.4609 ms 1.4647 ms]
                        change: [-1.6508% -1.3910% -1.1291%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) low severe
  5 (5.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

character_length_StringViewArray_utf8_str_len_4096
                        time:   [2.2177 ms 2.2231 ms 2.2284 ms]
                        change: [+0.5513% +0.8467% +1.1138%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) low mild

What changes are included in this PR?

Closes: #15930

Are these changes tested?

Are there any user-facing changes?

@github-actions bot added the "functions" (Changes to functions implementation) label May 2, 2025

@alamb alamb left a comment

Very nice speedup to me -- thank you @Dandandan

I will also run some benchmarks on this PR 👍

Dandandan commented May 2, 2025

The main speedup I am expecting in e2e benchmarks is query 27 of clickbench, which has some mixed ascii / utf8 data and uses a LENGTH function.
Local runs don't show a very large diff (around 6%). Profiling shows a larger diff, from ~15% to ~3% of samples in do_count_chars (i.e. .chars().count()). It might be that it is largely bottlenecked on IO on my machine, as the actual difference is a bit smaller.
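The thread does not include the diff, so the following is only a sketch of one common way to spend less time in `.chars().count()` on mixed ASCII / UTF-8 data: take a fast path when the string is pure ASCII, where the byte length equals the character count. The name `char_len` is hypothetical and this is not claimed to be the PR's implementation.

```rust
/// Count the characters (Unicode scalar values) in `s`.
///
/// Hypothetical helper for illustration only.
fn char_len(s: &str) -> usize {
    if s.is_ascii() {
        // Every ASCII character is exactly one byte, so no decoding is needed.
        s.len()
    } else {
        // General case: walk the UTF-8 sequence and count scalar values.
        s.chars().count()
    }
}
```

For example, `char_len("hello")` and `char_len("héllo")` both give 5, even though the second string is 6 bytes long. Whether the extra `is_ascii` scan pays off depends on the data; it tends to help when most values are ASCII.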

A large part of the speedup in the micro benchmark seems to come from the faster array creation (collect into Vec instead of PrimitiveBuilder).
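To illustrate the "collect into Vec" point without depending on the arrow crate, here is a rough std-only sketch. `SketchBuilder` is a hypothetical stand-in that, like a typical array builder, records a validity flag on every append; it is not arrow's `PrimitiveBuilder`, and both function names are assumptions made for this example.

```rust
/// Hypothetical stand-in for a builder that tracks validity per element.
struct SketchBuilder {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl SketchBuilder {
    fn with_capacity(n: usize) -> Self {
        Self { values: Vec::with_capacity(n), validity: Vec::with_capacity(n) }
    }
    fn append_value(&mut self, v: i32) {
        self.values.push(v);
        self.validity.push(true);
    }
    fn append_null(&mut self) {
        self.values.push(0); // placeholder slot for a null
        self.validity.push(false);
    }
}

/// Builder style: two pushes and a branch per element.
fn lengths_with_builder(input: &[Option<&str>]) -> (Vec<i32>, Vec<bool>) {
    let mut b = SketchBuilder::with_capacity(input.len());
    for item in input {
        match item {
            Some(s) => b.append_value(s.chars().count() as i32),
            None => b.append_null(),
        }
    }
    (b.values, b.validity)
}

/// Collect style: produce only the values in one pass; null slots get 0
/// and the caller is assumed to reuse the input's validity information.
fn lengths_with_collect(input: &[Option<&str>]) -> Vec<i32> {
    input
        .iter()
        .map(|item| item.map_or(0, |s| s.chars().count() as i32))
        .collect()
}
```

Both functions yield the same values. The collect version skips the per-element validity bookkeeping, which is one plausible reason such a change shows up most strongly on short strings, where array construction is a larger share of the total work.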

alamb commented May 3, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_character_length (3ee71fe) to d2a2a8b diff
BENCH_NAME=character_length
BENCH_COMMAND=cargo bench --bench character_length
BENCH_FILTER=
BENCH_BRANCH_NAME=speedup_character_length
Results will be posted here when complete

alamb commented May 3, 2025

🤖: Benchmark completed

Details

group                                                  main                                   speedup_character_length
-----                                                  ----                                   ------------------------
character_length_StringArray_ascii_str_len_128         1.97     76.7±1.10µs        ? ?/sec    1.00     38.9±1.18µs        ? ?/sec
character_length_StringArray_ascii_str_len_32          2.85     57.6±0.16µs        ? ?/sec    1.00     20.2±0.07µs        ? ?/sec
character_length_StringArray_ascii_str_len_4096        1.04      2.2±0.10ms        ? ?/sec    1.00      2.1±0.04ms        ? ?/sec
character_length_StringArray_ascii_str_len_8           3.27     54.1±0.18µs        ? ?/sec    1.00     16.6±0.09µs        ? ?/sec
character_length_StringArray_utf8_str_len_128          1.16    287.0±0.71µs        ? ?/sec    1.00    246.7±1.19µs        ? ?/sec
character_length_StringArray_utf8_str_len_32           1.20    222.4±0.35µs        ? ?/sec    1.00    186.0±1.18µs        ? ?/sec
character_length_StringArray_utf8_str_len_4096         1.19      6.3±0.12ms        ? ?/sec    1.00      5.3±0.07ms        ? ?/sec
character_length_StringArray_utf8_str_len_8            1.06    157.0±0.33µs        ? ?/sec    1.00    148.6±0.23µs        ? ?/sec
character_length_StringViewArray_ascii_str_len_128     1.31    105.0±0.78µs        ? ?/sec    1.00     80.2±0.23µs        ? ?/sec
character_length_StringViewArray_ascii_str_len_32      1.52     90.7±0.14µs        ? ?/sec    1.00     59.7±0.73µs        ? ?/sec
character_length_StringViewArray_ascii_str_len_4096    1.04      2.1±0.07ms        ? ?/sec    1.00  1995.4±55.05µs        ? ?/sec
character_length_StringViewArray_ascii_str_len_8       1.68    105.3±0.11µs        ? ?/sec    1.00     62.5±0.17µs        ? ?/sec
character_length_StringViewArray_utf8_str_len_128      1.17    301.8±1.07µs        ? ?/sec    1.00    257.5±0.66µs        ? ?/sec
character_length_StringViewArray_utf8_str_len_32       1.19    237.7±0.62µs        ? ?/sec    1.00    199.0±0.51µs        ? ?/sec
character_length_StringViewArray_utf8_str_len_4096     1.22      6.4±0.11ms        ? ?/sec    1.00      5.3±0.10ms        ? ?/sec
character_length_StringViewArray_utf8_str_len_8        1.02    167.2±0.32µs        ? ?/sec    1.00    164.2±0.29µs        ? ?/sec
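
As a reading aid, and assuming the first number in each column is that run's time divided by the fastest run's time for the same benchmark, the ratios can be approximately recomputed from the rounded times shown. `relative_to_fastest` is a hypothetical helper name.

```rust
/// Ratio of one run's time to the fastest run's time for the same benchmark.
/// Both arguments must be in the same unit (e.g. microseconds).
fn relative_to_fastest(time: f64, fastest: f64) -> f64 {
    time / fastest
}
```

For `character_length_StringArray_ascii_str_len_128`, `relative_to_fastest(76.7, 38.9)` is about 1.97, matching the table. Recomputed values can differ from the table in the last digit because the displayed times are rounded.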

@Dandandan Dandandan merged commit e3e5d19 into apache:main May 3, 2025
27 checks passed

alamb commented May 5, 2025

Sweeet

alamb commented May 5, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_character_length (2a789fb) to b4b77e9 diff
Benchmarks: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

alamb commented May 5, 2025

🤖: Benchmark completed

Details

Comparing HEAD and speedup_character_length
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ speedup_character_length ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │  1939.61ms │                1889.99ms │    no change │
│ QQuery 1     │   716.62ms │                 681.99ms │    no change │
│ QQuery 2     │  1471.43ms │                1401.11ms │    no change │
│ QQuery 3     │   692.57ms │                 713.61ms │    no change │
│ QQuery 4     │  1497.11ms │                1475.79ms │    no change │
│ QQuery 5     │ 15532.88ms │               15135.69ms │    no change │
│ QQuery 6     │  2052.05ms │                2053.99ms │    no change │
│ QQuery 7     │  2549.28ms │                2708.73ms │ 1.06x slower │
└──────────────┴────────────┴──────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                       ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 26451.56ms │
│ Total Time (speedup_character_length)   │ 26060.91ms │
│ Average Time (HEAD)                     │  3306.44ms │
│ Average Time (speedup_character_length) │  3257.61ms │
│ Queries Faster                          │          0 │
│ Queries Slower                          │          1 │
│ Queries with No Change                  │          7 │
└─────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       HEAD ┃ speedup_character_length ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.23ms │                   2.22ms │     no change │
│ QQuery 1     │    35.18ms │                  35.41ms │     no change │
│ QQuery 2     │    89.02ms │                  88.49ms │     no change │
│ QQuery 3     │    99.09ms │                  99.71ms │     no change │
│ QQuery 4     │   764.18ms │                 733.65ms │     no change │
│ QQuery 5     │   838.99ms │                 819.92ms │     no change │
│ QQuery 6     │     2.19ms │                   2.22ms │     no change │
│ QQuery 7     │    42.01ms │                  42.46ms │     no change │
│ QQuery 8     │   922.28ms │                 887.60ms │     no change │
│ QQuery 9     │  1189.40ms │                1195.29ms │     no change │
│ QQuery 10    │   259.96ms │                 268.52ms │     no change │
│ QQuery 11    │   302.40ms │                 294.95ms │     no change │
│ QQuery 12    │   902.84ms │                 915.93ms │     no change │
│ QQuery 13    │  1303.06ms │                1330.53ms │     no change │
│ QQuery 14    │   840.14ms │                 840.64ms │     no change │
│ QQuery 15    │  1036.41ms │                1024.35ms │     no change │
│ QQuery 16    │  1722.50ms │                1731.79ms │     no change │
│ QQuery 17    │  1600.64ms │                1582.66ms │     no change │
│ QQuery 18    │  3042.96ms │                3115.75ms │     no change │
│ QQuery 19    │    82.74ms │                  86.44ms │     no change │
│ QQuery 20    │  1130.30ms │                1135.62ms │     no change │
│ QQuery 21    │  1305.55ms │                1343.39ms │     no change │
│ QQuery 22    │  2184.83ms │                2228.41ms │     no change │
│ QQuery 23    │  8182.33ms │                8253.08ms │     no change │
│ QQuery 24    │   476.47ms │                 464.53ms │     no change │
│ QQuery 25    │   388.59ms │                 404.23ms │     no change │
│ QQuery 26    │   536.72ms │                 531.61ms │     no change │
│ QQuery 27    │  1687.52ms │                1566.65ms │ +1.08x faster │
│ QQuery 28    │ 12667.26ms │               12545.37ms │     no change │
│ QQuery 29    │   535.86ms │                 542.98ms │     no change │
│ QQuery 30    │   802.18ms │                 818.96ms │     no change │
│ QQuery 31    │   857.58ms │                 849.67ms │     no change │
│ QQuery 32    │  2671.10ms │                2673.69ms │     no change │
│ QQuery 33    │  3341.69ms │                3354.98ms │     no change │
│ QQuery 34    │  3384.67ms │                3388.42ms │     no change │
│ QQuery 35    │  1294.53ms │                1271.00ms │     no change │
│ QQuery 36    │   131.39ms │                 125.25ms │     no change │
│ QQuery 37    │    56.86ms │                  57.40ms │     no change │
│ QQuery 38    │   125.89ms │                 127.28ms │     no change │
│ QQuery 39    │   203.11ms │                 200.27ms │     no change │
│ QQuery 40    │    50.23ms │                  49.48ms │     no change │
│ QQuery 41    │    44.56ms │                  46.18ms │     no change │
│ QQuery 42    │    40.94ms │                  40.01ms │     no change │
└──────────────┴────────────┴──────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                       ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 57178.39ms │
│ Total Time (speedup_character_length)   │ 57116.98ms │
│ Average Time (HEAD)                     │  1329.73ms │
│ Average Time (speedup_character_length) │  1328.30ms │
│ Queries Faster                          │          1 │
│ Queries Slower                          │          0 │
│ Queries with No Change                  │         42 │
└─────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     HEAD ┃ speedup_character_length ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 120.67ms │                 122.47ms │     no change │
│ QQuery 2     │  23.11ms │                  23.81ms │     no change │
│ QQuery 3     │  34.89ms │                  34.79ms │     no change │
│ QQuery 4     │  21.05ms │                  20.99ms │     no change │
│ QQuery 5     │  56.47ms │                  54.71ms │     no change │
│ QQuery 6     │  12.32ms │                  12.24ms │     no change │
│ QQuery 7     │ 103.03ms │                 101.80ms │     no change │
│ QQuery 8     │  26.35ms │                  25.88ms │     no change │
│ QQuery 9     │  62.27ms │                  63.01ms │     no change │
│ QQuery 10    │  57.70ms │                  57.51ms │     no change │
│ QQuery 11    │  13.00ms │                  13.20ms │     no change │
│ QQuery 12    │  46.91ms │                  45.88ms │     no change │
│ QQuery 13    │  28.67ms │                  29.48ms │     no change │
│ QQuery 14    │   9.99ms │                  10.21ms │     no change │
│ QQuery 15    │  24.47ms │                  25.45ms │     no change │
│ QQuery 16    │  22.98ms │                  22.84ms │     no change │
│ QQuery 17    │  96.96ms │                  95.48ms │     no change │
│ QQuery 18    │ 232.49ms │                 233.44ms │     no change │
│ QQuery 19    │  29.27ms │                  26.71ms │ +1.10x faster │
│ QQuery 20    │  37.23ms │                  38.76ms │     no change │
│ QQuery 21    │ 166.09ms │                 171.19ms │     no change │
│ QQuery 22    │  18.08ms │                  17.36ms │     no change │
└──────────────┴──────────┴──────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                       ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 1243.99ms │
│ Total Time (speedup_character_length)   │ 1247.21ms │
│ Average Time (HEAD)                     │   56.55ms │
│ Average Time (speedup_character_length) │   56.69ms │
│ Queries Faster                          │         1 │
│ Queries Slower                          │         0 │
│ Queries with No Change                  │        21 │
└─────────────────────────────────────────┴───────────┘

@Dandandan

│ QQuery 27 │ 1687.52ms │ 1566.65ms │ +1.08x faster │

Nice, seems roughly the same result I got

alamb commented May 5, 2025

Every little bit helps. It is so neat to see this process churning along
