slice/ascii: Optimize `eq_ignore_ascii_case` with auto-vectorization #147436

okaneco · 2025-10-07T09:59:57Z

Refactor the current functionality into a helper function
Use `as_chunks` to encourage auto-vectorization in the optimized chunk processing function
Add a codegen test checking for vectorization and no panicking
Add benches for `eq_ignore_ascii_case`

The optimized function is initially enabled only for x86_64, which has `sse2` as part of its baseline, but none of the code is platform-specific. Other platforms with SIMD instructions may also benefit from this implementation.

Performance improvements only manifest for slices of 16 bytes or longer, so the optimized path is gated behind a check that the length is at least 16.
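For readers who want to see the shape of the approach, here is a minimal sketch, not the code from this PR. It assumes a 16-byte chunk width and uses `chunks_exact` as a stand-in for `as_chunks` so it builds on older toolchains. The names `CHUNK`, `eq_simple`, `eq_chunked`, and `eq_ignore_ascii_case_sketch` are hypothetical. Each chunk is folded without an early exit, which is the kind of loop the compiler can often auto-vectorize; whether it actually does depends on the target and optimization level.

```rust
/// Hypothetical chunk width; 16 bytes matches one SSE2 register.
const CHUNK: usize = 16;

/// Scalar path: compare one byte at a time, stopping at the first mismatch.
fn eq_simple(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.eq_ignore_ascii_case(y))
}

/// Chunked path. Callers are expected to pass slices of equal length.
fn eq_chunked(a: &[u8], b: &[u8]) -> bool {
    let a_chunks = a.chunks_exact(CHUNK);
    let b_chunks = b.chunks_exact(CHUNK);
    // Grab the leftovers before the iterators are consumed.
    let (a_rem, b_rem) = (a_chunks.remainder(), b_chunks.remainder());

    for (ca, cb) in a_chunks.zip(b_chunks) {
        // Accumulate over the whole chunk with no early return, so the
        // inner loop has a fixed trip count and no branches.
        let mut equal = true;
        for (x, y) in ca.iter().zip(cb) {
            equal &= x.eq_ignore_ascii_case(y);
        }
        if !equal {
            return false;
        }
    }
    // In this sketch the remainder falls back to the scalar path.
    eq_simple(a_rem, b_rem)
}

/// Entry point: the chunked path is used only at or above the chunk width.
pub fn eq_ignore_ascii_case_sketch(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    if a.len() >= CHUNK {
        eq_chunked(a, b)
    } else {
        eq_simple(a, b)
    }
}
```

The length gate keeps short inputs on the scalar path, where the chunk setup would not pay for itself.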

Benchmarks: cases below 16 bytes are unaffected, while all longer cases show sizeable improvements.

before:
    str::eq_ignore_ascii_case::bench_large_str_eq         4942.30ns/iter +/- 48.20
    str::eq_ignore_ascii_case::bench_medium_str_eq         632.01ns/iter +/- 16.87
    str::eq_ignore_ascii_case::bench_str_17_bytes_eq        16.28ns/iter  +/- 0.45
    str::eq_ignore_ascii_case::bench_str_31_bytes_eq        35.23ns/iter  +/- 2.28
    str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq       7.56ns/iter  +/- 0.22
    str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq    2.64ns/iter  +/- 0.06
after:
    str::eq_ignore_ascii_case::bench_large_str_eq         611.63ns/iter +/- 28.29
    str::eq_ignore_ascii_case::bench_medium_str_eq         77.10ns/iter +/- 19.76
    str::eq_ignore_ascii_case::bench_str_17_bytes_eq        3.49ns/iter  +/- 0.39
    str::eq_ignore_ascii_case::bench_str_31_bytes_eq        3.50ns/iter  +/- 0.27
    str::eq_ignore_ascii_case::bench_str_of_8_bytes_eq      7.27ns/iter  +/- 0.09
    str::eq_ignore_ascii_case::bench_str_under_8_bytes_eq   2.60ns/iter  +/- 0.05


rustbot · 2025-10-07T10:00:02Z

r? @scottmcm

rustbot has assigned @scottmcm.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Refactor the eq check into an inner function for reuse in tail checking

Rather than fall back to the simple implementation for tail handling, load the last 16 bytes to take advantage of vectorization. This doesn't seem to negatively impact check time even when the remainder count is low.

okaneco · 2025-10-07T19:53:31Z

I've pushed a commit to avoid falling back to scalar checking when handling the remainder.

We reload the last 16 bytes of the slices if there's a remainder, which improves the 31-byte case and doesn't seem to regress the 17-byte case.

scalar tail handling:
    ascii::eq_ignore_ascii_case::bench_long_str_eq          54.75ns/iter +/- 1.51
    ascii::eq_ignore_ascii_case::bench_str_17_bytes_eq       4.77ns/iter +/- 0.12
    ascii::eq_ignore_ascii_case::bench_str_31_bytes_eq      23.00ns/iter +/- 4.56
    ascii::eq_ignore_ascii_case::bench_str_of_8_bytes_eq     7.61ns/iter +/- 0.16
    ascii::eq_ignore_ascii_case::bench_str_under_8_bytes_eq  2.61ns/iter +/- 0.07
load last 16 bytes of the slice (newest commit):
    ascii::eq_ignore_ascii_case::bench_long_str_eq          51.60ns/iter +/- 5.28
    ascii::eq_ignore_ascii_case::bench_str_17_bytes_eq       3.62ns/iter +/- 0.54
    ascii::eq_ignore_ascii_case::bench_str_31_bytes_eq       3.56ns/iter +/- 0.27
    ascii::eq_ignore_ascii_case::bench_str_of_8_bytes_eq     7.79ns/iter +/- 1.01
    ascii::eq_ignore_ascii_case::bench_str_under_8_bytes_eq  2.73ns/iter +/- 0.05
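
The tail strategy measured above can be illustrated with a rough sketch. This is not the PR's code; `BLOCK`, `eq_block`, and `eq_overlapping_tail` are hypothetical names, and `chunks_exact` again stands in for `as_chunks`. When the length is not a multiple of 16, the final 16 bytes are compared as one more full block. That block overlaps bytes already checked, which is harmless because those bytes are known to match at that point.

```rust
/// Hypothetical block width, matching the 16-byte chunks discussed above.
const BLOCK: usize = 16;

/// Compare two equally sized blocks without an early exit.
fn eq_block(a: &[u8], b: &[u8]) -> bool {
    let mut equal = true;
    for (x, y) in a.iter().zip(b) {
        equal &= x.eq_ignore_ascii_case(y);
    }
    equal
}

pub fn eq_overlapping_tail(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let len = a.len();
    if len < BLOCK {
        // Too short to re-read a full block; compare what is there.
        return eq_block(a, b);
    }

    // Compare every complete block from the front.
    let full = len - len % BLOCK;
    for (ca, cb) in a[..full].chunks_exact(BLOCK).zip(b[..full].chunks_exact(BLOCK)) {
        if !eq_block(ca, cb) {
            return false;
        }
    }

    // If bytes are left over, compare the last BLOCK bytes as a full block
    // instead of looping over the remainder one byte at a time.
    len % BLOCK == 0 || eq_block(&a[len - BLOCK..], &b[len - BLOCK..])
}
```

The trade-off is a small amount of redundant work on the overlapping bytes in exchange for keeping the tail on the same block-wide path as the rest of the input.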

library/coretests/benches/ascii/eq_ignore_ascii_case.rs