Case-Insensitive UTF-8 Search with AVX-512 🌾🪡🌾 by ashvardanian · Pull Request #286 · ashvardanian/StringZilla

ashvardanian · 2025-11-29T18:32:21Z

Below are throughput numbers for a case-insensitive substring search that respects all Unicode 17.0 case-folding rules, measured by searching for unique "word" tokens across various languages of the Leipzig Wikipedia Corpora. This is arguably the only library, besides PCRE2, that provides full Unicode spec compliance for search operations. PCRE2 is often one or more orders of magnitude slower than even our serial baseline, due to the extreme complexity of combining a complete RegEx engine with Unicode compliance.

| Corpora Language | Script | Serial Baseline, GB/s | AVX-512 for Ice Lake+, GB/s | Speedup |
| :--- | :--- | ---: | ---: | ---: |
| Latin (Basic) | | | | |
| 🇬🇧 English | Latin | 1.15 | 10.93 | 11.9× |
| 🇮🇹 Italian | Latin | 0.81 | 10.63 | 14.7× |
| 🇳🇱 Dutch | Latin | 0.85 | 10.91 | 13.3× |
| Latin (Extended) | | | | |
| 🇩🇪 German | Latin+ß | 0.74 | 9.36 | 13.6× |
| 🇫🇷 French | Latin+Acc | 0.73 | 8.37 | 15.1× |
| 🇪🇸 Spanish | Latin+ñ | 0.99 | 8.86 | 10.8× |
| 🇵🇹 Portuguese | Latin+Acc | 0.77 | 9.58 | 14.3× |
| 🇵🇱 Polish | Latin+Ext | 0.62 | 7.51 | 14.2× |
| 🇨🇿 Czech | Latin+Háčky | 0.43 | 6.10 | 17.1× |
| 🇹🇷 Turkish | Latin+İ/ı | 0.81 | 6.78 | 11.7× |
| 🇻🇳 Vietnamese | Latin+Tones | 0.41 | 6.38 | 17.9× |
| Cyrillic | | | | |
| 🇷🇺 Russian | Cyrillic | 0.54 | 3.41 | 10.6× |
| 🇺🇦 Ukrainian | Cyrillic | 0.56 | 4.03 | 10.6× |
| Greek | | | | |
| 🇬🇷 Greek | Greek | 0.31 | 7.04 | 22.5× |
| Caucasian | | | | |
| 🇦🇲 Armenian | Armenian | 0.34 | 4.18 | 17.5× |
| 🇬🇪 Georgian | Georgian | 0.65 | 10.56 | 24.2× |
| Semitic | | | | |
| 🇮🇱 Hebrew | Hebrew | 0.65 | 9.52 | 13.7× |
| 🇸🇦 Arabic | Arabic | 1.17 | 9.85 | 9.8× |
| 🇮🇷 Persian | Arabic+Ext | 0.41 | 11.83 | 43.1× |
| Indic | | | | |
| 🇮🇳 Hindi | Devanagari | 1.25 | 10.99 | 16.3× |
| 🇧🇩 Bengali | Bengali | 0.72 | 11.03 | 25.9× |
| 🇮🇳 Tamil | Tamil | 1.09 | 11.70 | 21.0× |
| CJK & East Asian | | | | |
| 🇯🇵 Japanese | CJK+Kana | 0.52 | 11.56 | 26.7× |
| 🇰🇷 Korean | Hangul | 2.98 | 11.58 | 3.5× |
| 🇨🇳 Chinese | CJK | 0.43 | 20.07 | 103.0× |
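
For readers unfamiliar with what "respecting case-folding rules" implies for search, here is a minimal reference sketch in pure Python. It is not StringZilla's API or algorithm: the function name `casefold_find` is hypothetical, and it relies on Python's built-in `str.casefold()`, whose Unicode version depends on the interpreter build and may not match Unicode 17.0. It folds the haystack one character at a time, remembers which original character produced each folded character, and maps the match position back to the original string.

```python
def casefold_find(haystack: str, needle: str) -> int:
    """Return the character index in `haystack` of the first
    case-insensitive match of `needle`, or -1 if there is none.

    Hypothetical helper for illustration only; not part of StringZilla.
    """
    folded_needle = needle.casefold()
    if not folded_needle:
        return 0  # an empty needle matches at the start, like str.find

    folded_parts = []
    origins = []  # origins[i]: index of the haystack char behind folded char i
    for index, char in enumerate(haystack):
        folded = char.casefold()  # may expand, e.g. "ß" folds to "ss"
        folded_parts.append(folded)
        origins.extend([index] * len(folded))

    position = "".join(folded_parts).find(folded_needle)
    # Simplification: a match may begin or end inside an expansion
    # (e.g. the needle "s" matches within "ß"); this sketch still reports
    # the index of the character that produced the first matched position.
    return origins[position] if position >= 0 else -1
```

Under these assumptions, `casefold_find("Straße", "STRASSE")` returns `0` and `casefold_find("中ABC", "abc")` returns `1`. The offset map is needed because folding can change string length, so positions in the folded text do not line up with positions in the original.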

ashvardanian added 30 commits November 29, 2025 15:30

Chore: Configure 2-space JSON indent

87c1465

Docs: Inconsistent UTF-8 fold explanations

c7a3012

Merge: v4.4 release

6caebd6

Make: Ignore UV lock

2c3d35d

Fix: Require continuous substitution matrices

20ac49a

Fix: Gracefully handle Unicode spec download issues

44412bf

Docs: Explain convoluted control-flow

44b6279

Improve: Boundary condition fold tests

bbea84f

The new tests detect a bug in handling inputs like "中ABC".

Fix: Folding "中ABC" on Ice Lake

20dbef3

Improve: Faster optional UTF-8 validation

7edba6f

Improve: Optional start/end for folded find

62ad6f7

New behaviour differs for `str` and `bytes` args

Improve: Faster serial baselines for ASCII needles

e5227ad

Improve: Avoid UTF-8 checks in case-fold

d8aac4a

Improve: Faster LUT on Ice Lake and Zen4+

b13aef4

This design is cleaner, but I'm not seeing any gains on AMD Zen5. Closes #240

Add: Draft case-insensitive search on Ice Lake

4d30daa

Add: Draft TR29 Unicode word-bound iterators

3ca6695

Improve: Cleaner Raita kernels - unstable

f642fa9

Docs: Missing table info

c8b6ae1

Improve: Use ring-buffers for O(1) prefix hashes

2db5e54

Improve: Avoid modulo division

024f677

Add: Branchless .empty() for small strings

ea258c1

Relates to #288

Improve: Length-returning small-string API

be9e2d7

Docs: Small-string safety comments

83ca417

Improve: Self-equality & overflow protection

b325b1c

Improve: Vietnamese fast case-folding path

025b36d

Add: Latin-1 case-folded search

c1c0305

Improve: Case-insensitive search for Ry, Vi, El, Hy (Am)

fd89a88

Fix: Case-insensitive search passes test

726bbbd

Improve: Unnecessary checks in ci-find

c19f11c

Chore: Word-bounds code formatting

89d5530

ashvardanian added 2 commits December 13, 2025 14:50

Fix: Handle failed downloads of UCD specs

13bc864

Fix: NULL missing - use SZ_NULL

7e3dd35

ashvardanian force-pushed the main-dev branch from ad2fd8d to 7e3dd35 Compare December 13, 2025 16:26

ashvardanian added 26 commits December 13, 2025 19:17

Fix: Generalize static asserts to 32-bit archs

bde1fad

Improve: Higher-efficiency Ice Lake kernels

dce1773

Make: Bump Rust & Go CI

7847671

Improve: VPSHUFB & VPTERNLOG for search

a315ee8

Improve: Flatten danger zone checks

a8e3f66

Make: Install bash on Alpine for Rust toolchain

999ec64

Add: Reusable case-insensitive needles with metadata for Rust & C++

45e3c92

Improve: Separate "alarm" functions for danger zones

936fc22

Improve: Greek alarm with less register pressure

22dc88c

Port pressure went down from 8+6 on p5 and p0 respectively, to 6+5.

Chore: Variable naming

cc2b5a6

Improve: Western European register pressure

fe94e1c

Naive: 12 p5 ops. Before: 10 p5 ops + 1 p0 op. After: 8 p5 ops + 4 p0 ops.

Improve: Deduplicate benchmark input tokens

68287fc

Improve: Faster ASCII kernels for ≤ 3 probes

e6626a8

Combines XOR + VPTERNLOG + VPTESTNMB to reduce port pressure on Intel CPUs

Improve: Generalize case-invariant logic

1239bea

Docs: Arm NEON case-folding plans

8a3f25d

Make: Install curl on Alpine for Rust kit

85af5b5

Fix: Rust UTF-8 iterator doctest

7fff78b

Fix: Shadowing template param on NVCC

908d7f9

Fix: Missing span::operator==0 for new NVCC benchmarks

9b6911a

Improve: Prefetch on massive inputs

48a8ccb

Yields a 30% performance improvement in such megakernels with sequential memory access patterns

Docs: UTF-8 Fold & Search with Perf numbers

b69c49d

150x improvement over PyICU `icu.StringSearch` baseline

Add: Case-folding for JavaScript

93bc9c4

Add: Case-folding for GoLang

2e310d6

Add: Case-folding for Swift

f621419

Docs: Badges, CLI, & inconsistencies

b9aa985

Fix: Pointer cast in GoLang

fb20b6c

ashvardanian merged commit ca7e505 into main Dec 15, 2025
32 checks passed

ashvardanian merged 147 commits into main from main-dev
