Performance is subpar compared to simdutf::validate_utf8() #6

Closed
@lpinca

I've opened websockets/utf-8-validate#109 to use is_utf8 in utf-8-validate.

When I ran the same benchmarks used in websockets/utf-8-validate#101, I noticed that is_utf8() is significantly slower than simdutf::validate_utf8().

$ npx envinfo --system

  System:
    OS: macOS 13.1
    CPU: (16) x64 Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
    Memory: 21.54 GB / 32.00 GB
    Shell: 5.2.15 - /usr/local/bin/bash

is_utf8()

$ node bench.js 
Loading https://en.wikipedia.org/wiki/Main_Page ...
utf-8-validate (5.0.10, C++) x 106,808 ops/sec ±0.27% (95 runs sampled)
utf-8-validate (is_utf8, C++) x 105,159 ops/sec ±0.11% (95 runs sampled)
------------------------------------------------------------

Loading https://ro.wikipedia.org/wiki/Pagina_principală ...
utf-8-validate (5.0.10, C++) x 25,410 ops/sec ±0.09% (97 runs sampled)
utf-8-validate (is_utf8, C++) x 54,815 ops/sec ±0.09% (96 runs sampled)
------------------------------------------------------------

Loading https://ru.wikipedia.org/wiki/Заглавная_страница ...
utf-8-validate (5.0.10, C++) x 15,160 ops/sec ±0.10% (98 runs sampled)
utf-8-validate (is_utf8, C++) x 63,985 ops/sec ±0.09% (99 runs sampled)
------------------------------------------------------------

Loading https://ar.wikipedia.org/wiki/الصفحة_الرئيسية ...
utf-8-validate (5.0.10, C++) x 12,766 ops/sec ±0.08% (98 runs sampled)
utf-8-validate (is_utf8, C++) x 57,442 ops/sec ±0.08% (98 runs sampled)
------------------------------------------------------------

Loading https://ja.wikipedia.org/wiki/メインページ ...
utf-8-validate (5.0.10, C++) x 23,306 ops/sec ±0.07% (96 runs sampled)
utf-8-validate (is_utf8, C++) x 79,199 ops/sec ±0.10% (95 runs sampled)
------------------------------------------------------------

Loading https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt ...
utf-8-validate (5.0.10, C++) x 66,890 ops/sec ±0.10% (99 runs sampled)
utf-8-validate (is_utf8, C++) x 622,514 ops/sec ±0.10% (99 runs sampled)

simdutf::validate_utf8()

$ node bench.js 
Loading https://en.wikipedia.org/wiki/Main_Page ...
utf-8-validate (5.0.10, C++) x 107,373 ops/sec ±0.08% (99 runs sampled)
utf-8-validate (simdutf, C++) x 749,966 ops/sec ±0.20% (96 runs sampled)
------------------------------------------------------------

Loading https://ro.wikipedia.org/wiki/Pagina_principală ...
utf-8-validate (5.0.10, C++) x 25,413 ops/sec ±0.08% (98 runs sampled)
utf-8-validate (simdutf, C++) x 144,119 ops/sec ±0.19% (95 runs sampled)
------------------------------------------------------------

Loading https://ru.wikipedia.org/wiki/Заглавная_страница ...
utf-8-validate (5.0.10, C++) x 15,164 ops/sec ±0.08% (98 runs sampled)
utf-8-validate (simdutf, C++) x 176,840 ops/sec ±0.18% (94 runs sampled)
------------------------------------------------------------

Loading https://ar.wikipedia.org/wiki/الصفحة_الرئيسية ...
utf-8-validate (5.0.10, C++) x 12,781 ops/sec ±0.08% (100 runs sampled)
utf-8-validate (simdutf, C++) x 152,366 ops/sec ±0.16% (97 runs sampled)
------------------------------------------------------------

Loading https://ja.wikipedia.org/wiki/メインページ ...
utf-8-validate (5.0.10, C++) x 23,298 ops/sec ±0.09% (99 runs sampled)
utf-8-validate (simdutf, C++) x 222,036 ops/sec ±0.18% (95 runs sampled)
------------------------------------------------------------

Loading https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt ...
utf-8-validate (5.0.10, C++) x 66,875 ops/sec ±0.11% (95 runs sampled)
utf-8-validate (simdutf, C++) x 1,051,967 ops/sec ±0.09% (100 runs sampled)

I did not investigate, but I suspect this is because is_utf8() has no AVX-512 implementation. Is that right?
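
To make the gap easier to read, here is a rough sketch that computes the relative throughput from the ops/sec figures in the two runs above. The `results` object and the `speedup` helper are names made up for this illustration; they are not part of bench.js, and the numbers are simply copied from the logs.

```javascript
// ops/sec figures copied from the is_utf8() and simdutf::validate_utf8() runs above.
// `results` and `speedup` are hypothetical names used only in this sketch.
const results = {
  'en.wikipedia.org': { isUtf8: 105159, simdutf: 749966 },
  'ro.wikipedia.org': { isUtf8: 54815, simdutf: 144119 },
  'ru.wikipedia.org': { isUtf8: 63985, simdutf: 176840 },
  'ar.wikipedia.org': { isUtf8: 57442, simdutf: 152366 },
  'ja.wikipedia.org': { isUtf8: 79199, simdutf: 222036 },
  'UTF-8-demo.txt': { isUtf8: 622514, simdutf: 1051967 }
};

// Ratio of simdutf throughput to is_utf8 throughput for one input.
function speedup(entry) {
  return entry.simdutf / entry.isUtf8;
}

for (const [page, entry] of Object.entries(results)) {
  console.log(`${page}: simdutf is ${speedup(entry).toFixed(2)}x is_utf8`);
}
```

On this machine that works out to roughly 7.1x on the English page and between about 1.7x and 2.8x on the other inputs.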
