Add fast mb_strcut implementation for UTF-16 #12524

alexdowad · 2023-10-25T20:44:20Z

Similar to the fast, specialized mb_strcut implementation for UTF-8 in 1f0cf13, this new implementation of mb_strcut for UTF-16 strings just examines a few bytes before each cut point.

Even for short strings, the new implementation is around 2x faster. For strings around 10,000 bytes in length, it comes out about 100-500x faster in my microbenchmarks.

The new implementation behaves identically to the old one on valid UTF-16 strings; a fuzzer was used to help verify this.
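For readers unfamiliar with why looking at only a few bytes near the cut point is enough for UTF-16, here is a minimal sketch of the idea. It is not the code from this PR: the helper name `utf16be_align_cut` is hypothetical, and it assumes big-endian, valid UTF-16 input. A cut position is rounded down to a 2-byte code unit boundary, and if that boundary falls between a high surrogate and a low surrogate, it is moved back one code unit so the pair stays intact.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (an illustration, not the function added by the PR):
 * move a byte offset in a UTF-16BE buffer back to the nearest character
 * boundary at or before it. Only the two code units around the offset are
 * inspected, so the cost does not depend on the length of the string. */
static size_t utf16be_align_cut(const unsigned char *s, size_t len, size_t pos)
{
    if (pos > len) {
        pos = len;
    }
    pos &= ~(size_t)1; /* code units are 2 bytes wide; round down to even */

    if (pos >= 2 && pos + 2 <= len) {
        uint16_t prev = (uint16_t)((s[pos - 2] << 8) | s[pos - 1]);
        uint16_t next = (uint16_t)((s[pos] << 8) | s[pos + 1]);
        /* A high surrogate followed by a low surrogate encodes a single
         * character; cutting between them would split it, so step back. */
        if (prev >= 0xD800 && prev <= 0xDBFF &&
            next >= 0xDC00 && next <= 0xDFFF) {
            pos -= 2;
        }
    }
    return pos;
}
```

Under these assumptions, both the start and the end of the requested range could be adjusted with this helper and the bytes in between copied unchanged. Handling of invalid input, byte order marks, and little-endian data is deliberately left out of the sketch.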

@Girgias @cmb69 @youkidearitai @kamil-tekiela @iluuu1994


Girgias

Just some questions to make sure I understand everything, but the implementation looks sound to me.

ext/mbstring/libmbfl/filters/mbfilter_utf16.c

alexdowad · 2023-10-28T17:16:14Z

Landed on master.

github-actions bot added the Extension: mbstring label Oct 25, 2023

Girgias reviewed Oct 26, 2023


Girgias approved these changes Oct 28, 2023


alexdowad closed this Oct 28, 2023

alexdowad deleted the cututf16 branch October 28, 2023 17:16
