Skip to content

Conversation

@alexdowad
Copy link
Contributor

Now mb_convert_kana has been moved over to using the new text conversion filters.

Fuzzing with a modified sapi/fuzzer/php-fuzz-mbstring was used to ensure there is no change from the old behavior (except for the fix for U+0000). A few known bugs in the old encoding conversion filters caused spurious fuzzer crashes; fixes for the old encoding conversion filters so as to suppress those as I continue moving forward.

FYA @cmb69 @nikic @kamil-tekiela

Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for further improving MBString! From what I can tell, this looks good to me, but I left a few comments.

mb_convert_kana now uses the new text encoding conversion
filters. Microbenchmarking shows speed gains of 50%-150%
across various text encodings and input string lengths.

The behavior is the same as the old mb_convert_kana
except for one fix: if the 'zero codepoint' U+0000 appeared
in the input, the old implementation would sometimes drop
it, not passing it through to the output. This is now
fixed.
@alexdowad alexdowad force-pushed the cleanup-mbstring-23 branch from 04ba80c to 6d3f5ca Compare July 19, 2022 11:12
@alexdowad
Copy link
Contributor Author

Thanks to @cmb69 for the review.

I may fuzz just a bit more, but in the meantime, review from others (if available) would be appreciated.

@alexdowad
Copy link
Contributor Author

I have done some more fuzzing; didn't find any bugs.

@alexdowad
Copy link
Contributor Author

If no-one else wants to review, I will merge tomorrow.

@alexdowad alexdowad closed this Jul 20, 2022
@alexdowad alexdowad deleted the cleanup-mbstring-23 branch July 20, 2022 05:46
static void mb_wchar_to_cp50222(uint32_t *in, size_t len, mb_convert_buf *buf, bool end);

/* See mbstring.c */
uint32_t mb_convert_kana_codepoint(uint32_t c, uint32_t next, bool *consumed, uint32_t *second, int mode);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is introducing a reverse-dependency from libmbfl to mbstring. Probably the function should continue to live in libmbfl?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, indeed. @alexdowad, can you please address this? If mb_convert_kana_codepoint() can be easily changed to be moved to libmbfl, an option might be to establish a callback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do (in my next PR, which is close to ready).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants