- 
                Notifications
    You must be signed in to change notification settings 
- Fork 8k
Major overhaul of mbstring (part 23) #9042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e5ecd11    to
    04ba80c      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for further improving MBString! From what I can tell, this looks good to me, but I left a few comments.
mb_convert_kana now uses the new text encoding conversion filters. Microbenchmarking shows speed gains of 50%-150% across various text encodings and input string lengths. The behavior is the same as the old mb_convert_kana except for one fix: if the 'zero codepoint' U+0000 appeared in the input, the old implementation would sometimes drop it, not passing it through to the output. This is now fixed.
04ba80c    to
    6d3f5ca      
    Compare
  
    | Thanks to @cmb69 for the review. I may fuzz just a bit more, but in the meantime, review from others (if available) would be appreciated. | 
| I have done some more fuzzing; didn't find any bugs. | 
| If no-one else wants to review, I will merge tomorrow. | 
| static void mb_wchar_to_cp50222(uint32_t *in, size_t len, mb_convert_buf *buf, bool end); | ||
|  | ||
| /* See mbstring.c */ | ||
| uint32_t mb_convert_kana_codepoint(uint32_t c, uint32_t next, bool *consumed, uint32_t *second, int mode); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is introducing a reverse-dependency from libmbfl to mbstring. Probably the function should continue to live in libmbfl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, indeed. @alexdowad, can you please address this? If mb_convert_kana_codepoint() can be easily changed to be moved to libmbfl, an option might be to establish a callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do (in my next PR, which is close to ready).
Now
mb_convert_kanahas been moved over to using the new text conversion filters.Fuzzing with a modified
sapi/fuzzer/php-fuzz-mbstringwas used to ensure there is no change from the old behavior (except for the fix for U+0000). A few known bugs in the old encoding conversion filters caused spurious fuzzer crashes; fixes for the old encoding conversion filters so as to suppress those as I continue moving forward.FYA @cmb69 @nikic @kamil-tekiela