[Merged by Bors] - Optimize String.prototype.normalize
#2848
Test262 conformance changes
Codecov Report

| | main | #2848 | +/- |
|---|---|---|---|
| Coverage | 50.92% | 50.92% | -0.01% |
| Files | 419 | 419 | |
| Lines | 41780 | 41799 | +19 |
| Hits | 21278 | 21286 | +8 |
| Misses | 20502 | 20513 | +11 |
7d739b8 to f5615bc
Having the minimal data generated like this seems like a very nice solution. Looks very nice!
@jedel1043 I did not look into it much, do you think we could do the same for the |
Yep! There's the |
Nice work! Looks good to me! :)
bors r+ |
We currently use `unicode_normalization` to implement the `String.prototype.normalize` method. However, the crate doesn't support UTF-16 as a first-class string type, so we had to work around it by converting the valid parts of a string to UTF-8, normalizing each one, encoding the results back to UTF-16, and concatenating everything with the unpaired surrogates in between. This is suboptimal for performance, so this PR replaces our current implementation with `icu_normalizer`, which does support UTF-16 input. Additionally, when the `intl` feature is enabled, users can override the default normalization data by providing the required data through the `BoaProvider` data provider.
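For readers unfamiliar with the workaround described above, here is a minimal sketch of the general shape of that approach, using only the standard library. It is not the code that was in Boa: the function name `normalize_utf16_segments` is hypothetical, and the `normalize` closure stands in for a real UTF-8 normalizer (the example passes `to_uppercase` purely as a visible placeholder transformation, not as a Unicode normalization form).

```rust
use std::char::decode_utf16;

/// Hypothetical helper: applies `normalize` to every well-formed run of
/// `input` and copies unpaired surrogates through unchanged.
fn normalize_utf16_segments<F>(input: &[u16], normalize: F) -> Vec<u16>
where
    F: Fn(&str) -> String,
{
    let mut out = Vec::with_capacity(input.len());
    // Accumulates the current run of valid code points as UTF-8.
    let mut valid = String::new();

    for unit in decode_utf16(input.iter().copied()) {
        match unit {
            Ok(c) => valid.push(c),
            Err(e) => {
                // An unpaired surrogate ends the current valid run:
                // normalize the run, re-encode it, then emit the surrogate.
                if !valid.is_empty() {
                    out.extend(normalize(&valid).encode_utf16());
                    valid.clear();
                }
                out.push(e.unpaired_surrogate());
            }
        }
    }

    // Flush the trailing valid run, if any.
    if !valid.is_empty() {
        out.extend(normalize(&valid).encode_utf16());
    }
    out
}

fn main() {
    // "a", an unpaired high surrogate, then "b".
    let input: [u16; 3] = [0x0061, 0xD800, 0x0062];
    // `to_uppercase` is only a placeholder for a real normalizer.
    let out = normalize_utf16_segments(&input, |s| s.to_uppercase());
    assert_eq!(out, vec![0x0041, 0xD800, 0x0042]);
}
```

Every valid run goes through a UTF-16 to UTF-8 to UTF-16 round trip with intermediate allocations, which is the overhead a normalizer that accepts UTF-16 input directly can avoid.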
Pull request successfully merged into main. Build succeeded: |
As mentioned in #2848 (comment), this uses our new default ICU4X data to replace `char::is_start` and `char::is_continue` from the `boa_unicode` crate with the [`icu_properties`](https://crates.io/crates/icu_properties) crate. Note that this doesn't deprecate `boa_unicode` yet, since that'll require some discussion about how to proceed with a now unused sub-crate.