Check subject UTF8 validity just once for `String#gsub`, `#scan`, `#split` #13406
As of #13313, regex matching uses neither `NO_UTF_CHECK` nor `MATCH_INVALID_UTF`, which means the subject string is checked for UTF-8 validity once per match. `String#gsub`, `#scan`, and `#split` are special in that the regex doesn't change and is expected to be matched many times, so for these methods we should only need to do this check exactly once; if the first match does succeed, we know that the subject string is validly encoded and we can apply `NO_UTF_CHECK` in subsequent matches, regardless of whether other optimizations are in place.

Benchmarks:
If #13353 is merged then `options |= :no_utf_check` would apply to the parameters of the respective methods themselves.
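
For illustration, the check-once idea described above can be sketched as a standalone scan loop. This is a minimal sketch and not the code in this PR: `scan_with_single_utf_check` is a hypothetical helper, and it assumes a Crystal version where `Regex#match_at_byte_index` accepts `Regex::MatchOptions`.

```crystal
# Hypothetical helper (not part of the standard library or of this PR):
# collects every match of `regex` in `subject`, letting the regex engine
# validate the subject's UTF-8 encoding only on the first match attempt.
def scan_with_single_utf_check(subject : String, regex : Regex) : Array(String)
  results = [] of String
  options = Regex::MatchOptions::None
  byte_offset = 0

  while match = regex.match_at_byte_index(subject, byte_offset, options: options)
    results << match[0]

    # A successful match means the engine accepted the subject as valid
    # UTF-8, so later attempts on the same string can skip the check.
    options |= Regex::MatchOptions::NO_UTF_CHECK

    match_end = match.byte_end(0)
    if match_end == match.byte_begin(0)
      # Empty match: step over one whole character so the next offset
      # never lands in the middle of a multi-byte sequence.
      break if match_end >= subject.bytesize
      match_end += Char::Reader.new(subject, match_end).current_char_width
    end
    byte_offset = match_end
  end

  results
end

puts scan_with_single_utf_check("a1b22c333", /\d+/).inspect
```

The flag is set only after a successful match, so the first attempt always goes through the regular validity check. Advancing by a full character after an empty match matters here because, with the check disabled, the engine is no longer guarding against an offset that falls inside a multi-byte character.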