Optimize CheckIriUnicodeRange #31860
Interesting, LLVM managed to vectorize similar code (in C++): https://godbolt.org/z/oetv_u 🙂 (just saying)
The majority of the clauses in the
// This method implements the ABNF checks per https://tools.ietf.org/html/rfc3987#section-2.2
internal static bool CheckIriUnicodeRange(char highSurr, char lowSurr, ref bool surrogatePair, bool isQuery)
{
    bool inRange = false;
    surrogatePair = false;
    Debug.Assert(char.IsHighSurrogate(highSurr));
    if (Rune.TryCreate(highSurr, lowSurr, out Rune rune))
    {
        surrogatePair = true;
        // U+xxFFFE..U+xxFFFF is always private use for all planes, so we exclude it.
        // U+E0000..U+E0FFF is disallowed per the 'ucschar' definition in the ABNF.
        // U+F0000 and above are only allowed for 'iprivate' per the ABNF (isQuery = true).
        inRange = ((ushort)rune.Value < 0xFFFE)
            && ((uint)(rune.Value - 0xE0000) >= (uint)(0xE1000 - 0xE0000))
            && (isQuery || rune.Value < 0xF0000);
    }
    return inRange;
}
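For reference, the three clauses above can be checked against the supplementary-plane ranges listed in RFC 3987 section 2.2 (`ucschar` and `iprivate`). The following is only a sketch: the class and method names (`IriRangeSketch`, `Optimized`, `Reference`, `CountMismatches`) are made up for illustration and are not part of the PR, and the methods take the scalar value directly instead of a `Rune`.

```csharp
using System;

static class IriRangeSketch
{
    // Same arithmetic as the snippet above, applied to a scalar value
    // in the range U+10000..U+10FFFF.
    public static bool Optimized(int value, bool isQuery) =>
        ((ushort)value < 0xFFFE)
        && ((uint)(value - 0xE0000) >= (uint)(0xE1000 - 0xE0000))
        && (isQuery || value < 0xF0000);

    // Straightforward range-by-range check written from the RFC 3987 ABNF
    // (supplementary planes only). Illustrative, not the runtime's code.
    public static bool Reference(int value, bool isQuery)
    {
        // ucschar: %x10000-1FFFD / %x20000-2FFFD / ... / %xD0000-DFFFD
        for (int plane = 0x1; plane <= 0xD; plane++)
        {
            int start = plane << 16;
            if (value >= start && value <= start + 0xFFFD) return true;
        }

        // ucschar: %xE1000-EFFFD
        if (value >= 0xE1000 && value <= 0xEFFFD) return true;

        // iprivate (query only): %xF0000-FFFFD / %x100000-10FFFD
        return isQuery
            && ((value >= 0xF0000 && value <= 0xFFFFD)
                || (value >= 0x100000 && value <= 0x10FFFD));
    }

    // Compares both checks for every valid surrogate pair and both isQuery values.
    public static int CountMismatches()
    {
        int mismatches = 0;
        for (int hi = 0xD800; hi <= 0xDBFF; hi++)
        {
            for (int lo = 0xDC00; lo <= 0xDFFF; lo++)
            {
                int value = char.ConvertToUtf32((char)hi, (char)lo);
                if (Optimized(value, false) != Reference(value, false)) mismatches++;
                if (Optimized(value, true) != Reference(value, true)) mismatches++;
            }
        }
        return mismatches;
    }
}
```

`IriRangeSketch.CountMismatches()` returns 0 when the two checks agree on every surrogate pair, which is the property the optimized clauses rely on.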
Should Regarding ranges, can you comment on the non-surrogate-pair version of CheckIriUnicodeRange?
The only real difference is that the
Co-Authored-By: Stephen Toub <stoub@microsoft.com>
a832074 to 7b1917a
I used the optimization for similar range checks that @GrabYourPitchforks suggested, only replacing the cast to ushort with an AND with 0xFFFF. I verified that the behaviour is the same for all inputs.
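To illustrate the two points in this comment, here is a small sketch showing that the `(ushort)` cast and the `& 0xFFFF` mask select the same low 16 bits, and that an inclusive range test can be folded into one unsigned comparison. All names here (`RangeCheckSketch`, `IsInInclusiveRange`, `FormsAgree`) are hypothetical and chosen for the example; this is not the code from the PR.

```csharp
using System;

static class RangeCheckSketch
{
    // Hypothetical helper: true when lo <= value <= hi.
    // If value < lo, the unsigned subtraction wraps to a large number
    // and the single comparison fails.
    public static bool IsInInclusiveRange(uint value, uint lo, uint hi) =>
        (value - lo) <= (hi - lo);

    // Two ways to test the low 16 bits of a scalar value.
    public static bool LowBitsViaCast(int value) => (ushort)value < 0xFFFE;
    public static bool LowBitsViaMask(int value) => (value & 0xFFFF) < 0xFFFE;

    // Brute-force comparison over every code point U+0000..U+10FFFF.
    public static bool FormsAgree()
    {
        for (int value = 0; value <= 0x10FFFF; value++)
        {
            if (LowBitsViaCast(value) != LowBitsViaMask(value)) return false;

            bool plain = value >= 0xE0000 && value <= 0xE0FFF;
            if (IsInInclusiveRange((uint)value, 0xE0000, 0xE0FFF) != plain) return false;
        }
        return true;
    }
}
```

`RangeCheckSketch.FormsAgree()` returns true, which is the kind of exhaustive check that makes a rewrite like this safe to claim as behaviour-preserving.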
Avoid using an intermediate string to do range comparisons; behavior remains the same for all inputs.
Perf for that method alone improves by ~10-15x,
Perf for
"scheme:" + { '\ud83f', '\udffe' } * 1000