Skip to content

Conversation

@GrabYourPitchforks
Copy link
Member

The method SortKey.Compare contains an early incorrect exit which can cause it to return 0 if all of the below conditions hold:

  • The application is using the invariant globalization mode, and
  • Neither sortKey1 nor sortKey2 represents the empty string, and
  • sortKey1 is a prefix of sortKey2 (or vice versa).

In a nutshell, this means that under the invariant globalization mode, the expression compareInfo.GetSortKey("he").Equals(compareInfo.GetSortKey("hello!")) incorrectly returns true.

There may also be some inputs under the non-invariant mode which trigger this incorrect output, but I didn't attempt this since it would be highly dependent on how the active version of NLS or ICU produces sort keys. It's far easier to trigger this bug under the invariant mode.

This PR addresses that issue in SortKey.Compare. Additionally, there are some performance optimizations made to SortKey.Equals and SortKey.GetHashCode. This also builds on #31761 by sealing the SortKey type and devirtualizing the methods.

@GrabYourPitchforks
Copy link
Member Author

There's another interesting behavior here. Since under the invariant mode sort keys are a raw projection of chars to bytes, there are endianness issues to consider.

For example, under the invariant mode:

/* on a little-endian machine */

// "ĕ" = "\u0115"
string.Compare("e", "ĕ", StringComparison.Ordinal); // returns < 0
CompareInfo.Compare("e", "ĕ"); // returns < 0
SortKey.Compare(CompareInfo.GetSortKey("e"), CompareInfo.GetSortKey("ĕ")); // returns > 0

/* on a big-endian machine */

string.Compare("e", "ĕ", StringComparison.Ordinal); // returns < 0
CompareInfo.Compare("e", "ĕ"); // returns < 0
SortKey.Compare(CompareInfo.GetSortKey("e"), CompareInfo.GetSortKey("ĕ")); // returns < 0

Open question: Is this discrepancy allowable? I don't know if there's a valid scenario in which somebody uses GetSortKey under the invariant mode. But if there is, it might be necessary to make CompareInfo.Compare and SortKey.Compare agree on the result.

@GrabYourPitchforks GrabYourPitchforks added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Feb 5, 2020
@tarekgh
Copy link
Member

tarekgh commented Feb 5, 2020

I wouldn't care much if someone using InvarantMode and Sort Keys. sort keys are really useful for scenarios that use real cultures and cares about linguistic behavior. The invariant mode is not for such scenarios at all.

@GrabYourPitchforks
Copy link
Member Author

Would it make sense for us to block the GetSortKey API under the invariant mode?

@GrabYourPitchforks
Copy link
Member Author

Another possible solution (if we do care about CompareInfo and SortKey not getting out of sync on the same box) is to have logic akin to this in SortKey.Compare:

if (GlobalizationMode.Invariant)
{
    return key1._bytes.AsChars().SequenceCompareTo(key2._bytes.AsChars());
}
else
{
    return key1._bytes.SequenceCompareTo(key2._bytes);
}

@jkotas
Copy link
Member

jkotas commented Feb 5, 2020

there are endianness issues to consider.

Would it make sense to swap the bytes when creating the sort key in InvariantMode to fix this?

GlobalizationMode.Invariant mode as it exists today is not very useful unless you are building a microservices that does not ever use any globalization APIs. I do not think we need to worry about what SortKeys do in it too much. I am wondering whether we can make GlobalizationMode.Invariant to be more useful by making it to be roughly on par with the globalization stack that came with classic Mono. The data for it were still small, but it was actually reasonably useful. Some of this overlaps with your idea to include Unicode data with the runtime.

@GrabYourPitchforks
Copy link
Member Author

I do not think we need to worry about what SortKeys do in it too much.

Sounds like that's two votes for "don't bother with endianness concerns"? :)

@jkotas
Copy link
Member

jkotas commented Feb 5, 2020

That, or if you would like to write the few lines of code to swap endianess in the invariantmode sort key generation - that's fine too.

@pentp
Copy link
Contributor

pentp commented Feb 5, 2020

Making GlobalizationMode.Invariant more useful/practical sounds like a good idea. I've tried it on a (micro)service that should only use InvariantCulture (or en-US), but changing to GlobalizationMode.Invariant is still a bit scary because of its quirks and it's probably not a tested scenario for Orleans or AspNetCore or Newtonsoft.Json...

@GrabYourPitchforks
Copy link
Member Author

Latest iteration:

  • Rename internal method InvariantToUpper to InvariantCaseFold so that it can be easily updated in the future
  • Remove ValidHashCodeOfStringMaskOffFlags and ValidSortkeyCtorMaskOffFlags and fold them into ValidCompareMaskOffFlags, since Compare / GetHashCode / GetSortKey should all accept the same flags
  • Simplify SortKey.Compare per PR feedback
  • Update InvariantCreateSortKey to be endianness-aware and use safer span-based logic in lieu of pointers

@GrabYourPitchforks GrabYourPitchforks removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Feb 5, 2020
@tarekgh
Copy link
Member

tarekgh commented Feb 5, 2020

Remove ValidHashCodeOfStringMaskOffFlags and ValidSortkeyCtorMaskOffFlags and fold them into ValidCompareMaskOffFlags, since Compare / GetHashCode / GetSortKey should all accept the same flags

GetHashCode doesn't allow CompareOptions.StringSort I think.

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Feb 5, 2020

GetHashCode doesn't allow CompareOptions.StringSort I think.

It has to. GetHashCode is internally built on top of GetSortKey, which means that it has to allow the same set of parameters.

This same logic applies to Compare. In practice, Compare can be thought of as a glorified "get sort key for A" and "get sort key for B", then perform a byte-wise comparison of the two sort keys.

@GrabYourPitchforks
Copy link
Member Author

Latest iteration:

  • Suppress one test case on Win7, as LCMapStringEx and CompareStringEx return different results and it's throwing off the unit test. This is only cropping up now because I modified the SortKey.Compare unit test to also ensure that CompareInfo.Compare returns the same result.

@GrabYourPitchforks GrabYourPitchforks merged commit 6766eb7 into dotnet:master Feb 6, 2020
@GrabYourPitchforks GrabYourPitchforks deleted the sortkey branch February 6, 2020 04:49
@ghost ghost added the will_lock_this label Dec 6, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants