String comparer for sorting numeric strings logically #13979

Closed
@Peter-Juhasz

Rationale

When sorting, it is common to need the numeric portions of strings to be compared as numbers rather than character by character. Consider the list of strings "Windows 7", "Windows 10".

Sorting the list with the Ordinal StringComparer gives

Windows 10
Windows 7

but the desired ascending logical sort would be

Windows 7
Windows 10

Proposed API

 namespace System {
     public class StringComparer {
+        public static StringComparer Create(CultureInfo culture, CompareOptions options);
     }
 }
 namespace System.Globalization {
     public enum CompareOptions {
+        NumericOrdering = 0x00000020
     }
 }

Usage

var list = new List<string> { "Windows 10", "Windows 7" };
list.Sort(StringComparer.Logical); // List is now "Windows 7", "Windows 10"

This would also be good for sorting strings containing IP addresses.
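
The Proposed API section exposes this behavior through the Create overload and the NumericOrdering flag. The following is a minimal sketch of the same sort written against that shape, assuming a runtime in which CompareOptions.NumericOrdering is available. The class and method names (NumericOrderingDemo, SortNumerically) are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration; not part of the proposal.
public static class NumericOrderingDemo
{
    // Returns a new list sorted with numeric runs compared as numbers.
    public static List<string> SortNumerically(IEnumerable<string> items)
    {
        // Assumption: the runtime exposes the proposed
        // CompareOptions.NumericOrdering flag.
        StringComparer comparer = StringComparer.Create(
            CultureInfo.InvariantCulture, CompareOptions.NumericOrdering);

        var sorted = new List<string>(items);
        sorted.Sort(comparer);
        return sorted;
    }
}
```

Calling NumericOrderingDemo.SortNumerically(new[] { "Windows 10", "Windows 7" }) should yield "Windows 7" followed by "Windows 10". The same call can be applied to dotted IPv4 strings such as "10.0.0.12" and "10.0.0.2", where each dot-separated run of digits is compared as a number.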

Details

  • Logical is a convenience property equivalent to the result of Create(CultureInfo.CurrentCulture, CompareOptions.Logical)
  • LogicalIgnoreCase is a convenience property equivalent to the result of Create(CultureInfo.CurrentCulture, CompareOptions.Logical | CompareOptions.IgnoreCase)
  • Non-numeric sequences will be evaluated with the culture provided.
  • Numeric sequences will be determined by the result of Char.IsDigit.
  • All UTF-16 digits will be supported and are manually parsed using Char.GetNumericValue.
  • Only positive integral values without digit separators will be supported directly.
  • Numbers will be parsed as ulong values. Overflow handling still has to be defined.
  • The string "Windows 8.1" would be considered 4 sequences: "Windows " is a string sequence, "8" is a numeric sequence, "." is another string sequence, and "1" is another numeric sequence.
  • This API could later be expanded to include support for allowing signs, decimals, and digit separators through the use of overloads accepting a NumberStyles parameter.
  • When a numeric sequence and a string sequence are compared with each other, the numeric sequence always comes first. For example, when sorting the list "a", "7", the number 7 will be sorted before the letter a.
  • Existing methods that take a CompareOptions parameter as input will need to be updated to support the new Logical member.
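
The rules listed above can be sketched as a standalone comparer. This is only an illustration of the described behavior, not the implementation proposed for the runtime: the type name LogicalComparerSketch and its helpers are hypothetical, overflow is handled by simply throwing (the proposal leaves that open), and a string that runs out of sequences first is assumed to sort first.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical illustration of the sequence-based rules; not the runtime implementation.
public sealed class LogicalComparerSketch : IComparer<string>
{
    private readonly CompareInfo _compareInfo;
    private readonly CompareOptions _options;

    public LogicalComparerSketch(CultureInfo culture, CompareOptions options = CompareOptions.None)
    {
        _compareInfo = culture.CompareInfo;
        _options = options;
    }

    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x is null) return -1;
        if (y is null) return 1;

        int i = 0, j = 0;
        while (i < x.Length && j < y.Length)
        {
            // A sequence is numeric when it starts with a Char.IsDigit character.
            bool xNumeric = char.IsDigit(x[i]);
            bool yNumeric = char.IsDigit(y[j]);
            int xEnd = EndOfSequence(x, i, xNumeric);
            int yEnd = EndOfSequence(y, j, yNumeric);

            int result;
            if (xNumeric && yNumeric)
                result = ParseDigits(x, i, xEnd).CompareTo(ParseDigits(y, j, yEnd));
            else if (xNumeric != yNumeric)
                result = xNumeric ? -1 : 1; // numeric sequences sort before string sequences
            else
                result = _compareInfo.Compare(x, i, xEnd - i, y, j, yEnd - j, _options);

            if (result != 0) return result;
            i = xEnd;
            j = yEnd;
        }

        // Assumption: the string with no sequences left sorts first.
        return (x.Length - i).CompareTo(y.Length - j);
    }

    // Returns the index just past the run of characters of the same kind.
    private static int EndOfSequence(string s, int start, bool numeric)
    {
        int end = start;
        while (end < s.Length && char.IsDigit(s[end]) == numeric) end++;
        return end;
    }

    // Parses any decimal digits manually; throws OverflowException past ulong.MaxValue.
    private static ulong ParseDigits(string s, int start, int end)
    {
        ulong value = 0;
        for (int k = start; k < end; k++)
            value = checked(value * 10 + (ulong)char.GetNumericValue(s[k]));
        return value;
    }
}
```

Passing new LogicalComparerSketch(CultureInfo.InvariantCulture) to List<string>.Sort orders "Windows 7", "Windows 8.1", "Windows 10" in that sequence, and places "7" before "a".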

Open Questions

  • Should CompareOptions.Logical be implemented by passing the SORT_DIGITSASNUMBERS flag in the dwCmpFlags parameter of CompareStringEx? Using its implementation should be more efficient, but later expanding support for NumberStyles would require a re-implementation with matching behavior.
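
For reference, the Win32 route raised in this question could look roughly like the following P/Invoke sketch. It is Windows-only and purely illustrative: the wrapper type Win32DigitsAsNumbers and its Compare method are hypothetical names, and the sketch says nothing about how the runtime would actually wire the flag through.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical wrapper for illustration; Windows-only.
public static class Win32DigitsAsNumbers
{
    // Value from winnls.h: treat digit runs as numbers when sorting.
    private const uint SORT_DIGITSASNUMBERS = 0x00000008;

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern int CompareStringEx(
        string lpLocaleName, uint dwCmpFlags,
        string lpString1, int cchCount1,
        string lpString2, int cchCount2,
        IntPtr lpVersionInformation, IntPtr lpReserved, IntPtr lParam);

    // Returns -1, 0, or 1. Pass "" as localeName for the invariant locale.
    public static int Compare(string localeName, string x, string y)
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            throw new PlatformNotSupportedException("CompareStringEx is a Win32 API.");

        // -1 tells the API the strings are null-terminated, so embedded
        // null characters would truncate the comparison in this sketch.
        int result = CompareStringEx(
            localeName, SORT_DIGITSASNUMBERS, x, -1, y, -1,
            IntPtr.Zero, IntPtr.Zero, IntPtr.Zero);

        if (result == 0)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        // CSTR_LESS_THAN = 1, CSTR_EQUAL = 2, CSTR_GREATER_THAN = 3.
        return result - 2;
    }
}
```

On Windows, Win32DigitsAsNumbers.Compare("", "Windows 7", "Windows 10") is expected to return a negative value, whereas an ordinal comparison of the same strings returns a positive one.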

Updates

  • Added Logical and LogicalIgnoreCase properties.
  • Added support for all UTF-16 digits.
  • Added more CreateLogical overloads to match the Create method.
  • Added retrieval of the NumberFormatInfo from the StringComparer parameter when it is not explicitly provided and the comparer is a CultureAwareComparer.
  • Removed CreateLogical overloads that matched the Create method.
  • Switched to only supporting positive integral values without digit separators.
  • Added consideration of comparing a numeric sequence with a string sequence.
  • Added the flag member CompareOptions.Logical and changed CreateLogical to be just an overload of Create.

Metadata

Labels

  • api-approved: API was approved in API review, it can be implemented
  • area-System.Runtime
  • help wanted: [up-for-grabs] Good issue for external contributors
  • in-pr: There is an active PR which will close this issue when it is merged
