Description
```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Explicit set containing \xD7 and \xD8: prints False, as expected.
        Console.WriteLine(Regex.IsMatch("\xF7", @"^(?i:[\xD7\xD8])$", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant));
        // Equivalent range \xD7-\xD8: should also print False, but prints True.
        Console.WriteLine(Regex.IsMatch("\xF7", @"^(?i:[\xD7-\xD8])$", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant));
    }
}
```

The two patterns above should be equivalent: one character class contains \xD7 and \xD8, and the other contains the range from \xD7 through \xD8, which is just \xD7 and \xD8 since nothing lies between them.
However, the first correctly prints false whereas the second incorrectly prints true.
The implementation handles casing by creating a character class that's the lowercased version of the original. That means that for individual characters, it just adds the lowercase character:
Lines 554 to 555 in fd82afe:

```csharp
char lower = culture.TextInfo.ToLower(range.First);
rangeList[i] = new SingleRange(lower, lower);
```
and for ranges it needs to add the lowercase character for each character in the range:
Line 559 in fd82afe:

```csharp
AddLowercaseRange(range.First, range.Last);
```
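For reference, here is a minimal sketch of what lowercasing a range one character at a time should produce. `LowercaseRange` is a hypothetical helper written only for this comparison, not the actual `AddLowercaseRange` implementation: it applies `TextInfo.ToLower` to every character in the range and merges adjacent results into contiguous ranges.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class LowercaseSketch
{
    // Hypothetical helper (not the RegexCharClass API): lowercases every
    // character in [first, last] individually, then coalesces the results
    // into contiguous (Start, End) ranges.
    public static List<(char Start, char End)> LowercaseRange(char first, char last, CultureInfo culture)
    {
        var lowered = new SortedSet<char>();
        for (int c = first; c <= last; c++) // int avoids overflow at '\uFFFF'
        {
            lowered.Add(culture.TextInfo.ToLower((char)c));
        }

        var ranges = new List<(char Start, char End)>();
        foreach (char c in lowered)
        {
            int lastIndex = ranges.Count - 1;
            if (lastIndex >= 0 && ranges[lastIndex].End + 1 == c)
            {
                // Extend the previous range when c directly follows it.
                ranges[lastIndex] = (ranges[lastIndex].Start, c);
            }
            else
            {
                ranges.Add((c, c));
            }
        }
        return ranges;
    }

    static void Main()
    {
        foreach (var (start, end) in LowercaseRange('\xD7', '\xD8', CultureInfo.InvariantCulture))
        {
            Console.WriteLine($"\\x{(int)start:X2}-\\x{(int)end:X2}");
        }
    }
}
```

With the invariant culture this prints `\xD7-\xD7` and `\xF8-\xF8`: the same two characters the single-character path adds for `[\xD7\xD8]`, and no \xF7.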
In the first case above, it follows the first path, adding ToLower('\xD7') (which is just \xD7, since × has no lowercase form) and ToLower('\xD8') (which is \xF8).
In the second case, however, it follows the second path and ends up incorrectly adding a range from \xF7 through \xF8. In other words, the range path maps \xD7 to \xF7 (÷) even though ToLower('\xD7') is \xD7 itself.
As a result, the second case ends up incorrectly matching \xF7.
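One way to look for other ranges hit by the same discrepancy is to compare a range class against the equivalent explicit list of characters, using the same options as the repro. The sketch below does that; `Pattern` and `FindMismatches` are hypothetical names, characters are written as `\uXXXX` escapes so any BMP character can be used, and by default only inputs up to \xFF are checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class RangeVsSetCheck
{
    const RegexOptions Options = RegexOptions.IgnoreCase | RegexOptions.CultureInvariant;

    // Escapes a character as \uXXXX for use inside a character class.
    static string Escape(char c) => $"\\u{(int)c:X4}";

    // Builds ^(?i:[...])$ using either "first-last" or every character listed explicitly.
    public static string Pattern(char first, char last, bool asRange)
    {
        string body = asRange
            ? Escape(first) + "-" + Escape(last)
            : string.Concat(Enumerable.Range(first, last - first + 1).Select(i => Escape((char)i)));
        return "^(?i:[" + body + "])$";
    }

    // Returns every single-character input in [0, limit] on which the two forms disagree.
    public static List<char> FindMismatches(char first, char last, int limit = 0xFF)
    {
        var rangeRegex = new Regex(Pattern(first, last, asRange: true), Options);
        var setRegex = new Regex(Pattern(first, last, asRange: false), Options);
        var mismatches = new List<char>();
        for (int i = 0; i <= limit; i++)
        {
            string input = ((char)i).ToString();
            if (rangeRegex.IsMatch(input) != setRegex.IsMatch(input))
            {
                mismatches.Add((char)i);
            }
        }
        return mismatches;
    }

    static void Main()
    {
        foreach (char c in FindMismatches('\xD7', '\xD8'))
        {
            Console.WriteLine($"range and set disagree on \\u{(int)c:X4}");
        }
    }
}
```

Based on the behavior described above, a runtime with this bug should report \u00F7 for the \xD7-\xD8 case, while an unaffected runtime should report nothing.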