Description
I believe Regex should match on Unicode scalars, not on Swift Characters (extended grapheme clusters). One failure mode: a space followed by a combining mark (such as " ̃", i.e. U+0020 SPACE + U+0303 COMBINING TILDE) is treated as a single whitespace character, whereas every other programming language I know of treats it as a whitespace character followed by a separate non-spacing combining character.
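To make the distinction concrete, here is a small sketch using only standard-library APIs. Swift's String counts Characters (extended grapheme clusters), and a combining mark attaches to the preceding space to form one Character, while the Unicode scalar view still sees two code points with different Unicode properties. The variable names are illustrative only.

```swift
// Sketch: the same string viewed as Characters vs. Unicode scalars.
let spaceWithTilde = " \u{0303}"   // U+0020 SPACE + U+0303 COMBINING TILDE

// String is a collection of Characters (extended grapheme clusters);
// the combining mark attaches to the space, giving a single Character.
let characterCount = spaceWithTilde.count

// The underlying Unicode scalars are still two separate code points.
let scalarCount = spaceWithTilde.unicodeScalars.count

for scalar in spaceWithTilde.unicodeScalars {
    // Per-scalar Unicode properties: White_Space and General_Category.
    let hex = String(scalar.value, radix: 16, uppercase: true)
    print("U+\(hex)", scalar.properties.isWhitespace, scalar.properties.generalCategory)
}
print(characterCount, scalarCount)
```

Only the first scalar (the space) has the White_Space property; the second is a nonspacing mark, which is why a scalar-level \S would be expected to match it.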
Reproduction
let aTilde = "a\u{0303}" // U+0061 LATIN SMALL LETTER A + U+0303 COMBINING TILDE (decomposed "ã")
let aMatch = try! /\S/.firstMatch(in: aTilde)
print(aMatch?.output) // "ã": I would have expected only the scalar "a" to match
let combiningTilde = "\u{0303}" // U+0303 COMBINING TILDE on its own
let tildeMatch = try! /\S/.firstMatch(in: combiningTilde)
print(tildeMatch?.output) // "̃": correct, as I would expect
let spaceWithTilde = " \u{0303}" // U+0020 SPACE + U+0303 COMBINING TILDE
let spaceTildeMatch = try! /\S/.firstMatch(in: spaceWithTilde)
print(spaceTildeMatch?.output) // nil, but I would expect U+0303 to match
Expected behavior
The tilde scalar (U+0303) was expected to match \S, since according to the Unicode specification it is not a whitespace code point (it lacks the White_Space property) but a nonspacing mark (General_Category Mn).
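For comparison, Swift's Regex (Swift 5.7 and later) already offers an opt-in for scalar-level matching via matchingSemantics(_:) with .unicodeScalar; it does not change the default grapheme-cluster behavior this issue is about. The following is a minimal sketch under that assumption. It uses the #/.../# literal form, which does not require the bare-slash-regex flag. The helper firstMatchScalars is hypothetical, written only for this illustration.

```swift
// Sketch (assumes Swift 5.7+): the same \S pattern at two semantic levels.
let spaceWithTilde = " \u{0303}"   // U+0020 SPACE + U+0303 COMBINING TILDE
let aTilde = "a\u{0303}"           // U+0061 + U+0303, decomposed "ã"

// Default: grapheme-cluster semantics (matches one Character at a time).
let graphemeNonSpace = #/\S/#
// Opt-in: Unicode-scalar semantics (matches one scalar at a time).
let scalarNonSpace = graphemeNonSpace.matchingSemantics(.unicodeScalar)

/// Hypothetical helper: hex code points covered by the first match, or nil.
func firstMatchScalars(_ regex: Regex<Substring>, in input: String) -> [String]? {
    guard let match = try! regex.firstMatch(in: input) else { return nil }
    return input.unicodeScalars[match.range].map { String($0.value, radix: 16) }
}

print(firstMatchScalars(graphemeNonSpace, in: aTilde) as Any)       // Optional(["61", "303"])
print(firstMatchScalars(scalarNonSpace, in: aTilde) as Any)         // Optional(["61"])
print(firstMatchScalars(scalarNonSpace, in: spaceWithTilde) as Any) // Optional(["303"])
```

With scalar semantics the search skips the space scalar and matches U+0303 alone, and in "a\u{0303}" it matches only "a", which is the behavior described under Expected behavior.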
Environment
Swift 5.9