Regex does not match isolated combining mark as non-whitespace if preceded by whitespace #724

Open
@digitalheir

Description

I believe Regex should operate on Unicode scalars, not on Swift `Character`s (extended grapheme clusters). This is a failure mode: <space>+<combining mark> (such as " ̃", i.e. U+0020 followed by U+0303) is treated as a single whitespace character, whereas every other programming language I know of treats it as one whitespace character followed by one separate non-spacing combining character.
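To make the Character/scalar distinction concrete, here is a minimal sketch using only standard library views and scalar properties. It shows that the string counts as one `Character` but two scalars, and that only the first scalar has the White_Space property.

```swift
// U+0020 SPACE followed by U+0303 COMBINING TILDE
let spaceWithTilde = " \u{0303}"

// String iterates extended grapheme clusters (Character) by default.
print(spaceWithTilde.count)                 // 1
print(spaceWithTilde.unicodeScalars.count)  // 2

// Inspect each scalar: White_Space property and "is general category Mn".
for scalar in spaceWithTilde.unicodeScalars {
    let isMn = scalar.properties.generalCategory == .nonspacingMark
    print("U+" + String(scalar.value, radix: 16, uppercase: true),
          scalar.properties.isWhitespace,
          isMn)
}
// U+20 true false
// U+303 false true
```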

Reproduction

let aTilde = "a\u{0303}" // U+0061 + U+0303
let aMatch = try! /\S/.firstMatch(in: aTilde)
print(aMatch?.output) // "ã" hm... I would have expected only the scalar 'a' to match
let combiningTilde = "\u{0303}" // U+0303 on its own
let tildeMatch = try! /\S/.firstMatch(in: combiningTilde)
print(tildeMatch?.output) // "̃" correct to me
let spaceWithTilde = " \u{0303}" // U+0020 (space) + U+0303
let spaceTildeMatch = try! /\S/.firstMatch(in: spaceWithTilde)
print(spaceTildeMatch?.output) // nil, but I would expect U+0303 to match

Expected behavior

I expected the combining tilde scalar (U+0303) to match `\S`: according to the Unicode specification it does not have the White_Space property (WS); its general category is nonspacing mark (Mn).
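For comparison, `Regex` has a `matchingSemantics(_:)` option that switches matching from grapheme clusters to Unicode scalars. The sketch below is one way to try the expected behavior under that option. The helper name `firstNonWhitespaceScalars(in:)` is hypothetical, and I have not verified what it returns for the space+tilde input, so no result is claimed for that case. The `#/.../#` delimiter is used so the snippet does not depend on bare-slash regex literals being enabled.

```swift
// Hypothetical helper: returns the scalars of the first `\S` match when the
// regex is asked to match by Unicode scalar instead of by Character.
func firstNonWhitespaceScalars(in text: String) -> [Unicode.Scalar]? {
    let regex = #/\S/#.matchingSemantics(.unicodeScalar)
    guard let match = text.firstMatch(of: regex) else { return nil }
    return Array(match.output.unicodeScalars)
}

// Prints the matched scalars as code points, or "no match".
func describe(_ scalars: [Unicode.Scalar]?) -> String {
    guard let scalars = scalars else { return "no match" }
    return scalars
        .map { "U+" + String($0.value, radix: 16, uppercase: true) }
        .joined(separator: " ")
}

print(describe(firstNonWhitespaceScalars(in: "a\u{0303}")))
print(describe(firstNonWhitespaceScalars(in: " \u{0303}")))
```

If scalar semantics behave as described in this issue's expectation, the second call would report U+0303 rather than "no match".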

Environment

Swift 5.9

Additional information

No response

Metadata

Labels

bug (Something isn't working)
