
Question: Do PCRE2 leftmost-first semantics include capture group positions? #1336

@wareya

What version of regex are you using?

1.12.x and also older

Describe the bug at a high level.

See title: "Question: Do PCRE2 leftmost-first semantics include capture group positions?"

I've been testing regex implementations for differences in capture behavior, because I'm trying to figure out how best to handle tie-breaking in a lockstep parallel NFA simulation. I'm running into some strange differences from PCRE2 in automata-driven crates, and I can't tell whether they would be considered bugs worth reporting. If they aren't, I'd rather not drop a ton of supposed bugs on here for no reason. I vaguely remember from working on my own regex implementation that handling quantified nullable groups was a headache even in backtracking land.

Random example: on the regex (|.)*(a+b) (yes, really) with the input axaaab, every engine I tried matches the entire input string, but the capture groups differ. rust/regex and RE2 give axa[a][ab], while PCRE2, C#, etc. give ax[][aaab] (brackets mark groups 1 and 2).
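To make the difference concrete, here is a toy Python backtracker. It is not how either engine is implemented, and the two empty-iteration policies are assumptions: `break_on_empty=True` accepts an iteration of `*` that consumed nothing but then leaves the loop (my understanding of the PCRE2/Perl rule), while `break_on_empty=False` discards such an iteration outright (roughly what happens when a Pike-style simulation revisits an already-seen state). The names `search`, `render`, and `break_on_empty` are made up for illustration.

```python
def search(node, text, break_on_empty):
    """Leftmost-first backtracking search.

    Returns {group_index: (start, end)} or None. Toy model only.
    """

    def m(node, pos, caps, k):
        # k is the continuation: "what must still match after this node".
        kind = node[0]
        if kind == "empty":
            return k(pos, caps)
        if kind == "any":  # '.', ignoring newline handling
            return k(pos + 1, caps) if pos < len(text) else None
        if kind == "lit":
            return k(pos + 1, caps) if text[pos:pos + 1] == node[1] else None
        if kind == "cat":
            return m(node[1], pos, caps, lambda p, c: m(node[2], p, c, k))
        if kind == "alt":  # first alternative has priority
            r = m(node[1], pos, caps, k)
            return r if r is not None else m(node[2], pos, caps, k)
        if kind == "group":  # record the span of the latest iteration
            return m(node[2], pos, caps,
                     lambda p, c: k(p, {**c, node[1]: (pos, p)}))
        if kind == "star":  # greedy: try another iteration before exiting
            def after_iteration(p, c):
                if p == pos:
                    # Empty iteration. ASSUMED policies, see the lead-in.
                    return k(p, c) if break_on_empty else None
                return m(node, p, c, k)
            r = m(node[1], pos, caps, after_iteration)
            return r if r is not None else k(pos, caps)
        raise ValueError(kind)

    for start in range(len(text) + 1):
        r = m(node, start, {}, lambda p, c: {**c, 0: (start, p)})
        if r is not None:
            return r
    return None


def render(text, caps):
    """Format groups 1 and 2 in the bracket notation used above."""
    (s1, e1), (s2, e2) = caps[1], caps[2]
    return (text[:s1] + "[" + text[s1:e1] + "]" + text[e1:s2]
            + "[" + text[s2:e2] + "]" + text[e2:])


A = ("lit", "a")
# (|.)*(a+b), with a+ written as a followed by a*
PATTERN = ("cat",
           ("star", ("group", 1, ("alt", ("empty",), ("any",)))),
           ("group", 2, ("cat", A, ("cat", ("star", A), ("lit", "b")))))
TEXT = "axaaab"

print(render(TEXT, search(PATTERN, TEXT, break_on_empty=True)))   # ax[][aaab]
print(render(TEXT, search(PATTERN, TEXT, break_on_empty=False)))  # axa[a][ab]
```

Under these assumptions the overall match is (0, 6) in both cases; only the policy for the empty iteration of group 1 changes where the groups land.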
