libc++ regex drops the character between a zero-length match and its subsequent match #64451

Closed
@rprichard

With libc++, when std::regex_iterator advances past a zero-length match (e.g. with the pattern "" or "^\\s*|\\s*$"), the character immediately after that match is missing from the prefix of the next match. As a result, std::regex_replace also drops that character from its output.

Example:

#include <cstdio>
#include <regex>
#include <string>
int main() {
    std::string src = "AB";
    std::regex pattern("");

    // Expected 'xAxBx'. Actual output is 'xxx'.
    std::string repl = std::regex_replace(src, pattern, "x");
    printf("'%s'\n", repl.c_str());

    // Prints the prefix of each match. Expected ['', 'A', 'B']. Actual output is ['', '', ''].
    std::sregex_iterator begin { src.begin(), src.end(), pattern };
    std::sregex_iterator end {};
    for (auto i = begin; i != end; ++i) {
        std::smatch m = *i;
        printf("'%s'\n", m.prefix().str().c_str());
    }

    return 0;
}

libstdc++ prints the expected output above.

This bug was originally reported against the Android NDK as android/ndk#1911, where the pattern was "^\\s*|\\s*$".

I wonder if operator++ needs to adjust the prefix backward by one character when it skips one character forward here:

https://github.com/llvm/llvm-project/blob/llvmorg-17.0.0-rc1/libcxx/include/regex#L6512
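
To illustrate the idea, here is a minimal sketch of a hand-rolled replace loop that follows the advance rule the standard gives for regex_iterator::operator++, but computes each prefix from the end of the previous match instead of reading m.prefix(). The name replace_all_sketch is hypothetical and not part of any library, the replacement text is treated as a literal (no "$1" handling), and this is only meant to show where the skipped character should end up, not how libc++ should be patched.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of libc++ or libstdc++.
// Replaces every match of `re` in `src` with the literal text `repl`,
// computing each match prefix by hand rather than via m.prefix().
std::string replace_all_sketch(const std::string& src, const std::regex& re,
                               const std::string& repl) {
    namespace rc = std::regex_constants;
    using Iter = std::string::const_iterator;

    const Iter last = src.end();
    Iter prev_end = src.begin();  // end of the previous match
    Iter start = src.begin();     // where the next search begins
    rc::match_flag_type flags = rc::match_default;
    std::smatch m;
    std::string out;

    bool found = std::regex_search(start, last, m, re, flags);
    while (found) {
        // The prefix runs from the end of the previous match, not from `start`.
        out.append(prev_end, m[0].first);
        out += repl;
        prev_end = m[0].second;
        start = prev_end;

        if (m[0].length() == 0) {
            if (start == last) break;  // zero-length match at the very end
            // First try a non-empty match anchored at the same position.
            found = std::regex_search(
                start, last, m, re,
                flags | rc::match_not_null | rc::match_continuous);
            if (found) continue;
            ++start;  // skip one character forward; prev_end stays behind
        }
        flags |= rc::match_prev_avail;
        found = std::regex_search(start, last, m, re, flags);
    }
    out.append(prev_end, last);  // text after the final match
    return out;
}
```

With this loop, replace_all_sketch("AB", std::regex(""), "x") produces 'xAxBx', because the skipped character is still covered by the range from prev_end to the start of the next match.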

Labels: libc++, regex
