Description
With libc++, when std::regex_iterator advances past a zero-length match (e.g. with the pattern "" or "^\\s*|\\s*$"), it skips the character immediately after the zero-length match. As a result, std::regex_replace also drops that character.
Example:
#include <cstdio>
#include <regex>
#include <string>
int main() {
    std::string src = "AB";
    std::regex pattern("");
    // Expected 'xAxBx'. Actual output is 'xxx'.
    std::string repl = std::regex_replace(src, pattern, "x");
    printf("'%s'\n", repl.c_str());
    // Expected prefixes ['', 'A', 'B']. Actual output is ['', '', ''].
    std::sregex_iterator begin { src.begin(), src.end(), pattern };
    std::sregex_iterator end {};
    for (auto i = begin; i != end; ++i) {
        std::smatch m = *i;
        printf("'%s'\n", m.prefix().str().c_str());
    }
    return 0;
}
libstdc++ prints the expected output above.
This bug was originally reported against the Android NDK as android/ndk#1911, where the pattern was "^\\s*|\\s*$".
I wonder whether operator++ needs to move the prefix back one character when it skips one character forward here:
https://github.com/llvm/llvm-project/blob/llvmorg-17.0.0-rc1/libcxx/include/regex#L6512
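In the meantime, one possible workaround is to step through the matches by hand with std::regex_search instead of relying on std::regex_iterator. Below is a minimal sketch of that idea. The helper name replace_all_manual is hypothetical and not part of any library. It tracks how much of the input has already been copied, so a character skipped after a zero-length match still ends up in the output. I have only checked it against unanchored patterns, so treat the handling of ^ and $ (which depends on match_prev_avail) as an assumption.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a library function): replaces every match of `re`
// in `src` with the format string `fmt`, advancing through matches manually.
std::string replace_all_manual(const std::string& src, const std::regex& re,
                               const std::string& fmt) {
    namespace rc = std::regex_constants;
    std::string out;
    const auto begin = src.cbegin();
    const auto end = src.cend();
    auto pos = begin;         // where the next search starts
    auto copied = begin;      // everything before this is already in `out`
    bool prev_empty = false;  // was the previous match zero-length?
    std::smatch m;

    // Tell the engine when a character exists before the search start.
    auto flags_at = [&](std::string::const_iterator p) {
        return p != begin ? rc::match_prev_avail : rc::match_default;
    };

    for (;;) {
        bool found = false;
        if (prev_empty) {
            // After an empty match, first try a non-empty match anchored
            // at the same position.
            found = std::regex_search(
                pos, end, m, re,
                flags_at(pos) | rc::match_not_null | rc::match_continuous);
            if (!found) {
                if (pos == end) break;
                ++pos;  // step over one character; it is copied below
            }
        }
        if (!found) {
            found = std::regex_search(pos, end, m, re, flags_at(pos));
        }
        if (!found) break;

        // Copy the unmatched text since the previous match, including any
        // character stepped over above, then the replacement.
        out.append(copied, m[0].first);
        out += m.format(fmt);
        copied = m[0].second;
        pos = m[0].second;
        prev_empty = (m[0].first == m[0].second);
    }
    out.append(copied, end);  // trailing unmatched text
    return out;
}
```

With the example above, replace_all_manual(src, pattern, "x") should give 'xAxBx', because the text between `copied` and the start of each match is always appended rather than taken from m.prefix().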