Description
The regex parser currently rejects character class ranges whose bounds are given by collating symbols.
Test case
#include <algorithm>
#include <cstdio>
#include <locale>
#include <regex>
#include <string>
using namespace std;

// Traits class that recognizes two multi-character collating symbols.
class collating_symbols_regex_traits : public regex_traits<char> {
public:
    template <class FwdIt>
    string_type lookup_collatename(FwdIt first, FwdIt last) const {
        // from Hungarian
        const string_type collating_symbol1 = "cs";
        const string_type collating_symbol2 = "dzs";

        if (std::equal(first, last, begin(collating_symbol1), end(collating_symbol1))) {
            return collating_symbol1;
        }

        if (std::equal(first, last, begin(collating_symbol2), end(collating_symbol2))) {
            return collating_symbol2;
        }

        // Fall back to the default lookup for everything else.
        return regex_traits::lookup_collatename(first, last);
    }
};

int main() {
    basic_regex<char, collating_symbols_regex_traits> r;
    r.imbue(locale("hu_HU"));

    try {
        // Range whose bounds are both collating symbols.
        r.assign("[[.cs.]-[.dzs.]]");
        printf("construction succeeded");
    } catch (const regex_error& e) {
        printf("regex error thrown");
    }

    return 0;
}
No Godbolt link: the test case succeeds there because the latest MSVC STL available on Godbolt doesn't include the fix for #4995 yet.
The construction of the regex object should succeed, but it actually throws a regex_error with code error_range.
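To check that a failure is specifically error_range rather than some other parse error, the error code can be read from regex_error::code(). The following is only a minimal sketch: the helper name fails_with is hypothetical, it uses the default regex_traits<char> rather than the traits class from the test case, and the inverted range "[z-a]" in the usage note is just a stand-in for a pattern that is known to be rejected with this code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if compiling `pattern` throws a
// regex_error whose code equals `expected`, and false otherwise
// (including when compilation succeeds).
bool fails_with(const std::string& pattern, std::regex_constants::error_type expected) {
    try {
        std::regex r(pattern);
        (void) r; // construction succeeded, so no error was reported
        return false;
    } catch (const std::regex_error& e) {
        return e.code() == expected;
    }
}
```

For example, fails_with("[z-a]", std::regex_constants::error_range) should hold because the range is inverted, while fails_with("[a-z]", std::regex_constants::error_range) should not. The same comparison of e.code() against error_range could be placed in the catch block of the test case above.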
Note: Fixing this for multi-character collating symbols requires changes to the layout of the NFA node that represents character classes. I think this is doable without breaking ABI, but the fix has to be carefully thought through.