
<regex>: Character ranges with collating symbol bounds are rejected #5391

Open
@muellerj2

The regex parser currently rejects character class ranges whose bounds are given by collating symbols, such as [[.cs.]-[.dzs.]].

Test case

#include <algorithm>
#include <cstdio>
#include <locale>
#include <regex>
#include <string>

using namespace std;

class collating_symbols_regex_traits : public regex_traits<char> {

public:
    template <class FwdIt>
    string_type lookup_collatename(FwdIt first, FwdIt last) const {
        // from Hungarian
        const string_type collating_symbol1 = "cs";
        const string_type collating_symbol2 = "dzs";

        if (std::equal(first, last, begin(collating_symbol1), end(collating_symbol1))) {
            return collating_symbol1;
        }

        if (std::equal(first, last, begin(collating_symbol2), end(collating_symbol2))) {
            return collating_symbol2;
        }

        return regex_traits::lookup_collatename(first, last);
    }
};

int main() {
    basic_regex<char, collating_symbols_regex_traits> r;
    r.imbue(locale("hu_HU"));
    try {
        r.assign("[[.cs.]-[.dzs.]]");
        printf("construction succeeded");
    } catch (const regex_error& e) {
        printf("regex error thrown");
    }
    return 0;
}

No Godbolt link: the construction succeeds there because the latest MSVC STL available on Godbolt doesn't include the fix for #4995 yet.

Constructing the regex object should succeed, but it actually throws a regex_error with code error_range.
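To see which error code a given implementation reports, the try/catch in the test case can be wrapped in a small helper. This is only a sketch: error_name and compile_result are hypothetical names introduced here, and the helper constructs the regex directly instead of calling imbue first as the test case does.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: maps the error codes of interest to readable names.
inline const char* error_name(std::regex_constants::error_type code) {
    using namespace std::regex_constants;
    if (code == error_range) return "error_range";
    if (code == error_collate) return "error_collate";
    if (code == error_brack) return "error_brack";
    return "other regex error";
}

// Hypothetical helper: tries to compile `pattern` with the given traits and
// returns "ok" on success or the name of the reported error code otherwise.
template <class Traits = std::regex_traits<char>>
std::string compile_result(const std::string& pattern) {
    try {
        std::basic_regex<char, Traits> r(pattern);
        return "ok";
    } catch (const std::regex_error& e) {
        return error_name(e.code());
    }
}
```

Calling compile_result<collating_symbols_regex_traits>("[[.cs.]-[.dzs.]]") with the traits class from the test case would then return "ok" once the range is accepted, and the name of the thrown error code otherwise.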

Note: Fixing this for multi-character collating symbols requires changes to the layout of the NFA node that represents character classes. I think this is doable without breaking ABI, but the fix has to be carefully thought through.
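For illustration, here is a rough sketch of why a pair of single characters is not enough to represent such a range: with multi-character bounds, the comparison has to work on whole collating elements, for example through the sort keys returned by the traits' transform(). The function in_collating_range is a hypothetical name, and this is not how the MSVC STL stores or matches ranges; it only shows the comparison under the assumption that the bounds are kept as strings.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of any standard library implementation.
// Checks whether `candidate` sorts between `lower` and `upper` (inclusive)
// by comparing the sort keys produced by the traits' transform().
template <class Traits>
bool in_collating_range(const Traits& traits, const std::string& lower,
                        const std::string& upper, const std::string& candidate) {
    const auto key_lower     = traits.transform(lower.begin(), lower.end());
    const auto key_upper     = traits.transform(upper.begin(), upper.end());
    const auto key_candidate = traits.transform(candidate.begin(), candidate.end());

    // key_lower <= key_candidate && key_candidate <= key_upper
    return !(key_candidate < key_lower) && !(key_upper < key_candidate);
}
```

With a default-constructed regex_traits<char> and the classic "C" locale, the keys order like plain string comparison, so "d" falls between "cs" and "dzs" while "e" does not. A Hungarian locale may order these elements differently.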


Labels: bug, regex
