Ability to query minimum and maximum length of regular expression #112386

Open
@MegaIng

Feature or enhancement

Proposal:

For the lark parsing library we currently use the private re._parser module, as noticed when reorganizing the relevant libraries in #91308. The only information we need is the minimum and maximum width of a match a pattern can have.
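For context, here is a minimal sketch of how the width can be read from the private parser today. It assumes the private `parse()` function and `SubPattern.getwidth()` method keep their current behaviour, which is not guaranteed because the module is private. The helper name `width_bounds` is hypothetical and is not lark's actual code.

```python
try:
    # Python 3.11+: the parser lives in the private re._parser module.
    from re import _parser
except ImportError:
    # Older versions: same code under its former name.
    import sre_parse as _parser

# Private constant the parser uses to cap "unbounded" widths.
MAXREPEAT = _parser.MAXREPEAT


def width_bounds(pattern, flags=0):
    """Return (min_width, max_width); max_width is None if unbounded.

    Hypothetical helper that relies on private, unstable APIs.
    """
    lo, hi = _parser.parse(pattern, flags).getwidth()
    # getwidth() caps the upper bound at MAXREPEAT, so treat that as "no limit".
    return lo, (None if hi >= MAXREPEAT else hi)


print(width_bounds(r"abc?d?e"))
print(width_bounds(r"(a*b+){2,5}"))
```

Mapping the capped upper bound to `None` is only one of the two conventions mentioned in this proposal; returning `MAXREPEAT` unchanged would work as well.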

My suggestion is to add attributes or properties to the Pattern class, for example named min_width and max_width. max_width could be either None or MAXREPEAT (the constant from re._constants/_sre) when the pattern can match an (essentially) unlimited amount of text.

pattern = re.compile(r"abc?d?e")
assert (pattern.min_width, pattern.max_width) == (3, 5)

pattern = re.compile(r"(a*b+){2,5}")
assert (pattern.min_width, pattern.max_width) == (2, None)

As an alternative, the re._* modules could be made a public, stable API, although from my reading of the PR linked above this does not appear to be a well-liked option. I would like this, primarily for implementing custom regex analyzers (there are a few such users of the re._parser module out there), but I think it would require a PEP.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

I don't think this is a major enough feature to require widespread discussion. I requested a similar feature in the third-party regex library. Preferably, of course, both would have the same interface.

Labels: stdlib (Python modules in the Lib dir), topic-regex, type-feature (A feature request or enhancement)
