Description
Feature or enhancement
Proposal:
For the lark parsing library we currently use the private re._parser module, as noticed when reorganizing the relevant libraries in #91308. The only information we need is the minimum and maximum width of a match a pattern can have.
My suggestion is to add relevant attributes/properties to the Pattern class, for example with the names min_width and max_width. max_width could be either None or MAXREPEAT (the constant from re._constants/_sre) when the pattern could match an (essentially) unlimited amount of text.
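import re

# min_width and max_width are the proposed attributes; they do not exist yet.
pattern = re.compile(r"abc?d?e")
assert (pattern.min_width, pattern.max_width) == (3, 5)
# Note: no space inside the braces, otherwise "{2, 5}" is matched as literal text.
pattern = re.compile(r"(a*b+){2,5}")
assert (pattern.min_width, pattern.max_width) == (2, None)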
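For comparison, here is a minimal sketch of how the same information can be obtained today through the private parser module. It assumes that the parsed pattern's getwidth() method keeps its current behaviour and that an upper bound at or above MAXREPEAT means "essentially unlimited". width_bounds is a hypothetical helper name used only for illustration, not an existing or proposed API.

```python
try:
    # Python 3.11+: private submodules of the re package
    from re import _parser as sre_parse, _constants as sre_constants
except ImportError:
    # Older versions: top-level sre_* modules
    import sre_parse
    import sre_constants


def width_bounds(pattern, flags=0):
    """Return (min_width, max_width) for a pattern string.

    max_width is None when the parser reports an upper bound at or
    above MAXREPEAT (assumed here to mean "unbounded").
    """
    lo, hi = sre_parse.parse(pattern, flags).getwidth()
    return lo, (None if hi >= sre_constants.MAXREPEAT else hi)


print(width_bounds(r"abc?d?e"))
print(width_bounds(r"(a*b+){2,5}"))
```

Because this relies on private modules, it can break between Python versions, which is the motivation for exposing the two values on Pattern directly.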
As an alternative, the re._* modules could be made a public and stable API, although this doesn't appear to be a well-liked option from my reading of the PR linked above. I would like this, primarily for implementing custom regex analyzers (there are a few such users of the re._parser module out there), but I think this would have to be a PEP.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
I don't think this is a major enough feature to require widespread discussion. I requested a similar feature in the third-party regex library. Preferably, of course, both would have the same interface.