Description
The "quantifier merge transform" used by the optimizer is sometimes incorrect: a regex can be rewritten to another regex that is not equivalent.
- It it sometimes incorrect to rewrite
r{0,m}r{n}
intor{n,m+n}
.
Counter-example:
/(?:a|ab){0,1}(?:a|ab){1}/
is not equivalent to /(?:a|ab){1,2}/
: they match different things on the string "aba"
.
- For the same reason, it is incorrect to rewrite
r*r{n}
intor{n,}
.
Counter-example:
/(?:a|ab)*(?:a|ab){1}/
is not equivalent to /(?:a|ab){1,}/
: they match different things on the string "aba"
.
- It is sometimes incorrect to rewrite
r{0,n}?r{0,m}?
intor{0,n+m}?
.
Counter-example:
/(?:ab|aba){0,1}?(?:ab|aba){0,1}?[bc]/
is not equivalent to /(?:ab|aba){0,2}?[bc]/
: they match different things on the string "ababc".
However, regexp-tree's optimizer module performs these transformations.
Thanks to Eugène Flesselle @EugeneFlesselle for finding these counter-examples while working on proving the correctness of regex transformations.
We believe that these transformations are however correct when r
is unambiguous, meaning that for each string, there is only one way of matching /^r/
.
Maybe regexp-tree could check the ambiguity of the regex before applying these transformations.