Description
As part of an unrelated experiment I tried disabling the CURLYX -> CURLYM regexp optimization with the following patch:
diff --git a/regcomp.c b/regcomp.c
index 169272e1dd..2dac74fecd 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -5807,7 +5807,7 @@ S_study_chunk(pTHX_
nogo:
/* Try optimization CURLYX => CURLYM. */
- if ( OP(oscan) == CURLYX && data
+ if (0 && OP(oscan) == CURLYX && data
&& !(data->flags & SF_HAS_PAR)
&& !(data->flags & SF_HAS_EVAL)
&& !deltanext /* atom is fixed width */
I expected this to disable a pure optimization, but was surprised to find that with this change we get some test failures.
The first failure is t/re/pat.t test 104 "monster, length = 300000 at re/pat.t line 373" with the pattern /(?^:b(?:a|b)+=)/; I think this one is understandable if CURLYM can match more repeats than CURLYX can (though I hope we can lift the restriction some day).
The remainder are three tests (two newly added for the *ACCEPT work) failing in each of t/re/regexp.t and its variants. I think these show that CURLYX and CURLYM disagree on how captures inside a quantifier should be filled (which I think we also recently discussed). I infer that one or the other case has a bug:
not ok 966 () ^(aa(bb)?)+$:aabbaa:y:-$1-$2-:-aa-- => '-aa-bb-', match=1
# $subject = "aabbaa";
#
#
# $match = ($subject =~ m'^(aa(bb)?)+$') while $c--;
# $got = "-$1-$2-";
#
# $got = "-aa-bb-";
#
# $expected = "-aa--";
not ok 2080 - ACCEPT with CurlyM optimization GH #19484 () /(A(A|B(*ACCEPT)|C)+D)(E)/:ABDE:y:$1-$2:AB-B => '-B', match=1
# $subject = "ABDE";
#
#
# $match = ($subject =~ m/(A(A|B(*ACCEPT)|C)+D)(E)/) while $c--;
# $got = "$1-$2";
#
# $got = "-B";
#
# $expected = "AB-B";
not ok 2082 - ACCEPT with CurlyM optimization GH #19484 () /(A(A|B(*ACCEPT)|C)+D)(E)/:ABCDE:y:$1-$2:AB-B => '-B', match=1
# $subject = "ABCDE";
#
#
# $match = ($subject =~ m/(A(A|B(*ACCEPT)|C)+D)(E)/) while $c--;
# $got = "$1-$2";
#
# $got = "-B";
#
# $expected = "AB-B";
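
For anyone who wants to look at the three capture cases outside the test harness, here is a minimal standalone sketch. The `captures` helper and the `@cases` table are names made up for illustration and are not part of t/re/regexp.t. The script only prints whatever the running perl produces (as "$1-$2", with undef shown as empty), so I would expect it to show the "expected" values on an unpatched build and the "got" values on a build with the patch above, but I have deliberately not hard-coded either.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: match $subject against $re and return "$1-$2",
# rendering an undefined capture as an empty string. Returns undef
# when the pattern does not match at all.
sub captures {
    my ($subject, $re) = @_;
    return unless $subject =~ $re;
    return join '-', map { $_ // '' } ($1, $2);
}

# The three subject/pattern pairs from the failing tests above.
my @cases = (
    [ 'aabbaa', qr/^(aa(bb)?)+$/ ],
    [ 'ABDE',   qr/(A(A|B(*ACCEPT)|C)+D)(E)/ ],
    [ 'ABCDE',  qr/(A(A|B(*ACCEPT)|C)+D)(E)/ ],
);

for my $case (@cases) {
    my ($subject, $re) = @$case;
    my $got = captures($subject, $re);
    printf "%-8s => %s\n", $subject, defined $got ? "'$got'" : '(no match)';
}
```

Running this under both builds and comparing the output should make the CURLYX/CURLYM difference visible without the rest of the test suite.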