CURLYX v CURLYM disagreement #19615

Open
@hvds

As part of an unrelated experiment I tried disabling the CURLYX -> CURLYM regexp optimization with the following patch:

diff --git a/regcomp.c b/regcomp.c
index 169272e1dd..2dac74fecd 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -5807,7 +5807,7 @@ S_study_chunk(pTHX_
               nogo:
 
                 /* Try optimization CURLYX => CURLYM. */
-                if (  OP(oscan) == CURLYX && data
+                if (0 && OP(oscan) == CURLYX && data
                       && !(data->flags & SF_HAS_PAR)
                       && !(data->flags & SF_HAS_EVAL)
                       && !deltanext    /* atom is fixed width */

I expected this to disable a pure optimization, but was surprised to find that with this change we get some test failures.

The first failure is t/re/pat.t test 104 "monster, length = 300000 at re/pat.t line 373" with the pattern /(?^:b(?:a|b)+=)/; I think this one is understandable if CURLYM can match more repeats than CURLYX can (though I hope we can lift the restriction some day).

The remaining failures are three tests (two of them newly added for the *ACCEPT work) that fail in t/re/regexp.t and in each of its variants. I think these show that CURLYX and CURLYM disagree on how captures inside a quantifier should be filled (which I think we also discussed recently). I infer that one or the other has a bug:

not ok 966 () ^(aa(bb)?)+$:aabbaa:y:-$1-$2-:-aa-- => '-aa-bb-', match=1
# $subject = "aabbaa";
# 
#                 
#                 $match = ($subject =~ m'^(aa(bb)?)+$') while $c--;
#                 $got = "-$1-$2-";
# 
# $got = "-aa-bb-";
# 
# $expected = "-aa--";

not ok 2080 - ACCEPT with CurlyM optimization GH #19484 () /(A(A|B(*ACCEPT)|C)+D)(E)/:ABDE:y:$1-$2:AB-B => '-B', match=1
# $subject = "ABDE";
# 
#                 
#                 $match = ($subject =~ m/(A(A|B(*ACCEPT)|C)+D)(E)/) while $c--;
#                 $got = "$1-$2";
# 
# $got = "-B";
# 
# $expected = "AB-B";

not ok 2082 - ACCEPT with CurlyM optimization GH #19484 () /(A(A|B(*ACCEPT)|C)+D)(E)/:ABCDE:y:$1-$2:AB-B => '-B', match=1
# $subject = "ABCDE";
# 
#                 
#                 $match = ($subject =~ m/(A(A|B(*ACCEPT)|C)+D)(E)/) while $c--;
#                 $got = "$1-$2";
# 
# $got = "-B";
# 
# $expected = "AB-B";
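For anyone who wants to compare the two code paths outside the test harness, here is a minimal standalone sketch of the three cases above. The helper name `run_case` and the `@cases` table are mine, not part of the test suite; the patterns, subjects and expected strings are copied from the failure output. The idea is to run it once under an unpatched perl and once under a perl built with the patch, then compare the `got` columns.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each case: [ pattern, subject, formatter for ($1, $2), expected string ].
# The expected strings are the ones quoted in the failing test output.
my @cases = (
    [ qr/^(aa(bb)?)+$/,             'aabbaa', sub { "-$_[0]-$_[1]-" }, '-aa--' ],
    [ qr/(A(A|B(*ACCEPT)|C)+D)(E)/, 'ABDE',   sub { "$_[0]-$_[1]" },   'AB-B'  ],
    [ qr/(A(A|B(*ACCEPT)|C)+D)(E)/, 'ABCDE',  sub { "$_[0]-$_[1]" },   'AB-B'  ],
);

# Hypothetical helper: match $subject against $re and format $1 and $2.
# Returns undef when the pattern does not match at all.
sub run_case {
    my ($re, $subject, $fmt) = @_;
    return unless $subject =~ $re;
    # Copy the captures right away; an unset group becomes ''.
    my @caps = map { defined $_ ? $_ : '' } ($1, $2);
    return $fmt->(@caps);
}

for my $case (@cases) {
    my ($re, $subject, $fmt, $expected) = @$case;
    my $got = run_case($re, $subject, $fmt);
    printf "%-36s %-7s got=%-10s expected='%s'\n",
        "$re", $subject,
        defined $got ? "'$got'" : 'no match',
        $expected;
}
```

Running the same script with `use re 'debug';` added near the top should also show which regops the compiled program ends up with, which makes it easier to confirm that the patch really did leave CURLYX in place for a given pattern.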
