Moving re_intuit_start into regex seems to be a regression #18566

Open
@demerphq

Description

In 7fadf4a we moved the intuit call out of pp_match into regexec.

This seems to be at least partly a performance regression, as it made matches that start with \G perform floating mandatory string checks as well, which in general is counterproductive. A publicly reported example is here: https://www.perlmonks.org/?node_id=11128141

It is also possibly an API regression as it leaves the regex engine API call for CALLREG_INTUIT_START uncalled after the change.

To be honest, the change makes some structural sense to me, so maybe it is for the best. But I think we need to work on how this change plays with \G and start classes. My understanding is that re_intuit_start() and regexec() were set up the way they were partly to deal with \G and /gc matches and to keep them fast. The patch mentioned above changes the order somewhat, so that various things related to \G happen before we hit re_intuit_start.

At a deeper level it comes down to a trade-off: is it better to find something at the end of the string before we check whether the thing that must be present at the start of the string is there? IMO most of the time yes, but with \G maybe not, especially when the earliest position the mandatory string could occur at is index 0. Any time we use FBM matching we do extra work, and it is only worth doing when on average it saves us from trying a regex that can't succeed. A \G anchored pattern is almost always parsing through the pattern, so I think the floating mandatory string check might not be as helpful: if we are going to fail, we will probably fail earlier with \G. Not always, of course.

Steps to Reproduce

The following reproduces the problem in a nutshell:

perl -Mre=debug -e'"module ( blah );"=~/\G \s* ^ \s* module \s+ (\S+?) \s* ( \s* (.?) \s ) \s* ;/gcmsx'

https://gist.github.com/wolfsage/e24135148d901efdd6757c48955f889a shows the diff:

+Found floating substr "module" at offset 0...
+start_shift: 0 check_at: 0 s: 0 endpos: 1 checked_upto: 0
+Does not contradict STCLASS...
+Guessed: match at offset 0

This demonstrates that the patch changed how we match in this case.

Also, the "old" behavior and performance can be restored by making the pattern case insensitive, which disables the fixed string checks.
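To make that comparison concrete, here is a rough timing harness. It is only a sketch: the input, the fallback pattern, and the `parse_all` helper are invented for illustration, and the module pattern is a simplified variant of the one in the reproduction above. Whether the two variants differ measurably will depend on the perl build and the input.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical input: many statements that are NOT "module", then one that is,
# so any forward scan for the literal "module" has a long way to go.
my $filler = 5_000;
my $input  = ("other x;\n" x $filler) . "module m ( x );\n";

# Parse $input statement by statement from pos(), trying the module
# pattern first and falling back to a generic statement pattern.
sub parse_all {
    my ($module_re) = @_;
    my ($modules, $others) = (0, 0);
    pos($input) = 0;
    while (1) {
        if    ($input =~ /$module_re/gc)      { $modules++ }
        elsif ($input =~ /\G \s* [^;]+ ;/gcx) { $others++ }
        else                                  { last }
    }
    return ($modules, $others);
}

# Same pattern twice; the only difference is the /i flag.
my %variant = (
    'case-sensitive'   => qr/\G \s* module \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/msx,
    'case-insensitive' => qr/\G \s* module \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/msxi,
);

for my $name (sort keys %variant) {
    my $t0 = time;
    my ($modules, $others) = parse_all($variant{$name});
    printf "%-17s modules=%d others=%d elapsed=%.4fs\n",
        $name, $modules, $others, time - $t0;
}
```

Both variants should report the same counts; only the elapsed time is of interest. Raising `$filler` makes any per-attempt forward scan more visible.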

I think when \G is present we should bypass floating string checks. Generally \G is used to incrementally parse a stream, and I think we should treat it as a hint that looking too far forward isn't helpful.
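For readers less familiar with the idiom, this is roughly what "incrementally parsing a stream" with \G and /gc looks like. It is a minimal sketch: the `tokenize` helper and its token classes are made up for illustration and are not taken from the reported code.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into tokens by repeatedly trying
# \G-anchored patterns with /gc, so every attempt starts at pos() and a
# failed attempt leaves pos() where it was.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (pos($src) < length $src) {
        if    ($src =~ /\G\s+/gc)     { next }                        # skip whitespace
        elsif ($src =~ /\G(\w+)/gc)   { push @tokens, [WORD  => $1] } # identifiers
        elsif ($src =~ /\G([();])/gc) { push @tokens, [PUNCT => $1] } # punctuation
        else  { die "unexpected character at offset " . pos($src) . "\n" }
    }
    return @tokens;
}

my @tokens = tokenize("module ( blah );");
print join(" ", map { "$_->[0]:$_->[1]" } @tokens), "\n";
```

Each alternative here either matches right at pos() or is irrelevant at this step, which is why scanning far ahead for a mandatory substring before trying the match is unlikely to pay off for this style of code.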

A good reproduction procedure is here:

https://www.perlmonks.org/?node_id=11128192

Expected behavior

We probably should not do the floating string check in the face of a \G. This is somewhat debatable, as there are cases where doing such a check will save a lot of time, but most of the time \G is used in a way where it probably doesn't help much.

Perl configuration

Summary of my perl5 (revision 5 version 33 subversion 7) configuration:
  Derived from: a5d5b4db3f3617b82434ee704fc42b49388f3ae0
  Platform:
    osname=linux
    osvers=4.15.0-135-generic
    archname=x86_64-linux-thread-multi
    uname='linux psykopsis 4.15.0-135-generic #139-ubuntu smp mon jan 18 17:38:24 utc 2021 x86_64 x86_64 x86_64 gnulinux '
    config_args='-Dusethreads -Doptimize=-g -d -Dusedevel -Dcc=ccache gcc -Dld=gcc -DDEBUGGING'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='gcc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -fwrapv -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-g'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fwrapv -DDEBUGGING -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='7.5.0'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='gcc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/7/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.27.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.27'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -g -L/usr/local/lib -fstack-protector-strong'


Characteristics of this binary (from libperl): 
  Compile-time options:
    DEBUGGING
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_IMPLICIT_CONTEXT
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    PERL_TRACK_MEMPOOL
    PERL_USE_DEVEL
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
    USE_THREAD_SAFE_LOCALE
  Locally applied patches:
    uncommitted-changes
  Built under linux
  Compiled at Feb 12 2021 21:47:10
  %ENV:
    PERLBREW_CONFIGURE_FLAGS="-de -Dcc=ccache\ gcc -Dld=gcc"
    PERLBREW_HOME="/home/yorton/.perlbrew"
    PERLBREW_MANPATH="/home/yorton/perl5/perlbrew/perls/perl-5.24.1/man"
    PERLBREW_PATH="/home/yorton/perl5/perlbrew/bin:/home/yorton/perl5/perlbrew/perls/perl-5.24.1/bin"
    PERLBREW_PERL="perl-5.24.1"
    PERLBREW_ROOT="/home/yorton/perl5/perlbrew"
    PERLBREW_SHELLRC_VERSION="0.88"
    PERLBREW_VERSION="0.88"
  @INC:
    lib
    /usr/local/lib/perl5/site_perl/5.33.7/x86_64-linux-thread-multi
    /usr/local/lib/perl5/site_perl/5.33.7
    /usr/local/lib/perl5/5.33.7/x86_64-linux-thread-multi
    /usr/local/lib/perl5/5.33.7
