@richardleach

A U8* structure is used to track character positions during Aho-Corasick string searching. This used to always be allocated from the heap, and wrapped in a mortal SV to avoid leakage. However, that incurs overhead.

Following this commit:

  • A stack buffer is used if maxlen is small enough.
  • Otherwise, the heap allocation is saved directly to the savestack for freeing during stack unwinding.
  • Since a mortal SV is no longer used, there is no need to SAVETMPS and FREETMPS at scope entry/exit.
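
The stack-or-heap choice described above can be sketched in plain C. This is a minimal illustration only, not the code from the commit: `POINTS_STACK_CAP`, `oldest_tracked_offset`, and the ring-buffer use of `points` are hypothetical names and simplifications, and an explicit `free()` stands in for the savestack, which in perl would release the heap buffer during unwinding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical capacity, in elements; not the value chosen by the commit. */
#define POINTS_STACK_CAP 16

/*
 * Scan `len` bytes of `s`, remembering the start of each of the last
 * `maxlen` characters in a ring buffer. Returns the offset of the oldest
 * position still tracked, or -1 on empty input or allocation failure.
 * `*used_heap` reports which storage backed the ring.
 */
static long oldest_tracked_offset(const unsigned char *s, size_t len,
                                  size_t maxlen, int *used_heap)
{
    const unsigned char *points_stack[POINTS_STACK_CAP];
    const unsigned char **points_heap = NULL;
    const unsigned char **points;
    long oldest;

    assert(maxlen > 0);
    *used_heap = 0;
    if (len == 0)
        return -1;

    if (maxlen <= POINTS_STACK_CAP) {
        /* Common case: no allocation, nothing to clean up. */
        points = points_stack;
    } else {
        /* Rare case: fall back to the heap. */
        points_heap = malloc(maxlen * sizeof *points_heap);
        if (points_heap == NULL)
            return -1;
        points = points_heap;
        *used_heap = 1;
    }

    /* Overwrite the oldest slot as each new character is seen. */
    for (size_t i = 0; i < len; i++)
        points[i % maxlen] = s + i;

    oldest = (long)(points[len >= maxlen ? len % maxlen : 0] - s);

    /* free(NULL) is a no-op, so the stack path needs no special case. */
    free(points_heap);
    return oldest;
}
```

With a 10-byte input and `maxlen` of 3, the sketch stays on the stack and reports offset 7 as the oldest tracked position; with `maxlen` of 40 it takes the heap path instead.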

./perl -e 'my $x = "x" x 100_000_000; 1 while $x =~ /xx|ab/aag;' was used to measure performance.

Blead:

          6,620.36 msec task-clock                       #    0.999 CPUs utilized             
    28,654,412,324      cycles                           #    4.328 GHz                       
       911,721,108      stalled-cycles-frontend          #    3.18% frontend cycles idle      
   108,774,373,701      instructions                     #    3.80  insn per cycle            
    22,155,300,318      branches                         #    3.347 G/sec

sv_points -> points_heap:

          5,980.47 msec task-clock                       #    0.999 CPUs utilized             
    26,034,479,632      cycles                           #    4.353 GHz                       
       268,382,711      stalled-cycles-frontend          #    1.03% frontend cycles idle      
   101,423,701,030      instructions                     #    3.90  insn per cycle            
    20,205,140,413      branches                         #    3.379 G/sec

Add points_stack:

          5,498.05 msec task-clock                       #    0.999 CPUs utilized             
    24,193,996,840      cycles                           #    4.400 GHz                       
        82,342,305      stalled-cycles-frontend          #    0.34% frontend cycles idle      
    91,772,992,857      instructions                     #    3.79  insn per cycle            
    18,054,971,288      branches                         #    3.284 G/sec

Mandatory -> conditional ENTER/LEAVE:

          5,141.42 msec task-clock                       #    0.999 CPUs utilized             
    23,185,766,525      cycles                           #    4.510 GHz                       
       131,930,496      stalled-cycles-frontend          #    0.57% frontend cycles idle      
    86,972,184,973      instructions                     #    3.75  insn per cycle            
    17,254,785,144      branches                         #    3.356 G/sec

I'm not sure what the optimum stack size should be. For reference, these were the
maxlen values observed and the counts of each when building Perl and running
the test harness:

          '2' => 35534,
          '1' => 14188,
          '8' => 8568,
          '5' => 3271,
          '6' => 2996,
          '3' => 641,
          '12' => 408,
          '7' => 358,
          '4' => 241,
          '10' => 172,
          '22' => 147,
          '16' => 67,
          '9' => 10,
          '31' => 5,
          '11' => 3,
          '35' => 3,
          '47' => 1,
          '33' => 1,
          '18' => 1,
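
One way to weigh candidate stack sizes against these counts is to tally how many of the observed calls a given capacity would have served without touching the heap. The sketch below does that using the figures listed above; the names `maxlen_count`, `observed`, `total_calls`, and `calls_covered` are hypothetical, and the capacity is taken to be a number of elements equal to `maxlen`.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the observed histogram: a maxlen value and how often it occurred. */
typedef struct {
    unsigned maxlen;
    unsigned long count;
} maxlen_count;

/* Counts copied from the list above (Perl build plus test harness run). */
static const maxlen_count observed[] = {
    {2, 35534}, {1, 14188}, {8, 8568}, {5, 3271}, {6, 2996},
    {3, 641},   {12, 408},  {7, 358},  {4, 241},  {10, 172},
    {22, 147},  {16, 67},   {9, 10},   {31, 5},   {11, 3},
    {35, 3},    {47, 1},    {33, 1},   {18, 1},
};

static const size_t observed_len = sizeof observed / sizeof observed[0];

/* Total number of observations across all maxlen values. */
static unsigned long total_calls(void)
{
    unsigned long total = 0;
    for (size_t i = 0; i < observed_len; i++)
        total += observed[i].count;
    return total;
}

/* Observations whose maxlen fits within a stack buffer of `cap` elements. */
static unsigned long calls_covered(unsigned cap)
{
    unsigned long covered = 0;
    for (size_t i = 0; i < observed_len; i++)
        if (observed[i].maxlen <= cap)
            covered += observed[i].count;
    return covered;
}
```

By that tally a capacity of 8 covers 65,797 of the 66,615 observations, 16 covers 66,457, and 32 covers 66,610, so for this workload the heap path is rare at any of those sizes.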

  • This set of changes may require a perldelta entry and I can add one if so.

@richardleach richardleach merged commit 1a871ce into Perl:blead Oct 20, 2025
34 checks passed
@richardleach richardleach deleted the AHOCORASICK_points_stack branch October 20, 2025 22:53