Skip to content

Conversation

@richardleach
Copy link
Contributor

@richardleach richardleach commented Jan 30, 2022

When a new SV is created and upgraded to a type known at compile time,
using the general-purpose upgrade function (sv_upgrade) is clunky.
Specifically, it is too big to be inlined, contains many branches that
can logically be resolved at compile time for known start & end types,
and the lookup of the correct body_details struct may add CPU cycles.

This PR:

  • Adds a new function, sv_upgrade_fresh, to be used when
    (typically) a new SV is created (so of type SVt_NULL) and then upgraded
    to a type known at compile time.
  • It also moves some definitions out of
    sv.h and sv.c to allow the new function to reside in inline.h.
  • Makes use of the new function within sv.c

On Linux, with gcc & clang, I found that the resulting perl size is only
about 40 bytes larger than blead.

I have follow-up branches in various stages of completeness for using
the new function elsewhere in core. (e.g. more directly with newAV() and
newHV(), rather than have them use newSV_type, which still has
avoidable branches.) These branches may not be finalized in this dev cycle.

I used the following trivial benchmark as a gauge of the performance
difference, finding the patched version to be about 30% faster:
perl -e '$str="A"x64; for (0 .. 1_000_000) { @svs = split //, $str }'

perf showed these numbers for blead:

          2,588.58 msec task-clock                #    1.000 CPUs utilized
                16      context-switches          #    0.006 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               202      page-faults               #    0.078 K/sec
    10,739,524,282      cycles                    #    4.149 GHz                      (62.45%)
        17,153,403      stalled-cycles-frontend   #    0.16% frontend cycles idle     (62.46%)
     1,899,282,708      stalled-cycles-backend    #   17.68% backend cycles idle      (62.46%)
    32,565,729,103      instructions              #    3.03  insn per cycle
                                                  #    0.06  stalled cycles per insn  (62.46%)
     7,661,237,900      branches                  # 2959.624 M/sec                    (62.50%)
        11,049,986      branch-misses             #    0.14% of all branches          (62.57%)
    13,960,240,509      L1-dcache-loads           # 5393.002 M/sec                    (62.57%)
         4,171,326      L1-dcache-load-misses     #    0.03% of all L1-dcache accesses  (62.52%)

with sv_upgrade taking 25% of the run time:

  25.16%  perlblead  perlblead           [.] Perl_sv_upgrade
  13.79%  perlblead  perlblead           [.] Perl_sv_clear
  13.40%  perlblead  libc-2.31.so        [.] _int_free
   9.07%  perlblead  libc-2.31.so        [.] malloc
   6.34%  perlblead  perlblead           [.] Perl_newSVpvn_flags
   6.18%  perlblead  libc-2.31.so        [.] _int_malloc
   5.03%  perlblead  perlblead           [.] Perl_sv_setpvn_fresh.part.0
   4.33%  perlblead  perlblead           [.] Perl_sv_free2
   4.19%  perlblead  perlblead           [.] Perl_safesysmalloc
   3.90%  perlblead  perlblead           [.] Perl_av_clear
   3.34%  perlblead  libc-2.31.so        [.] cfree@GLIBC_2.2.5
   1.58%  perlblead  perlblead           [.] Perl_pp_split

perf showed these numbers for patched:

          1,867.91 msec task-clock                #    1.000 CPUs utilized
                26      context-switches          #    0.014 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               202      page-faults               #    0.108 K/sec
     7,789,631,973      cycles                    #    4.170 GHz                      (62.20%)
        31,333,062      stalled-cycles-frontend   #    0.40% frontend cycles idle     (62.42%)
       766,749,365      stalled-cycles-backend    #    9.84% backend cycles idle      (62.63%)
    27,511,372,786      instructions              #    3.53  insn per cycle
                                                  #    0.03  stalled cycles per insn  (62.74%)
     6,576,356,741      branches                  # 3520.702 M/sec                    (62.74%)
        11,050,286      branch-misses             #    0.17% of all branches          (62.63%)
    12,448,610,262      L1-dcache-loads           # 6664.458 M/sec                    (62.42%)
         1,824,265      L1-dcache-load-misses     #    0.01% of all L1-dcache accesses  (62.21%)

and other functions coming to the fore:

  16.61%  perl     perl                [.] Perl_sv_clear
  14.34%  perl     libc-2.31.so        [.] malloc
  13.69%  perl     libc-2.31.so        [.] _int_free
  10.74%  perl     perl                [.] Perl_newSVpvn_flags
   8.48%  perl     libc-2.31.so        [.] _int_malloc
   7.20%  perl     perl                [.] Perl_sv_setpvn_fresh.part.0
   7.02%  perl     perl                [.] Perl_safesysmalloc
   5.39%  perl     perl                [.] Perl_sv_free2
   5.35%  perl     perl                [.] Perl_av_clear
   3.54%  perl     libc-2.31.so        [.] cfree@GLIBC_2.2.5
   2.56%  perl     perl                [.] Perl_pp_split

I don't consider this PR fit for commit just yet, since moving parts of sv.h and sv.c into inline.h:

  1. Seems gross, but I couldn't figure out how to avoid it
  2. Leaves the big chunk of body commentary in sv.c somewhat adrift of e.g. the bodies_by_type
    table. I don't know how best to tidy that up.

Hoping for some feedback and suggestions on the above two points!

@richardleach
Copy link
Contributor Author

richardleach commented Jan 30, 2022

Also, check failures. I'll work on those as time allows.

@richardleach
Copy link
Contributor Author

Notes on Windows msvc142 fail:

  • Visual Studio 2019 Developer Command Prompt v16.11.9
	cl -c -I. -nologo -GF -W3 -MD -I..\lib\CORE -I.\include -I. -I.. -DWIN32 -D_CONSOLE -DNO_STRICT -DWIN64 -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -D_WINSOCK_DEPRECATED_NO_WARNINGS -DPERLDLL -DPERL_CORE   -O1 -Zi -GL -fp:precise  -DPERL_TEXTMODE_SCRIPTS -DMULTIPLICITY -DPERL_IMPLICIT_SYS -TP -EHsc -Foperllib.obj perllib.c
perllib.c
D:\a\perl5\perl5\lib\CORE\inline.h(791): error C7555: use of designated initializers requires at least '/std:c++20'
D:\a\perl5\perl5\lib\CORE\inline.h(791): error C7555: use of designated initializers requires at least '/std:c++20'
D:\a\perl5\perl5\lib\CORE\inline.h(793): error C4576: a parenthesized type followed by an initializer list is a non-standard explicit type conversion syntax
D:\a\perl5\perl5\lib\CORE\inline.h(798): error C7555: use of designated initializers requires at least '/std:c++20'
D:\a\perl5\perl5\lib\CORE\inline.h(798): error C7555: use of designated initializers requires at least '/std:c++20'
D:\a\perl5\perl5\lib\CORE\inline.h(802): error C4576: a parenthesized type followed by an initializer list is a non-standard explicit type conversion syntax

note that this works fine (sv.c has the exact same c99 initializers in sv_upgrade):

	cl -c -I.. -nologo -GF -W3 -MD -I..\lib\CORE -I.\include -I. -I.. -DWIN32 -D_CONSOLE -DNO_STRICT -DWIN64 -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -D_WINSOCK_DEPRECATED_NO_WARNINGS -DPERLDLL -DPERL_CORE   -O1 -Zi -GL -fp:precise  -DPERL_TEXTMODE_SCRIPTS -DMULTIPLICITY -DPERL_IMPLICIT_SYS -Fo..\sv.obj ..\sv.c
sv.c

vs working when c99 initializers were added in #19270:

  • Visual Studio 2019 Developer Command Prompt v16.11.7
	cl -c -I. -nologo -GF -W3 -MD -I..\lib\CORE -I.\include -I. -I.. -DWIN32 -D_CONSOLE -DNO_STRICT -DWIN64 -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -D_WINSOCK_DEPRECATED_NO_WARNINGS -DPERLDLL -DPERL_CORE   -O1 -Zi -GL -fp:precise  -DPERL_TEXTMODE_SCRIPTS -DMULTIPLICITY -DPERL_IMPLICIT_SYS -TP -EHsc -Foperllib.obj perllib.c
perllib.c
<no errors>

Any difference between v16.11.7 and v16.11.9? Do the flags on the failing build command line (-TP - treat source as CPP; -EHsc - exception handling settings) make a difference? Or should I just be declaring the function somewhere in a .c file?

@richardleach
Copy link
Contributor Author

Notes on mingw64 fail:

  • gcc.exe (x86_64-posix-seh, Built by strawberryperl.com project) 8.3.0
g++ -shared -o ..\perl535.dll  -s -L"c:\perl\lib\CORE" -L"C:\strawberry\c\lib" -L"C:\strawberry\c\x86_64-w64-mingw32\lib" -L"C:\strawberry\c\lib\gcc\x86_64-w64-mingw32\8.3.0" \
   ..\toke.o ..\regcomp.o ..\regexec.o ..\op.o ..\sv.o ..\pp.o ..\pp_ctl.o ..\pp_sys.o ..\pp_pack.o ..\pp_hot.o ..\gv.o ..\perl.o ..\utf8.o ..\dump.o ..\hv.o ..\av.o ..\builtin.o ..\caretx.o ..\deb.o ..\doio.o ..\doop.o ..\dquote.o ..\globals.o ..\mro_core.o ..\locale.o ..\keywords.o ..\mathoms.o ..\mg.o ..\numeric.o ..\pad.o ..\perly.o ..\pp_sort.o ..\reentr.o ..\run.o ..\scope.o ..\taint.o ..\time64.o ..\universal.o ..\util.o perllib.o ..\perlio.o .\win32.o .\win32sck.o .\win32thread.o .\fcrypt.o ..\DynaLoader.o ..\lib\auto\Win32CORE\Win32CORE.a   -lmoldname -lkernel32 -luser32 -lgdi32 -lwinspool -lcomdlg32 -ladvapi32 -lshell32 -lole32 -loleaut32 -lnetapi32 -luuid -lws2_32 -lmpr -lwinmm -lversion -lodbc32 -lodbccp32 -lcomctl32 ..\lib\CORE\perl535.exp
C:/strawberry/c/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: ..\lib\CORE\perl535.exp:fake:(.edata+0x10a8): undefined reference to `Perl_sv_upgrade_fresh'
collect2.exe: error: ld returned 1 exit status
gmake: *** [GNUMakefile:1457: ..\perl535.dll] Error 1

But the other inline functions (e.g. Perl_av_store_simple) don't have this problem. It is because their embed.fnc entries are i and this one is I?

When a new SV is created and upgraded to a type known at compile time,
using the general-purpose upgrade function (sv_upgrade) is clunky.
Specifically, it is too big to be inlined, contains many branches that
can logically be resolved at compile time for known start & end types,
and the lookup of the correct body_details struct may add CPU cycles.

This commit does two things:
* adds Perl_sv_upgrade_fresh for use with SVs of type SVt_NULL
* moves structure, macro and function definitions required to allow
  this new function to reside in inline.h

Subsequent commits will make use of this new function.
Most of the various newSV* functions modified in this commit all aquire
a new SV head and upgrade it to a specific PV type of choice. Changing
the call from sv_upgrade(sv, type) to sv_upgrade_fresh(sv,type) allows
for the inlining of only the relevant body allocation code into these
functions.

The exception to the above is Perl_newSV_type, which takes the target
type as a parameter. For some callers, the target types may not be known
until runtime. In this case, the entirety of Perl_sv_upgrade_fresh will
be inlined.

The growth in the size of the compiled perl binary is small (only about
40 bytes, at least on x64 using gcc or clang).
@richardleach
Copy link
Contributor Author

Initializers changed to fix the msvc142 build.
Forced inlining relaxed to fix mingw64 build - but I'm still unclear why this should be necessary.

=cut
*/

#define PERL_IN_SV_C
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This effectively defines PERL_IN_SV_C everywhere, which defeats its purpose. If you need some symbol that isn't available otherwise, you need to make it public.

@xenu
Copy link
Member

xenu commented Feb 4, 2022

IMO it shouldn't be a public API and it belongs inside sv.c. External code should be using newSV_type or newSV(nv|pv|uv|iv).

@xenu
Copy link
Member

xenu commented Feb 4, 2022

Oh, it isn't actually marked as an API function. Still, it belongs in sv.c, it's the only place that uses it.

@richardleach
Copy link
Contributor Author

Oh, it isn't actually marked as an API function. Still, it belongs in sv.c, it's the only place that uses it.

It's the only place that uses it so far. I was going to do some follow up commits, but first see chat below.

IMO it shouldn't be a public API and it belongs inside sv.c. External code should be using newSV_type or newSV(nv|pv|uv|iv).

I would agree, but for newSV_type being so generic that it negates a chunk of the benefit.

e.g. You create an AV outside of sv.c via newAV() and despite type == SVt_PVAV basically being set in stone at the call site, each time that call is made to newSV_type:

  • The switch (new_type) branch has to be evaluated
  • More stack push/pops that are strictly necessary might happen, because the compiler has to account for all switch cases
  • Lookups to e.g. bodies_by_type might not be optimal

I would definitely be happy to leave Perl_sv_upgrade_fresh in sv.c and create some dedicated functions such as newSVav and newSVhv (or _flags variants, whatever), leaving newSV_type for those cases where the type really isn't known in advance or the code is really cold.

I just wasn't sure what the reception would be to that approach - more newSV* functions in sv.c - vs the one currently in this PR. (It seemed more useful to present a concrete approach in a PR, even to be shot down in favour of something else.)

@xenu
Copy link
Member

xenu commented Feb 4, 2022

Forced inlining relaxed to fix mingw64 build - but I'm still unclear why this should be necessary.

It's because I forgot to make makedef.pl skip FORCE_INLINE functions. It wasn't causing any problems until now because none of the FORCE_INLINE functions are publicly visible. I'll make a PR to fix it shortly.

@xenu
Copy link
Member

xenu commented Feb 4, 2022

Now that I think of it, maybe upgrade_fresh shouldn't exist at all, and instead its code should be included inside newSV_type? It seems that the overwhelming majority of callers specify the type at compile time.

@richardleach
Copy link
Contributor Author

Now that I think of it, maybe upgrade_fresh shouldn't exist at all, and instead its code should be included inside newSV_type? It seems that the overwhelming majority of callers specify the type at compile time.

But:

  1. Still my arguments above about unnecessary work being done in calls to newSV_type. While the majority of callers specify the type at compile time, that information doesn't carry across translation units and so perl still does "ooh, let me see what kind of type you want?" work on every entry to newSV_type at run time.
  2. This PR also uses sv_upgrade_fresh it in the following:
  • Perl_newSV
  • Perl_newSVpvn_flags
  • Perl_newSVpv
  • Perl_newSVpvn
  • Perl_newSVhek
  • Perl_newSVpvn_share

If there's no sv_upgrade_fresh, then either those functions don't get its performance upgrade or else the relevant code has to be manually inlined - and subsequently kept up to date with any changes to sv_upgrade - in all of them.

@xenu
Copy link
Member

xenu commented Feb 4, 2022

But:

  1. Still my arguments above about unnecessary work being done in calls to newSV_type. While the majority of callers specify the type at compile time, that information doesn't carry across translation units and so perl still does "ooh, let me see what kind of type you want?" work on every entry to newSV_type at run time.

If newSV_type is an inline function, that switch will disappear at compile time, just like it did in sv_upgrade_fresh.

  1. This PR also uses sv_upgrade_fresh it in the following:
  • Perl_newSV

  • Perl_newSVpvn_flags

  • Perl_newSVpv

  • Perl_newSVpvn

  • Perl_newSVhek

  • Perl_newSVpvn_share

If there's no sv_upgrade_fresh, then either those functions don't get its performance upgrade or else the relevant code has to be manually inlined - and subsequently kept up to date with any changes to sv_upgrade - in all of them.

All those functions are doing this:

SV *foo;
new_SV(foo);
sv_upgrade_fresh(foo, SVt_FOOBAR);

which is equivalent to

SV *foo = newSV_type(SVt_FOOBAR);

The main advantages of this approach are:

  1. It's simpler.
  2. Hopefully it will speed-up the other few dozen of places in core that call newSV_type (e.g. newAV) and also XS modules.

BTW, if you're afraid of the inlining overhead in the cases when the type can't be deduced at compile time, something like this can be done:

#define newSV_type(type) \
  __builtin_constant_p(type) ? newSV_type_inline(type) : newSV_type_not_inline(type);

TBH I'm not sure if it's worth it.

@richardleach
Copy link
Contributor Author

  1. Still my arguments above about unnecessary work being done in calls to newSV_type. While the majority of callers specify the type at compile time, that information doesn't carry across translation units and so perl still does "ooh, let me see what kind of type you want?" work on every entry to newSV_type at run time.

If newSV_type is an inline function, that switch will disappear at compile time, just like it did in sv_upgrade_fresh.

Oh, I see. I hadn't understood that you meant converting newSV_type into an inline function. I just thought you meant leaving it where it is, but putting the inlined upgrade code into it there.

Yes, I'd thought about that as a possible approach, but was worried about code bloat. It quickly became easy in my experiments to add 8kb+ to the binary size. I wasn't sure what kind of size increase would be acceptable. If that is a big enough concern, what do you think about a newSV_type_inline that does as we've discussed for hotter codepaths, but leaves the original function for colder code?

There's also still the same "problem" that the various structs and macros moves from sv.h and sv.c into inline.h would be required.

@richardleach
Copy link
Contributor Author

There's also still the same "problem" that the various structs and macros moves from sv.h and sv.c into inline.h would be required.

Oh, more than that - additional definitions would also have to move so that new_SV() can be used outside of sv.c. Maybe an entirely new header file would be the tidiest solution, rather than moving so much into inline.h?

@xenu
Copy link
Member

xenu commented Feb 4, 2022

Yes, I'd thought about that as a possible approach, but was worried about code bloat. It quickly became easy in my experiments to add 8kb+ to the binary size. I wasn't sure what kind of size increase would be acceptable.

Binary size isn't nearly as important as most people think. It's not like the CPU has to store the whole binary in the cache. It (pretty much) only caches the parts that are actually being called. Not to mention that caches are much bigger than they used to. Also, cache lines are tiny (on most processors it's 64 bytes, although there are exceptions like Apple M1 with 128 bytes), so it's pretty granular.

Of course, only benchmarks can tell us the truth. It's a shame we don't a general benchmark suite for perl. It would be wonderful to be able to just click a button and get some numbers. t/benchmark doesn't count because it's valgrind and it basically only counts instructions.

If that is a big enough concern, what do you think about a newSV_type_inline that does as we've discussed for hotter codepaths, but leaves the original function for colder code?

It isn't a bad idea, but I think we should first try inlining everything (maybe combined with the __builtin_constant_p check). I mean, most of the individual branches in perl_sv_upgrade_fresh are tiny.

Oh, more than that - additional definitions would also have to move so that new_SV() can be used outside of sv.c. Maybe an entirely new header file would be the tidiest solution, rather than moving so much into inline.h?

I don't have strong opinions on this topic. Personally, I'd probably put the structures and the macros in sv.h

@richardleach
Copy link
Contributor Author

Agreed on cache characteristics. I just had the feeling that a non-trivial increase in size might cause waves.

Wonder if we could suggest porting t/benchmark to perf as a Summer of Code project?

I'll hopefully have a few hours this weekend, I'll make a start on an inlined newSV_type branch.

@richardleach richardleach added do not merge Don't merge this PR, at least for now needs-work The pull request needs changes still labels Feb 7, 2022
@richardleach
Copy link
Contributor Author

I'll make a start on an inlined newSV_type branch.

A thrown-together PoC of this is at https://github.com/richardleach/perl5/tree/inline-newSV_type

Because of the dependencies between the existing .h files and the parts of sv.,c that have to move, I ended up creating sv_inline,.h.

The compiled perl is almost 33kb larger than blead and performs ~12% WORSE than blead on the simple split benchmark mentioned at the top of this PR. :(

I haven't had time to run with perf yet to try to see what's going on.

@richardleach
Copy link
Contributor Author

Hahah, user error was somehow going on. The inline newSV_type branch mentioned above is about as fast as the sv_upgrade_fresh branch in this PR. However, blead and the sv_upgrade_fresh _perl_s are both around the 3.7 MB mark, but the newSV_type perl is a ridiculous 11 MB!

I'll try to see if something daft is going on. If not, then that's surely too much of a size increase and a more selective approach will be needed?

@richardleach
Copy link
Contributor Author

The inlined newSV_type branch is still about the 3.7 MB mark when you don't forget you built it with "-Accflags=-g".

@xenu
Copy link
Member

xenu commented Mar 7, 2022

#19414 was merged instead.

@xenu xenu closed this Mar 7, 2022
@richardleach richardleach deleted the sv_upgrade_fresh1 branch March 13, 2022 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do not merge Don't merge this PR, at least for now hasConflicts needs-work The pull request needs changes still

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants