Skip to content

builtin.c cleanup+optimize+bloat #22659

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: blead
Choose a base branch
from
Open

builtin.c cleanup+optimize+bloat #22659

wants to merge 1 commit into from

Conversation

bulk88
Copy link
Contributor

@bulk88 bulk88 commented Oct 10, 2024

builtin.c is very new (2021), and while the code works, it is not a shining good example of C or XS code and best practices

-actual bug
cv_set_call_checker_flags(cv, builtin->checker, newSVuv(PTR2UV(builtin)), 0); leaks refcnt

-remove and reorganize all the misaligned fields in
"struct BuiltinFuncDescriptor", on x64, it was 6 ptrs big. After this
patch, each element is 2 ptrs big. (2168)-(2128) = 672 bytes saved,
from interp binary, and it was almost all null bytes.
-To make a bitfield take bits from U8 sub_name_len not U16 ckval, which
are 0-~400 perl opcodes. less chance of, faster visibility, and less
debugging to do, if very long method names get added and broken, vs
breaking perl runtime engine impl details if suddenly the opcodes grow
in the future.

-S_warn_experimental_builtin() will likely never execute, factor the
"builtin->name" pointer read the cold separate static func vs at callers.

-S_export_lexical() batch ptr r/w, so CC doesn' have to again emit a r/w to
ithread interp var PL_curpad after the SvREFCNT_dec() fn call. Do ++
first, in case an almost impossible exception is thrown from
SvREFCNT_dec(). Paranoia, no real reason.

-also note inefficiency that pad_add_name_pvn() allocates a SV head, AND
upgrades it to SVt_PVCV, and right after S_export_lexical() frees that
SV *, and puts our own in. No Perl API exists to allocate a pad slot, with
an uninit garbage pad slot returned ready to fill.

-pad_add_name_sv->pad_add_name_pvn, pad_add_name_sv() has no perf
(HEK? COW?) benefits over pad_add_name_pvn(), and is only a thin wrapper.
Don't use SVs in non-SV APIs if a cheaper malloc()/cstring can be used.

-"SvUV(ckobj)"->"SvUVX(ckobj)", remove the unconditional "2uv()" fn call.
That fix locally the major problem in
#22653 . Bc we created this SV, nobody
knows about it, and just blindly deref with SvUVX skipping magic and
stringifying overhead. ck_builtin_*()s execute alot in mktables and
modern perl code, its not the BOOT: sub.

-'SV *prototype = newSVpvs(""); SAVEFREESV(prototype);' dont make this
temp SV over and over. Use MY_CXT to store our 3 high refcnt COW
strings. Using COW shared HEK instead other COW types, since HEK code is
much older and "stable", and the "new COW" API in sv.c has a very
little API, and I think, untested, would assert, panic, or generically
corrupt a CV * leading to a SEGV. sv_sethek() is "safer". Not using
flag SVppv_STATIC and C strings, since no API exists for SVppv_STATIC.
Maybe one day, SVppv_STATIC can be put in here, for now sv_sethek() is
simpler.

-Perl_die->Perl_die_nocontext saves CC CPU opcodes on ithread perl.

-S_import_sym() do not make multiple redundant cycle through SV code for
1 use "builtin::foo" and "&foo" strings. And S_import_sym() can be called
with PP ->import(); many times x 21 on script start up, b/c of many
.pm's wanting these subs.

-S_cv_is_builtin() we set CvFILE() on our newXS call, it would be bizzare
if the CV* does NOT have our CvFILE ro string ptrs anymore. Direct ptr
comparison skips the call to libc on for streq. Should be a panic if
CV_DYNFILE or alien CvFILE RO string ptr is found. But KISS and only do
ptr cmp for perf.

-pad_findmy_sv->pad_findmy_pvn, left is thin wrap for right, skip SV *
alloc and SVPV manip code and go very close to assembly with a C stack
buffer and maybe inlined memcpy()s.

-BOOT: newXS_flags()->newXS_len_flags(), remove strlen(), also, share the
"$" and "@" PVs between the CVs. There is a lack of perlapi, and other
design discussions, why newXS_len_flags() with RO const char * proto,
mallloc()s a copy, and remember Perl rounds up to 16 bytes for a PV ,
and "2 PTRs" secret malloc header, so, for prototype string "$", 2 bytes.
32 bytes * 21 XSUBS = 672 or more bytes were wasted using proto arg
of newXS_len_flags(). sv_sethek() fixes that. Also remember the SV
/HEK

is reused for ck_entersub_args_proto() later on.

savings are disk/file size (RO data) of libperl bin, and proc unconditional startup overhead in CPU time, perm malloc() reduction,

builtin.c is very new (2021), and while the code works, it is not a
shining good example of C or XS code and best practices

-actual bug
cv_set_call_checker_flags(cv, builtin->checker, newSVuv(PTR2UV(builtin)), 0);
leaks refcnt

-remove and reorganize all the misaligned fields in
 "struct BuiltinFuncDescriptor", on x64, it was 6 ptrs big. After this
 patch, each element is 2 ptrs big. (21*6*8)-(21*2*8) = 672 bytes saved,
 from interp binary, and it was almost all null bytes.
-To make a bitfield take bits from U8 sub_name_len not U16 ckval, which
 are 0-~400 perl opcodes. less chance of, faster visibility, and less
 debugging to do, if very long method names get added and broken, vs
 breaking perl runtime engine impl details if suddenly the opcodes grow
 in the future.

-S_warn_experimental_builtin() will likely never execute, factor the
 "builtin->name" pointer read the cold separate static func vs at callers.

-S_export_lexical() batch ptr r/w, so CC doesn' have to again emit a r/w to
 ithread interp var PL_curpad after the SvREFCNT_dec() fn call. Do ++
 first, in case an almost impossible exception is thrown from
 SvREFCNT_dec(). Paranoia, no real reason.

-also note inefficiency that pad_add_name_pvn() allocates a SV head, AND
 upgrades it to SVt_PVCV, and right after S_export_lexical() frees that
 SV *, and puts our own in. No Perl API exists to allocate a pad slot, with
 an uninit garbage pad slot returned ready to fill.

-pad_add_name_sv->pad_add_name_pvn, pad_add_name_sv() has no perf
 (HEK? COW?) benefits over pad_add_name_pvn(), and is only a thin wrapper.
 Don't use SVs in non-SV APIs if a cheaper malloc()/cstring can be used.

-"SvUV(ckobj)"->"SvUVX(ckobj)", remove the unconditional "2uv()" fn call.
 That fix locally the major problem in
 Perl#22653 . Bc we created this SV, nobody
 knows about it, and just blindly deref with SvUVX skipping magic and
 stringifying overhead. ck_builtin_*()s execute alot in mktables and
 modern perl code, its not the BOOT: sub.

-'SV *prototype = newSVpvs(""); SAVEFREESV(prototype);' dont make this
 temp SV over and over. Use MY_CXT to store our 3 high refcnt COW
 strings. Using COW shared HEK instead other COW types, since HEK code is
 much older and "stable", and the "new COW" API in sv.c has a very
 little API, and I think, untested, would assert, panic, or generically
 corrupt a CV * leading to a SEGV. sv_sethek() is "safer". Not using
 flag SVppv_STATIC and C strings, since no API exists for SVppv_STATIC.
 Maybe one day, SVppv_STATIC can be put in here, for now sv_sethek() is
 simpler.

-Perl_die->Perl_die_nocontext saves CC CPU opcodes on ithread perl.

-S_import_sym() do not make multiple redundant cycle through SV code for
 1 use "builtin::foo" and "&foo" strings. And S_import_sym() can be called
 with PP ->import(); many times x 21 on script start up, b/c of many
 .pm's wanting these subs.

-S_cv_is_builtin() we set CvFILE() on our newXS call, it would be bizzare
 if the CV* does NOT have our CvFILE ro string ptrs anymore. Direct ptr
 comparison skips the call to libc on for streq. Should be a panic if
 CV_DYNFILE or alien CvFILE RO string ptr is found. But KISS and only do
 ptr cmp for perf.

-pad_findmy_sv->pad_findmy_pvn, left is thin wrap for right, skip SV *
 alloc and SVPV manip code and go very close to assembly with a C stack
 buffer and maybe inlined memcpy()s.

-BOOT: newXS_flags()->newXS_len_flags(), remove strlen(), also, share the
 "$" and "@" PVs between the CV*s. There is a lack of perlapi, and other
 design discussions, why newXS_len_flags() with RO const char * proto,
 mallloc()s a copy, and remember Perl rounds up to 16 bytes for a PV *,
 and "2 PTRs" secret malloc header, so, for prototype string "$", 2 bytes.
 32 bytes * 21 XSUBS = 672 or more bytes were wasted using proto arg
 of newXS_len_flags(). sv_sethek() fixes that. Also remember the SV*/HEK*
 is reused for ck_entersub_args_proto() later on.

savings are disk/file size (RO data) of libperl bin, and proc
unconditional startup overhead in CPU time, perm malloc() reduction,
faster ->import() per package/per yy_parser
@jkeenan
Copy link
Contributor

jkeenan commented Oct 11, 2024

Two porting test failures need fixing; see tail of https://github.com/Perl/perl5/actions/runs/11281443516/job/31376685964?pr=22659.

@Leont
Copy link
Contributor

Leont commented Oct 11, 2024

This change would be easier to review if it was cut into smaller pieces, you're doing many different things here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants