Skip to content

Create CvSUBOVERRIDE to speed up common sub idioms #21824

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: blead
Choose a base branch
from

Conversation

atrodo
Copy link
Contributor

@atrodo atrodo commented Jan 12, 2024

This new code path does two separate things designed to allow common function patterns to be replaced with a static C function at runtime:

First it adds two additional fields to CV and a check inside pp_entersub that, if set, will call a C function instead of setting up a perl stack frame and calling the original function. This call can return false to indicate it was unsuccessful and allow the original optree to operate as normal.

Second it adds an initial optimization that identifies a common accessor method pattern and attaches them with a static hash-key lookup function. The optimization is smart enough to handle cases when the function is also a mutator and passes control back to the optree to be executed.

This combination enables common OO-style accessor to return as quick as an XS version is able to.

A compiler flag, -Accflags=-DPERL_CV_OVERRIDE, must be given to enable the feature; both the check inside pp_entersub as well as initial optimization.


I have opened this as draft PR, I believe the concept in general is technically viable but I am unsure if it is a good fit. I can appreciate that as a new contributor this is touching some important bits, but I had an interest in learning about perl's internals and had this as a specific goal in mind. I had previously opened an earlier draft PR with a much different in-progress version; the feedback I received from that significantly changed the final approach.

I benchmarked the change with Porting/bench.pl, which my interpretation of the information indicates that on average the call:: benchmarks are about 2% slower overall, but the new call::sub::accessor benchmark is about 220% faster. I understand that making every call slower, even if by a small amount, is not going to work with every workload, and so I decided to put the optimization behind a compiler flag.

I did another benchmark by creating a script to compare several popular object systems. The results of those appear to show that this optimization allows accessors that use the optimized pattern to be within the range of speed of some popular XS accessor modules.

Both the script I created above as well as my output from Porting/bench.pl is available in the below gist. In both cases blead refers to a compiled version of perl without -Accflags=-DPERL_CV_OVERRIDE, and hacked refers to a compiled version of perl with that flag defined.
https://gist.github.com/atrodo/98f6cde36c8fa9954de34957be876493

This new code path does two separate things designed to allow common
function patterns to be replaced with a static C function at runtime:

First it adds two additional fields to CV and a check inside pp_entersub
that, if set, will call a C function instead of setting up and calling a
perl stack frame. This call can return false to indicate it was unsuccessful
and allow the original optree to operate as normal.

Second it adds an initial optimization that identifies a common accessor
method pattern and attaches them with a static hash-key lookup function. The
optimization is smart enough to handle cases when the function is also a
mutator and passes control back to the optree to be executed.

This combination enables common OO-style accessor to return as quick as an
XS version is able to.

A compiler flag, -Accflags=-DPERL_CV_OVERRIDE, must be given to enable the
feature; both the check inside pp_entersub as well as initial optimization.
@iabyn
Copy link
Contributor

iabyn commented Jan 12, 2024

Two general comments. First, AFAIKT, nowhere do you state what actual "common accessor method pattern"(s) it overrides. Second, You have a couple of non-trivial C functions with no code comments at the top to say what they do.

Then a specific observation: I doubt that this code will work correctly under PERL_RC_STACK. As a general rule, no new code in core should include dSP and all the macros associated with it. See the new section "Reference-counted argument stack" in perlguts.pod.

I haven't understood the PR well enough yet (see the the two general comments above) to form an opinion on it.

Do you have any firm ideas/plans for further common method patterns to override?

@leonerd
Copy link
Contributor

leonerd commented Jan 12, 2024

Plus my overall thinking is that this is quite a large body of work to surprise on folks from a new contributor. I would have much preferred to have some initial discussion on the aims and directions first, so we can give some more specific advice on how best to go about doing it.

@iabyn
Copy link
Contributor

iabyn commented Jan 12, 2024

Some further observations:

  1. Since part of this commit is to provide a new facility to effectively add a "lightweight XS" function to a CV that is allowed to bail out early and pass control back to the "slow" code path if its a set-up it can't handle, then this facility needs to be properly documented. In particular, it needs to make clear what normal XS functionality is being skipped and needs to be handled by the function itself if necessary (such as not replacing PADTMP arguments with copies). It also needs to make clear that if the function returns false, the stack etc should be left in its original state, so that the normal perl op code path can still be performed.
  2. Note that S_defgv_hek_accessor() appears to violate that - the (!isGV_with_GP) test returns false, but by then the stack has already been popped.
  3. The SV stored in CvSUBOVERRIDEAUX(cv) doesn't appear to be freed when the CV is freed.

@atrodo
Copy link
Contributor Author

atrodo commented Jan 12, 2024

First, AFAIKT, nowhere do you state what actual "common accessor method pattern"(s) it overrides.

I apologize, you are correct, I did not state it explicitly. It is hidden away in the S_cv_override_create function, and I should have brought the examples to the forefront. The common patterns this is able to handle are:

sub foo { return $_[0]->{foo} }
sub foo { @_ > 1 && ...; $_[0]->{foo} }
sub foo { @_ != 1 && ...; $_[0]->{foo} }
sub foo { return $_[0]->{foo} if @_ == 1 }

Second, You have a couple of non-trivial C functions with no code comments at the top to say what they do.

I will plan to add those, it was simply an oversight on my part. The quick summary of the two functions are:

S_defgv_hek_accessor: This is the function that runs at runtime attempting to optimize an accessor pattern. It will check that the number of arguments to a function call is exactly 1, and confirms that the single item is a hashref. If so, it returns the hash value of a static key stored in CvSUBOVERRIDEAUX.

S_cv_override_create: This is the optimizer step that looks at the function as a whole and determines if an optimzation can be made. If so, it enables CvIsSUBOVERRIDE, sets the function to call to S_defgv_hek_accessor, and saves the constant key.

Then a specific observation: I doubt that this code will work correctly under PERL_RC_STACK. As a general rule, no new code in core should include dSP and all the macros associated with it. See the new section "Reference-counted argument stack" in perlguts.pod.

That's very possible. Some time ago I did attempt to compile with PERL_RC_STACK and PERL_CV_OVERRIDE, and my memory is that nothing immediately failed. I will make sure to check again and work on adjusting it to take care of the related macros before taking the draft status off this PR.

Do you have any firm ideas/plans for further common method patterns to override?

My goal initially was to support how OO framework generate their accessors, and the patterns above covers some of the more common frameworks, Moo and Moose specifically.

I have thought about what else it could be used for, and I'm not going to say all of those ideas are valuable, just that they are simple patterns that I've used that maybe don't require a full stack to be created in order to function in all cases.

  • Accessors that access a pad value instead of a constant { my $k = 'foo'; sub foo { return $_[0]->{$k} } }
  • Accessors that do a little more checking, i.e. sub foo { die "..." if ref $_[0] ne "..."; return $_[0]->{foo}; }
  • Simple mutators of the form sub foo { exists $_[1] && $_[0]->{foo} = $_[1] }
  • Functions guarded by a global or local variable, i.e. sub info { return unless $FOO_INFO; ... }
  • Simple comparison sort functions, i.e. sub sort { $a cmp $b }
  • Simple check functions, like sub iscoderef { defined $_[0] && ref $_[0] eq 'CODE' }
  • Empty functions being called on an object i.e. sub foo {}
  • Simple array/hash creation functions, i.e. sub foo { return { foo => { @_ } } }

The strength of the way it works currently is that the static function doesn't have to handle the entire body of the function; it can see that the conditional it is optimizing for is not true and allow the optree to take over. So in the case of return unless $FOO, it can do the global lookup to see if the stack and optree need to run at all.

This PR is the result of an exercise in digging deeper into the perl internals, something I feel has been a good exercise for me, personally. I opened this PR because I had this code complete and seems from a surface level to be an improvement for at least some workloads. I opened it as a draft PR specifically to check to see if the changes were something others would be interested in. If this is the type of optimization that doesn't fit in the project, I am happy to close the PR and move on to something else. If the idea has legs, I'm happy to do any changes in comments, style or methodology to bring it up to par with expectations.

@atrodo
Copy link
Contributor Author

atrodo commented Jan 12, 2024

Some further observations:

1. Since part of this commit is to provide a new facility to effectively add a "lightweight XS" function to a CV that is allowed to bail out early and pass control back to the "slow" code path if its a set-up it can't handle, then this facility needs to be properly documented. In particular, it needs to make clear what normal XS functionality is being skipped and needs to be handled by the function itself if necessary (such  as not replacing PADTMP arguments with copies). It also needs to make clear that if the function returns false, the stack etc should be left in its original state, so that the normal perl op code path can still be performed.

That is a good description of what it's doing. I plan to add documentation about it.

2. Note that S_defgv_hek_accessor() appears to violate that - the (!isGV_with_GP) test returns false, but by then the stack has already been popped.

That section is copied from do_HV_rv2hv_helem, so I was unaware of those side effects. I will dig into what its actually doing and adjust that conditional.

3. The SV stored in CvSUBOVERRIDEAUX(cv) doesn't appear to be freed when the CV is freed.

Had not thought about a CV getting freed. I will see what I can do.

@hvds
Copy link
Contributor

hvds commented Jan 12, 2024

sub foo { return $_[0]->{foo} }

A similar one that I have often seen is { shift->{foo} }, and in all cases with or without an explicit return.

@iabyn
Copy link
Contributor

iabyn commented Jan 13, 2024 via email

@jkeenan
Copy link
Contributor

jkeenan commented Aug 26, 2024

@atrodo, this pull request has acquired merge conflicts. Could you resolve them and re-push? Thanks.

@bulk88
Copy link
Contributor

bulk88 commented Jan 16, 2025

CV body struct needs a brainstorming. CVFILE still leaks like a thunderstorm with CvDYNFILE(), "$\0" prototypes in a CV's PVX/CUR/LEN blasts away 32 bytes of memory. I'd rather not add 2 new ptr fields to CV body without everyone thinking it through.

I32 xcv_depth /* >= 2 indicates recursive call */
I32 xcv_depth; /* >= 2 indicates recursive call */ \
bool (*xcv_suboverride)(pTHX_ CV *); \
SV * xcv_suboverride_aux
Copy link
Contributor

@bulk88 bulk88 Apr 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds 16 bytes to all SV body structs, does anyone have ideas how that can be prevented?

HV* xmg_stash (1st field of all SV body structs, regardless if a ghost field or alloced field), is 99.999% unused for CV*s.
CUR/LEN/SvPVX() is 10% used, and are part of the minimally used PP prototypes system and also used by XSUB AUTOLOAD layer.

HV* xcv_stash and GV*/HEK* xcv_gv_u are weird and I don't understand why they can't be a union. HV* xcv_stash and HEK* xcv_gv_u (16 bytes) only exist for my $sub and default off RV2CV no GV optimization, to regenerate the GV* if needed from just a CV* ptr. There needs to be 2 ptrs to something for RV2CV to work, but why blessed HV*, CV specific HV*, and (GV* || HEK*) at all times? That is 3 ptrs of data.

CV body field HV* xcv_stash and CV body field HV* xcv_gv_u.xcv_gv->sv_any->xnv_u.xgv_stash are always identical in the Perl VM AFAIK.

CV* CvOUTSIDE() is a PP sub only concept, XSUBs can't be nested, static/hidden private scope/layered or closured against other XSUBs or C functions or against PP CV* PP lang level scopes. Maybe that CvOUTSIDE can be recycled for something else. Maybe field CvOUTSIDE() which is 8 bytes/void * can be replaced by a U16 offset into a PP sub's PADLIST SV* array. U32 cv_flags_t is in great danger of running out of bits. highest flag right now is 0x40 00 00, Im counting only 9 unused bits left in that U32.

An XSUB CV* currently has ALOT of unused void *s. 1 goes to CPAN XS authors as XS_ANY. but there are atleast 2 more void s that are unused. cv_outside is always NULL. and cv_padlist I repurposed for no-threads perl XS handshake ABI compatibility metadata since the CV pointer is the 1 and only C language argument passed to a boot XSUB, and handshake metadata has no choice but to be delivered through the CV* cv argument because there is no my_perl variable on a no thd perl. That handshake metadata void * doesn't need to exist, except for boot XSUBs, and only for a very short temporary time, only when that boot XSUB is 1st time ever executed using DynaLoader or XSLoader into a brand new in address space .so/.dll. 100% proven on all OSes and all build Configure permutations, static linked, boot XSUBs dont need a XS handshake test since it was proved at libperl.so CC time or link time that, that boot XSUB can't be separated by the end user from libperl.so.

@bulk88
Copy link
Contributor

bulk88 commented Apr 5, 2025

Some further observations:

  1. Since part of this commit is to provide a new facility to effectively add a "lightweight XS" function to a CV that is allowed to bail out early and pass control back to the "slow" code path if its a set-up it can't handle, then this facility needs to be properly documented. In particular, it needs to make clear what normal XS functionality is being skipped and needs to be handled by the function itself if necessary (such as not replacing PADTMP arguments with copies). It also needs to make clear that if the function returns false, the stack etc should be left in its original state, so that the normal perl op code path can still be performed.

The optimizations in this PR, should be part of a "nextgen" XSUB C level ABI of some sort that many people have wanted for 20+ years. The mandatory prolog in C/machine code, of all XSUBs mark = mypl->markptr; mark--; mypl->markptr = mark; sp = mypl->sp; vararg_count = sp-mark; is kindda crazy.

Why so much mandatory work in the 100K-1 million unique XSUBs, that are currently being executed/or sit inside physical ram this wall clock secon, on planet earth ? Why isn't incoming arg count, a size_t or U32 passed on C stack (nowadays always a CPU register except for i386) like all other C/C++ vararg code on earth?

A cleaner/simplified "BEGIN{} time burn my XSUB into the caller as a keyword/OP struct" public API, vs the current cargo culted CPAN XS copy pastes, would be nice.

@bulk88
Copy link
Contributor

bulk88 commented Apr 5, 2025

sub foo { return $_[0]->{foo} }

A similar one that I have often seen is { shift->{foo} }, and in all cases with or without an explicit return.

Can't this PR just be a very well written and very well C lang level optimized CPAN XS module instead if thats all this PR is supposed to do?

But counterpoint, other than its own maintainer/author and max a dozen other humans, nobody else will know of or use that CPAN XS module, use less::memory; and use less qw( memory ); are Perl jokes. but if someone actually published that tar.gz, it will still get no usage. This optimization has to be centrally deployed by P5P, so end users get it without manually refactoring all their code and demanding upstream and downstream CPAN module authors rerelease their modules.

@bulk88
Copy link
Contributor

bulk88 commented Apr 5, 2025

  • Accessors that access a pad value instead of a constant { my $k = 'foo'; sub foo { return $_[0]->{$k} } }

Someone who knows how to change the OP structs and OP tree needs to implement that, you dont need an XSUB, that all looks const foldable to my eye.

  • Accessors that do a little more checking, i.e. sub foo { die "..." if ref $_[0] ne "..."; return $_[0]->{foo}; }

Watch a video on youtube how to code in C. This is too many turning machine or per machine things to do in a row or pattern match for automatically without developer intervention, in my opinion

  • Simple mutators of the form sub foo { exists $_[1] && $_[0]->{foo} = $_[1] }

IDK, maybe lint or perlcritic should be used instead to identify pointless PP lang derefs. How many times SV GMG executes is another question if you wanna XS-ify that PP src code. Someone might complain, maybe nobody complains.

  • Functions guarded by a global or local variable, i.e. sub info { return unless $FOO_INFO; ... }

This is better done on a optree level than a hybrid of an XSUB and then a PP SUB executing. A sub or P5 "call frame" should be one or the other on a whiteboard, not both.

  • Simple comparison sort functions, i.e. sub sort { $a cmp $b }

This is already done, or non-merged PRs or P5P RT patches exist to do this. Someone asked I worked this particular optimization and I submitted a RT patch finishing the optimization, but it stalled out because I didn't know how to proceed b/c core tests failed, because the exactly working of fatal runtime errors slightly changed, and there were no responses if we (perl ecosystem) cares on STDERR byte equality for fatal developer errors.

  • Simple check functions, like sub iscoderef { defined $_[0] && ref $_[0] eq 'CODE' }

blead perl's Perl_pp_ref() and the Perl C API child C functions it calls, are a case point example of very bad C code right now. Multiple Strlens, no COW SVPVs, no shared HEK * s, all kinds of problems. pp_ref() should do the eq logic and the pp_push_constsv logic internally if the next 2 tokens are eq and pushconstsv((SVPV*)"some dbl quote");

If pp_ref() is doing what I described above, its performance would be 2-5 CPU opcodes and nothing more than HEK* == addr comparison instead of QEMU/Bochs style ultra heavy weight and slow current impl.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants