-
Notifications
You must be signed in to change notification settings - Fork 579
(Optimization:Regexec) Replaced recppush/regcppop with memcpy #19814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@khwilliamson Does that mean the tests failing are for author only - I thought there were alignment issues in 64 bit. That's why I closed this by myself. Anyway I'll be happy if this actually gets accepted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the general idea here, but more care is needed in the calculations - I think you need to retain the original calculation of paren_elems_to_push
, and convert that to units of PL_savestack
elements in a separate variable.
I think it would also be useful to reorder things slightly for clarity: in |
Can I get tests run please - I think nothing brakes with this (and I hope the debug prints are alright although I believe there aren't tests for that)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you've addressed the latest comments, it would probably be a good time to squash the commits and rebase.
regexec.c
Outdated
/* static assert that offs struc size is not less than stack elem size */ | ||
|
||
sizeof(char[sizeof(*rex->offs) >= sizeof(*PL_savestack) ? 1 : -1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See STATIC_ASSERT_DECL
and STATIC_ASSERT_STMT
in perl.h
This should ideally be used at the point we rely on it for the calculation in regcppop
, something like:
/* calculate number of offs/capture groups stored */
+ STATIC_ASSERT_STMT(sizeof(*rex->offs) >= sizeof(*PL_savestack));
i = (i * sizeof(*PL_savestack)) / sizeof(*rex->offs);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs to move to reccppop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I marked this unresolved last time, but it still needs to move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a static assert I can move it if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hvds Ok moved it as requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious, can it be made into a compile time assert somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious, can it be made into a compile time assert somehow?
Yes, per my comment above: "See STATIC_ASSERT_DECL and STATIC_ASSERT_STMT in perl.h".
d02d745
to
bbb74e9
Compare
bbb74e9
to
3ce61ce
Compare
f995eb3
to
1250ae4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remain nervous about the reliance on relative structure sizes, but I can live with this if someone else agrees that they're happy with it.
To recap: in regcppush
this change packs the capture groups to save (sizeof(*rex->offs)
each, currently equal to 3 * sizeof(SSize_t)
) into a calculated number of savestack slots (sizeof(*PL_savestack)
each, currently sizeof(ANY)
). In regcppop
we then discover how many capture groups should be restored by doing the reverse calculation - which can get the wrong result if there was enough spare space in the final savestack slot to fit another one. I.e. it can go wrong if we ever have sizeof(*PL_savestack) > sizeof(*rex->offs)
.
That risk is now managed by a static assert:
STATIC_ASSERT_STMT(sizeof(*rex->offs) >= sizeof(*PL_savestack));
My remaining concern is that if the assert ever fails on some platform, we have no good way to fix it for them. @tonycoz, @iabyn, @leonerd, anyone want to weigh in for or against?
I don't see why the static assert is needed, if we assume for argument that Size_t is 1 byte (making sizeof(regexp_paren_pair) == 3) and sizeof(*PL_savestack) is 8, then the calculation of I don't think the assertion can fail currently, though that would change if we added NV to |
@tonycoz The issue is with the reverse calculation in
With your example sizes, this would calculate 2 as the number of paren groups to restore, regardless of whether 1 or 2 paren groups had been saved in |
If I need to do anything - please tell me - I would like this to be pushed forward personally. |
Sorry, you were so obviously correct I didn't think I needed to respond. My comment about whether the assertion can fail still applies - only if we add NV or another similarly large type to |
On Thu, Jul 28, 2022 at 09:41:30AM -0700, Hugo van der Sanden wrote:
I'm hoping that at some point @tonycoz will get a chance to look at my
most recent response. If @iabyn gets a chance to look at the PR as well,
that would be super.
It's still on my todo list.
…--
Diplomacy is telling someone to go to hell in such a way that they'll
look forward to the trip
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at least this could use some more detailed explanation in the comments why it is correct. I had to stare at a couple of parts to understand what it is doing.
Basically tested with Intel VTune to be increasing the performance of a perl regex matching program with multiple capture groups and recursive patterns
1250ae4
to
2336097
Compare
It looks reasonable to me at this point. I did a performance test using the code from the reproduction of #20411 (with a smaller source file). It made a huge difference in the number of instructions, from 9.8e9 to 2.6e9 instructions, from 5e6 branch misses to 3.6e6 branch misses. raw perf output
|
@khwilliamson @tonycoz @demerphq - what do we need to get this merged finally? |
I see we have ack already, and @tonycoz seems happy, ill do a further review and assuming im happy with it ill merge it. |
This was merged as 66964e1 I made two whitespace changes to the commit, one to the commit text and one to remove a spurious whitespace change, and since there had been so much chatter on this before I decided not to ask @AnFunctionArray to do it himself. I then manually pushed to blead. Closing this as merged. |
BTW, thanks for the patch! |
Well no problem - I've definitely had fun doing this after my crazy shenanigans - trying to add a way to preserve capture groups after recursion of a pattern (by adding new syntax) - but this and added multithreading plus some micro optimisations (like removing |
Improved performance of complex recursive regular expressions with a lot of capture groups.
Determined bottleneck with Intel VTune and fixed by using faster built-in memcpy instead of a loop.
Currently in my test cases performance seems to be 2x.