Skip to content

Various regex engine minor changes for @iabyn #20940

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 18, 2023

Conversation

demerphq
Copy link
Collaborator

@demerphq demerphq commented Mar 18, 2023

IN #20918 (comment) @iabyn asked for some followups.

  • Remove vestigal change in Perl_rxres_save()
  • Fixes comment 'was bar-foo-bar prior to 5.37.7' to say '5.37.10' ?
  • Rework CAPTURE_CLEAR() and UNWIND_PAREN() macros to be static inline.
  • In regexp.h - document waht cp and lastcp are used for.
  • Document RE_PESSIMISTIC_PARENS and VOLATILE_REF better

Also includes two other changes I noticed when working on the above:

  • Change _pDEPTH and _aDEPTH macros to not use leading underscores. Changed it to comma_pDEPTH and comma_aDEPTH, also added debug_pDEPTH and debug_aDEPTH for completeness (if we ever add a function with no normal arguments that wants to do recursion depth tracking under DEBUGGING).
  • Throw an error if a pattern has more than U16_MAX open parens (aka capture groups in it). The code for some time has assumed that the id of all capture groups can fit in an U16, but we weren't validating that assumption during the regexp compile process. Long term we should revisit all of the logic related to this and either switch it to use U32, or change the things that use U32 members for parent count related logic to use U16 instead.

This was added as part of a different fix and was not properly
reverted when that fix was put on hold. This should only be
applied if we do adopt the patch which adds "start_new" and
"end_new" to the regexp_paren_pair structure.
These two defines are related to each other, and even though
VOLATILE_REF is not explicitly used in regexec.c which would require
it being placed in regcomp.h, it is implicitly, and RE_PESSIMISTIC_PARENS
*is* used in regexec.c. So put them both in regcomp.h and document them
together. This adds copious documentation for what they both are for.

RE_PESSIMISTIC_PARENS is effectively a "build option" (although intended
for debugging regex engine bugs only). VOLATILE_REF is the name of a
flag which is used to mark REF nodes as requiring special backtracking
support in regexec.c
and standardized member order, and line up comments and struct members
and other whitspace fixes. The internal tabs and messed up layout was
hurting my eyes.
We use U16 for various internal logic related to parens. If we
exceed this count stuff is going to go silently wrong. Might as
well throw a proper error during compilation to detect this.
…erbar

The leading underbar is reserved by C.

These defines are debugging only "recursion" depth related counters
injected into the function macro wrappers when a function is marked as
'W', much the same way that aTHX_ and pTHX_ are when building under
threaded builds. The functions are expected to incremented the depth
parameter themselves. Note that "recursion" is quoted above because in
practice currently they are only used by the regex engine when recursing
virtually, and they do not relate to true C stack related recursion.
(But they could be used for tracking C level recursion under debugging
if someone needed it.)
This makes it easier to debug, and type checks parameters and the
like. It does make it somewhat slower in debug mode, but so what.
This makes it easier to debug, and type checks parameters and the
like. It does make it somewhat slower in debug mode, but so what.
@iabyn
Copy link
Contributor

iabyn commented Mar 18, 2023

LGTM

@demerphq demerphq merged commit b6b3bfb into blead Mar 18, 2023
@demerphq demerphq deleted the yves/regex_engine_tweaks_for_iabyn branch March 18, 2023 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants