
Commit a734173

Add documentation for subroutine return values (#738)
1 parent 903bdeb commit a734173

7 files changed

+592
-278
lines changed

doc/html/pcre2pattern.html

+68-16
@@ -41,14 +41,12 @@ <h1>pcre2pattern man page</h1>
 <li><a name="TOC26" href="#SEC26">CONDITIONAL GROUPS</a>
 <li><a name="TOC27" href="#SEC27">COMMENTS</a>
 <li><a name="TOC28" href="#SEC28">RECURSIVE PATTERNS</a>
-<li><a name="TOC29" href="#SEC29">GROUPS AS SUBROUTINES</a>
-<li><a name="TOC30" href="#SEC30">ONIGURUMA SUBROUTINE SYNTAX</a>
-<li><a name="TOC31" href="#SEC31">CALLOUTS</a>
-<li><a name="TOC32" href="#SEC32">BACKTRACKING CONTROL</a>
-<li><a name="TOC33" href="#SEC33">EBCDIC ENVIRONMENTS</a>
-<li><a name="TOC34" href="#SEC34">SEE ALSO</a>
-<li><a name="TOC35" href="#SEC35">AUTHOR</a>
-<li><a name="TOC36" href="#SEC36">REVISION</a>
+<li><a name="TOC29" href="#SEC29">CALLOUTS</a>
+<li><a name="TOC30" href="#SEC30">BACKTRACKING CONTROL</a>
+<li><a name="TOC31" href="#SEC31">EBCDIC ENVIRONMENTS</a>
+<li><a name="TOC32" href="#SEC32">SEE ALSO</a>
+<li><a name="TOC33" href="#SEC33">AUTHOR</a>
+<li><a name="TOC34" href="#SEC34">REVISION</a>
 </ul>
 <h2><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION DETAILS</a></h2>
 <p>
@@ -3399,7 +3397,9 @@ <h3>
 "b" and so the whole match succeeds. This match used to fail in Perl, but in
 later versions (I tried 5.024) it now works.
 <a name="groupsassubroutines"></a></p>
-<h2><a name="SEC29" href="#TOC1">GROUPS AS SUBROUTINES</a></h2>
+<h3>
+Groups as subroutines
+</h3>
 <p>
 If the syntax for a recursive group call (either by number or by name) is used
 outside the parentheses to which it refers, it operates a bit like a subroutine
@@ -3446,8 +3446,60 @@ <h2><a name="SEC29" href="#TOC1">GROUPS AS SUBROUTINES</a></h2>
 in groups when called as subroutines is described in the section entitled
 <a href="#btsub">"Backtracking verbs in subroutines"</a>
 below.
+</p>
+<h3>
+Recursion and subroutines with returned capture groups
+</h3>
+<p>
+Since PCRE2 10.46, recursion and subroutine calls may also specify a list of
+capture groups to return. This is a PCRE2 syntax extension not supported by
+Perl. The pattern matching recurses into the referenced expression as described
+above; however, when the recursion returns to the calling expression, the
+subgroups captured during the recursion can be retained when the calling
+expression's context is restored.
+</p>
+<p>
+When used as a subroutine, this allows the subroutine's capture groups to
+be used as return values.
+</p>
+<p>
+Only the specific capture groups listed by the caller will be retained, using
+the following syntax:
+<pre>
+  (?R(grouplist))          recurse whole pattern, returning capture groups
+  (?n(grouplist))          )
+  (?+n(grouplist))         )
+  (?-n(grouplist))         ) call subroutine, returning capture groups
+  (?&name(grouplist))      )
+  (?P&#62;name(grouplist))  )
+</pre>
+</p>
+<p>
+The list of capture groups "grouplist" is a comma-separated list of (absolute
+or relative) group numbers, and group names enclosed in single quotes or angle
+brackets.
+</p>
+<p>
+Here is an example which first uses the DEFINE condition to create a re-usable
+routine for matching a weekend day, then calls that subroutine and retains the
+groups it captures for use later:
+<pre>
+  (?x: # ignore whitespace for clarity
+  # Define the routine "weekendday" which matches Saturday or
+  # Sunday, and returns the Sat/Sun prefix as \k&#60;short&#62;.
+  (?(DEFINE) (?&#60;weekendday&#62;
+  (?|(?&#60;short&#62;Sat)urday|(?&#60;short&#62;Sun)day) ) )
+  # Call the routine. Matches "Saturday,Sat" or "Sunday,Sun".
+  (?&weekendday(&#60;short&#62;)),\k&#60;short&#62; )
+</pre>
+</p>
+<p>
+This feature is not available using the Oniguruma syntax \g&#60;...&#62; or \g'...'
+below.
 <a name="onigurumasubroutines"></a></p>
-<h2><a name="SEC30" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a></h2>
+<h3>
+Oniguruma subroutine syntax
+</h3>
 <p>
 For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or
 a number enclosed either in angle brackets or single quotes, is an alternative
@@ -3465,7 +3517,7 @@ <h2><a name="SEC30" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a></h2>
 Note that \g{...} (Perl syntax) and \g&#60;...&#62; (Oniguruma syntax) are <i>not</i>
 synonymous. The former is a backreference; the latter is a subroutine call.
 </p>
-<h2><a name="SEC31" href="#TOC1">CALLOUTS</a></h2>
+<h2><a name="SEC29" href="#TOC1">CALLOUTS</a></h2>
 <p>
 Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl
 code to be obeyed in the middle of matching a regular expression. This makes it
@@ -3543,7 +3595,7 @@ <h3>
 </pre>
 The doubling is removed before the string is passed to the callout function.
 <a name="backtrackcontrol"></a></p>
-<h2><a name="SEC32" href="#TOC1">BACKTRACKING CONTROL</a></h2>
+<h2><a name="SEC30" href="#TOC1">BACKTRACKING CONTROL</a></h2>
 <p>
 There are a number of special "Backtracking Control Verbs" (to use Perl's
 terminology) that modify the behaviour of backtracking during matching. They
@@ -4071,7 +4123,7 @@ <h3>
 is no such group within the subroutine's group, the subroutine match fails and
 there is a backtrack at the outer level.
 <a name="ebcdicenvironments"></a></p>
-<h2><a name="SEC33" href="#TOC1">EBCDIC ENVIRONMENTS</a></h2>
+<h2><a name="SEC31" href="#TOC1">EBCDIC ENVIRONMENTS</a></h2>
 <p>
 Differences in the way PCRE behaves when it is running in an EBCDIC environment
 are covered in this section.
@@ -4115,12 +4167,12 @@ <h3>
 points. However, if the range is specified numerically, for example,
 [\x88-\x92] or [h-\x92], all code points are included.
 </p>
-<h2><a name="SEC34" href="#TOC1">SEE ALSO</a></h2>
+<h2><a name="SEC32" href="#TOC1">SEE ALSO</a></h2>
 <p>
 <b>pcre2api</b>(3), <b>pcre2callout</b>(3), <b>pcre2matching</b>(3),
 <b>pcre2syntax</b>(3), <b>pcre2</b>(3).
 </p>
-<h2><a name="SEC35" href="#TOC1">AUTHOR</a></h2>
+<h2><a name="SEC33" href="#TOC1">AUTHOR</a></h2>
 <p>
 Philip Hazel
 <br>
@@ -4129,7 +4181,7 @@ <h2><a name="SEC35" href="#TOC1">AUTHOR</a></h2>
 Cambridge, England.
 <br>
 </p>
-<h2><a name="SEC36" href="#TOC1">REVISION</a></h2>
+<h2><a name="SEC34" href="#TOC1">REVISION</a></h2>
 <p>
 Last updated: 27 November 2024
 <br>

doc/html/pcre2syntax.html

+24-2
@@ -566,14 +566,14 @@ <h2><a name="SEC25" href="#TOC1">SUBSTRING SCAN ASSERTION</a></h2>
   (*scan_substring:(grouplist)...)  scan captured substring
   (*scs:(grouplist)...)             scan captured substring
 </pre>
-The comma-separated list may identify groups in any of the following ways:
+The comma-separated list "grouplist" may identify groups in any of the
+following ways:
 <pre>
   n               absolute reference
   +n              relative reference
   -n              relative reference
   &#60;name&#62;  name
   'name'          name
-
 </pre>
 </p>
 <h2><a name="SEC26" href="#TOC1">SCRIPT RUNS</a></h2>
@@ -621,6 +621,28 @@ <h2><a name="SEC28" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><
   \g&#60;-n&#62;  call subroutine by relative number (PCRE2 extension)
   \g'-n'          call subroutine by relative number (PCRE2 extension)
 </pre>
+The variants using parentheses (?...) may also specify a list of capture groups
+to return, which shall be retained in the calling subexpression if set during
+the recursion (this feature is not supported by Perl).
+<pre>
+  (?R(grouplist))          recurse whole pattern, returning capture groups
+                             (PCRE2 extension)
+  (?n(grouplist))          )
+  (?+n(grouplist))         ) call subroutine, returning capture groups
+  (?-n(grouplist))         )   (PCRE2 extension)
+  (?&name(grouplist))      )
+  (?P&#62;name(grouplist))  )
+</pre>
+The comma-separated list "grouplist" uses the same syntax as
+(*scan_substring:(grouplist)...), and may identify groups in any of the
+following ways:
+<pre>
+  n               absolute reference
+  +n              relative reference
+  -n              relative reference
+  &#60;name&#62;  name
+  'name'          name
+</pre>
 </p>
 <h2><a name="SEC29" href="#TOC1">CONDITIONAL PATTERNS</a></h2>
 <p>
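
Note on the "grouplist" syntax documented above: the five entry forms (n, +n, -n, <name>, 'name') can be illustrated with a small standalone classifier. This is only a sketch for readers of the diff, not PCRE2's own parser. The function name `parse_grouplist` and the `(kind, value)` tuples are hypothetical, and the sketch makes no claim about how PCRE2 treats whitespace, zero, or out-of-range numbers inside a grouplist.

```python
import re

# Hypothetical helper, not part of PCRE2. It classifies each entry of a
# comma-separated "grouplist" into one of the documented forms.
_ENTRY = re.compile(
    r"""
    (?P<rel>[+-]\d+)                 # relative reference: +n or -n
    | (?P<abs>\d+)                   # absolute reference: n
    | <(?P<angle>[A-Za-z_]\w*)>      # name in angle brackets
    | '(?P<quote>[A-Za-z_]\w*)'      # name in single quotes
    """,
    re.VERBOSE,
)


def parse_grouplist(text):
    """Return a list of (kind, value) tuples for a grouplist string.

    Assumption: entries are separated by bare commas with no surrounding
    whitespace; anything else is rejected by this sketch.
    """
    entries = []
    for item in text.split(","):
        m = _ENTRY.fullmatch(item)
        if m is None:
            raise ValueError(f"not a valid grouplist entry: {item!r}")
        if m.group("rel") is not None:
            # int() accepts a leading sign, so "+2" -> 2 and "-1" -> -1
            entries.append(("relative", int(m.group("rel"))))
        elif m.group("abs") is not None:
            entries.append(("absolute", int(m.group("abs"))))
        else:
            entries.append(("name", m.group("angle") or m.group("quote")))
    return entries


if __name__ == "__main__":
    # The grouplist from the documented call (?&weekendday(<short>))
    print(parse_grouplist("<short>"))
    print(parse_grouplist("1,+2,-1,'long'"))
```

Resolving a relative entry to an absolute group number depends on where the call sits in the pattern, so the sketch leaves relative offsets as signed integers rather than guessing at that rule.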
