@@ -41,14 +41,12 @@ <h1>pcre2pattern man page</h1>
 <li><a name="TOC26" href="#SEC26">CONDITIONAL GROUPS</a>
 <li><a name="TOC27" href="#SEC27">COMMENTS</a>
 <li><a name="TOC28" href="#SEC28">RECURSIVE PATTERNS</a>
-<li><a name="TOC29" href="#SEC29">GROUPS AS SUBROUTINES</a>
-<li><a name="TOC30" href="#SEC30">ONIGURUMA SUBROUTINE SYNTAX</a>
-<li><a name="TOC31" href="#SEC31">CALLOUTS</a>
-<li><a name="TOC32" href="#SEC32">BACKTRACKING CONTROL</a>
-<li><a name="TOC33" href="#SEC33">EBCDIC ENVIRONMENTS</a>
-<li><a name="TOC34" href="#SEC34">SEE ALSO</a>
-<li><a name="TOC35" href="#SEC35">AUTHOR</a>
-<li><a name="TOC36" href="#SEC36">REVISION</a>
+<li><a name="TOC29" href="#SEC29">CALLOUTS</a>
+<li><a name="TOC30" href="#SEC30">BACKTRACKING CONTROL</a>
+<li><a name="TOC31" href="#SEC31">EBCDIC ENVIRONMENTS</a>
+<li><a name="TOC32" href="#SEC32">SEE ALSO</a>
+<li><a name="TOC33" href="#SEC33">AUTHOR</a>
+<li><a name="TOC34" href="#SEC34">REVISION</a>
 </ul>
 <h2><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION DETAILS</a></h2>
 <p>
@@ -3399,7 +3397,9 @@ <h3>
 "b" and so the whole match succeeds. This match used to fail in Perl, but in
 later versions (I tried 5.024) it now works.
 <a name="groupsassubroutines"></a></p>
-<h2><a name="SEC29" href="#TOC1">GROUPS AS SUBROUTINES</a></h2>
+<h3>
+Groups as subroutines
+</h3>
 <p>
 If the syntax for a recursive group call (either by number or by name) is used
 outside the parentheses to which it refers, it operates a bit like a subroutine
@@ -3446,8 +3446,60 @@ <h2><a name="SEC29" href="#TOC1">GROUPS AS SUBROUTINES</a></h2>
 in groups when called as subroutines is described in the section entitled
 <a href="#btsub">"Backtracking verbs in subroutines"</a>
 below.
+</p>
+<h3>
+Recursion and subroutines with returned capture groups
+</h3>
+<p>
+Since PCRE2 10.46, recursion and subroutine calls may also specify a list of
+capture groups to return. This is a PCRE2 syntax extension not supported by
+Perl. Pattern matching recurses into the referenced expression as described
+above. However, when the recursion returns to the calling expression, the
+groups captured during the recursion can be retained when the calling
+expression's context is restored.
+</p>
+<p>
+When used as a subroutine, this allows the subroutine's capture groups to
+be used as return values.
+</p>
+<p>
+Only the specific capture groups listed by the caller will be retained, using
+the following syntax:
+<pre>
+  (?R(grouplist))       recurse whole pattern, returning capture groups
+  (?n(grouplist))       )
+  (?+n(grouplist))      )
+  (?-n(grouplist))      ) call subroutine, returning capture groups
+  (?&name(grouplist))   )
+  (?P>name(grouplist))  )
+</pre>
+</p>
+<p>
+The list of capture groups "grouplist" is a comma-separated list of (absolute
+or relative) group numbers, and group names enclosed in single quotes or angle
+brackets.
+</p>
+<p>
+Here is an example which first uses the DEFINE condition to create a re-usable
+routine for matching a weekend day, then calls that subroutine and retains the
+groups it captures for use later:
+<pre>
+  (?x: # ignore whitespace for clarity
+    # Define the routine "weekendday" which matches Saturday or
+    # Sunday, and returns the Sat/Sun prefix as \k<short>.
+    (?(DEFINE) (?<weekendday>
+      (?|(?<short>Sat)urday|(?<short>Sun)day) ) )
+    # Call the routine. Matches "Saturday,Sat" or "Sunday,Sun".
+    (?&weekendday(<short>)),\k<short> )
+</pre>
+</p>
+<p>
+This feature is not available using the Oniguruma syntax \g<...> or \g'...'
+below.
 <a name="onigurumasubroutines"></a></p>
-<h2><a name="SEC30" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a></h2>
+<h3>
+Oniguruma subroutine syntax
+</h3>
 <p>
 For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or
 a number enclosed either in angle brackets or single quotes, is an alternative
@@ -3465,7 +3517,7 @@ <h2><a name="SEC30" href="#TOC1">ONIGURUMA SUBROUTINE SYNTAX</a></h2>
 Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are <i>not</i>
 synonymous. The former is a backreference; the latter is a subroutine call.
 </p>
-<h2><a name="SEC31" href="#TOC1">CALLOUTS</a></h2>
+<h2><a name="SEC29" href="#TOC1">CALLOUTS</a></h2>
 <p>
 Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl
 code to be obeyed in the middle of matching a regular expression. This makes it
@@ -3543,7 +3595,7 @@ <h3>
 </pre>
 The doubling is removed before the string is passed to the callout function.
 <a name="backtrackcontrol"></a></p>
-<h2><a name="SEC32" href="#TOC1">BACKTRACKING CONTROL</a></h2>
+<h2><a name="SEC30" href="#TOC1">BACKTRACKING CONTROL</a></h2>
 <p>
 There are a number of special "Backtracking Control Verbs" (to use Perl's
 terminology) that modify the behaviour of backtracking during matching. They
@@ -4071,7 +4123,7 @@ <h3>
 is no such group within the subroutine's group, the subroutine match fails and
 there is a backtrack at the outer level.
 <a name="ebcdicenvironments"></a></p>
-<h2><a name="SEC33" href="#TOC1">EBCDIC ENVIRONMENTS</a></h2>
+<h2><a name="SEC31" href="#TOC1">EBCDIC ENVIRONMENTS</a></h2>
 <p>
 Differences in the way PCRE behaves when it is running in an EBCDIC environment
 are covered in this section.
@@ -4115,12 +4167,12 @@ <h3>
 points. However, if the range is specified numerically, for example,
 [\x88-\x92] or [h-\x92], all code points are included.
 </p>
-<h2><a name="SEC34" href="#TOC1">SEE ALSO</a></h2>
+<h2><a name="SEC32" href="#TOC1">SEE ALSO</a></h2>
 <p>
 <b>pcre2api</b>(3), <b>pcre2callout</b>(3), <b>pcre2matching</b>(3),
 <b>pcre2syntax</b>(3), <b>pcre2</b>(3).
 </p>
-<h2><a name="SEC35" href="#TOC1">AUTHOR</a></h2>
+<h2><a name="SEC33" href="#TOC1">AUTHOR</a></h2>
 <p>
 Philip Hazel
 <br>
@@ -4129,7 +4181,7 @@ <h2><a name="SEC35" href="#TOC1">AUTHOR</a></h2>
 Cambridge, England.
 <br>
 </p>
-<h2><a name="SEC36" href="#TOC1">REVISION</a></h2>
+<h2><a name="SEC34" href="#TOC1">REVISION</a></h2>
 <p>
 Last updated: 27 November 2024
 <br>