PCRE Unicode reference contains incorrect statement #2831

https://www.php.net/manual/en/regexp.reference.unicode.php states:

> That is why the traditional escape sequences such as \d and \w do not use Unicode properties in PCRE.

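This is no longer accurate. Since the following commit, PHP also enables PCRE_UCP whenever a pattern uses the `u` modifier, so `\d`, `\w` and the related escapes do use Unicode properties in that mode: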

https://github.com/php/php-src/commit/87a237342282fe036bb90486fdd6cdc392e16ac7
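A minimal PHP sketch of the difference, assuming PHP 7 or later (for the `"\u{...}"` string escape), the PCRE library bundled with PHP (built with Unicode property support) and the default "C" locale. Without `u`, `\d` and `\w` only recognize ASCII characters; with `u`, they follow Unicode properties:

```php
<?php
// Assumes PHP >= 7.0 and the default "C" locale (no setlocale() call).
$arabicThree = "\u{0663}"; // ARABIC-INDIC DIGIT THREE, general category Nd
$eAcute      = "\u{00E9}"; // LATIN SMALL LETTER E WITH ACUTE, general category Ll

// Without u: byte-oriented matching, \d and \w are ASCII-only.
var_dump(preg_match('/^\d+$/', $arabicThree)); // int(0)
var_dump(preg_match('/^\w+$/', $eAcute));      // int(0)

// With u: Unicode properties decide what \d and \w match.
var_dump(preg_match('/^\d+$/u', $arabicThree)); // int(1)
var_dump(preg_match('/^\w+$/u', $eAcute));      // int(1)
```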

As per https://www.pcre.org/original/doc/html/pcrepattern.html#genericchartypes:

> By default, characters whose code points are greater than 127 never match \d, \s, or \w, and always match \D, \S, and \W, although this may vary for characters in the range 128-255 when locale-specific matching is happening. These escape sequences retain their original meanings from before Unicode support was available, mainly for efficiency reasons. If PCRE is compiled with Unicode property support, and the PCRE_UCP option is set, the behaviour is changed so that Unicode properties are used to determine character types, as follows:
> 
>     \d  any character that matches \p{Nd} (decimal digit)
>     \s  any character that matches \p{Z} or \h or \v
>     \w  any character that matches \p{L} or \p{N}, plus underscore
> 
> The upper case escapes match the inverse sets of characters. Note that \d matches only decimal digits, whereas \w matches any Unicode digit, as well as any Unicode letter, and underscore. Note also that PCRE_UCP affects \b, and \B because they are defined in terms of \w and \W. Matching these sequences is noticeably slower when PCRE_UCP is set.
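The distinction quoted above between `\d` and `\w` shows up with a character such as U+00B2 SUPERSCRIPT TWO (general category No), which is a Unicode number but not a decimal digit. This sketch makes the same assumptions as before (PHP 7 or later, bundled PCRE, default "C" locale) and also shows `\b` following the wider definition of `\w` under `u`:

```php
<?php
// Assumes PHP >= 7.0 and the default "C" locale (no setlocale() call).
$superTwo = "\u{00B2}";    // SUPERSCRIPT TWO, general category No
$word     = "caf\u{00E9}"; // "café", ending in a non-ASCII letter
$subject  = "un $word noir";

// \d is \p{Nd} only, while \w includes every \p{N}.
var_dump(preg_match('/^\d$/u', $superTwo)); // int(0)
var_dump(preg_match('/^\w$/u', $superTwo)); // int(1)

// \b is defined via \w, so the boundary after the final letter exists only with u.
var_dump(preg_match('/\b' . $word . '\b/u', $subject)); // int(1)
var_dump(preg_match('/\b' . $word . '\b/', $subject));  // int(0)
```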
