This is a module for PostgreSQL that exposes Perl-compatible regular expressions (PCRE) functionality as functions and operators. It is based on the popular PCRE library.
You need to have libpcre installed. pkg-config will be used to find it.
To build and install this module:
make
make install
or selecting a specific PostgreSQL installation:
make PG_CONFIG=/some/where/bin/pg_config
make PG_CONFIG=/some/where/bin/pg_config install
And finally inside the database:
CREATE EXTENSION pgpcre;
A regular expression is a separate data type, named pcre
. (This is different from how the built-in regular expressions in PostgreSQL work, which are simply values of type text
.)
The supported regular expressions are documented on the pcrepattern(3) man page.
Boolean operators are available for checking whether a pattern matches a string. These operators return true or false, respectively. They only return null when one of the operands is null.
Examples:
SELECT 'foo' ~ pcre 'fo+';
SELECT 'bar' !~ pcre 'fo+';
You can also write it the other way around:
SELECT pcre 'fo+' ~ 'foo';
SELECT pcre 'fo+' !~ 'bar';
This can be handy for writing things like
SELECT pcre 'fo+' ~ ANY(ARRAY['foo', 'bar']);
For Perl nostalgia, you can also use this operator:
SELECT 'foo' =~ pcre 'fo+';
And if this operator is unique (which it should be, unless you have something else installed that uses it), you can also write:
SELECT 'foo' =~ 'fo+';
(The ~
operator, by contrast, is not unique, of course, because it is used by the built-in regular expressions.)
To get case-insensitive matching, set the appropriate option in the pattern, for example:
SELECT 'FOO' ~ pcre '(?i)fo+';
To extract the substring that was matched by the pattern, use the
function pcre_match
. It returns either a value of type text, or
null if the pattern did not match. Examples:
SELECT pcre_match('fo+', 'foobar'); --> 'foo'
SELECT pcre_match('fo+', 'barbar'); --> NULL
There is no support for extracting multiple matches of a pattern in a string, because PCRE does not (easily) support that.
Captured substrings (parenthesized subexpressions) are extracted using
the function pcre_captured_substrings
. It returns either an array
of text, or null if the pattern did not match. Examples:
SELECT pcre_captured_substrings('(fo+)(b..)', 'foobar'); --> ARRAY['foo','bar']
SELECT pcre_captured_substrings('(fo+)(b..)', 'abcdef'); --> NULL
Note that elements of the array can be null if a substring was not used, for example:
SELECT pcre_captured_substrings('(a|(z))(bc)', 'abc'); --> ARRAY['a',NULL,'bc']
You can store regular expression values of type pcre
in tables, like
any other data. Note, however, that the binary representation of the
pcre
values contains the compiled regular expression, which is tied
to the version of the PCRE library. If you upgrade the PCRE library
and use a compiled value created by a different version, things might
not work or even crash (according to the PCRE documentation; I don't
know how likely that is). pgpcre will warn if you attempt to use a
value that was compiled by a different version of the library. If
that happens, it is advisable to recompile and rewrite all stored
pcre
values by doing something like
UPDATE ... SET pcre_col = pcre_col::text::pcre
(To be clear, storing regular expressions in tables is not a typical use. Normally, you store text in tables and match it against regular expressions provided by your application.)
Some possible advantages over the regular expression support built into PostgreSQL:
- richer pattern language, more familiar to Perl and Python programmers
- complete Unicode support
- saner operators and functions
Some disadvantages:
- no repeated matching (
'g'
flag) - no index optimization
You can workaround the lack of index optimization by manually augmenting queries like
column =~ '^foo'
with
AND column ~>=~ 'foo' AND column ~<~ 'fop'
and creating the appropriate text_pattern_ops
index as you would for the built-in pattern matching.