Regex

A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. Regular expression techniques are developed in theoretical computer science and formal language theory.

The concept of regular expressions began in the 1950s, when the American mathematician Stephen Cole Kleene formalized the concept of a regular language. They came into common use with Unix text-processing utilities. Different syntaxes for writing regular expressions have existed since the 1980s, one being the POSIX standard and another, widely used, being the Perl syntax.

Regular expressions are used in search engines, in search and replace dialogs of word processors and text editors, in text processing utilities such as sed and AWK, and in lexical analysis. Most general-purpose programming languages support regex capabilities either natively or via libraries, including Python, C, C++, Java, Rust, OCaml, and JavaScript.

A quick reference for regular expressions (regex), including symbols, ranges, grouping, assertions and some sample patterns to get you started.

Introduction

This is a quick cheat sheet for getting started with regular expressions.

Regex in Python
Regex in JavaScript
Regex in PHP
Regex in Java
Regex in MySQL
Regex in Vim
Regex in Emacs
Online regex tester

Character Classes

Pattern	Description
`[abc]`	A single character of: a, b or c
`[^abc]`	A character except: a, b or c
`[a-z]`	A character in the range: a-z
`[^a-z]`	A character not in the range: a-z
`[0-9]`	A digit in the range: 0-9
`[a-zA-Z]`	A character in the range: a-z or A-Z
`[a-zA-Z0-9]`	A character in the range: a-z, A-Z or 0-9

Quantifiers

Pattern	Description
`a?`	Zero or one of a
`a*`	Zero or more of a
`a+`	One or more of a
`[0-9]+`	One or more of 0-9
`a{3}`	Exactly 3 of a
`a{3,}`	3 or more of a
`a{3,6}`	Between 3 and 6 of a
`a*`	Greedy quantifier
`a*?`	Lazy quantifier
`a*+`	Possessive quantifier
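A minimal Python sketch of the counting quantifiers, using the standard `re` module. Each pattern is run with `re.fullmatch`, so a sample counts only if the whole string matches. The sample strings and the `accepted` helper are illustrative, not part of `re`.

```python
import re

# Illustrative samples: 0, 1, 3 and 7 repetitions of "a".
samples = ["", "a", "aaa", "aaaaaaa"]
patterns = [r"a?", r"a*", r"a+", r"a{3}", r"a{3,}", r"a{3,6}"]


def accepted(pattern, strings):
    """Hypothetical helper: keep the strings the pattern matches from start to end."""
    return [s for s in strings if re.fullmatch(pattern, s)]


for p in patterns:
    print(p, accepted(p, samples))
# a? ['', 'a']
# a* ['', 'a', 'aaa', 'aaaaaaa']
# a+ ['a', 'aaa', 'aaaaaaa']
# a{3} ['aaa']
# a{3,} ['aaa', 'aaaaaaa']
# a{3,6} ['aaa']
```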

Common Metacharacters

^ { + < [ * ) > . ( | $ \ ?

Escape these special characters with \

Meta Sequences

Pattern	Description
`.`	Any single character
`\s`	Any whitespace character
`\S`	Any non-whitespace character
`\d`	Any digit, same as `[0-9]`
`\D`	Any non-digit, same as `[^0-9]`
`\w`	Any word character
`\W`	Any non-word character
`\X`	Any Unicode extended grapheme cluster, line breaks included
`\C`	Match one data unit
`\R`	Unicode newlines
`\v`	Vertical whitespace character
`\V`	Negation of \v - anything except newlines and vertical tabs
`\h`	Horizontal whitespace character
`\H`	Negation of \h
`\K`	Reset match
`\n`	Match nth subpattern (a backreference such as `\1`)
`\pX`	Unicode property X
`\p{...}`	Unicode property or script category
`\PX`	Negation of \pX
`\P{...}`	Negation of \p
`\Q...\E`	Quote; treat as literals
`\k<name>`	Match subpattern `name`
`\k'name'`	Match subpattern `name`
`\k{name}`	Match subpattern `name`
`\gn`	Match nth subpattern
`\g{n}`	Match nth subpattern
`\g<n>`	Recurse nth capture group
`\g'n'`	Recurse nth capture group
`\g{-n}`	Match nth relative previous subpattern
`\g<+n>`	Recurse nth relative upcoming subpattern
`\g'+n'`	Recurse nth relative upcoming subpattern
`\g'letter'`	Recurse named capture group `letter`
`\g{letter}`	Match previously-named capture group `letter`
`\g<letter>`	Recurse named capture group `letter`
`\xYY`	Hex character YY
`\x{YYYY}`	Hex character YYYY
`\ddd`	Octal character ddd
`\cY`	Control character Y
`[\b]`	Backspace character
`\`	Makes any character literal
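Support for the rows above varies by engine. A minimal Python sketch of the most portable ones (`\d`, `\w`, `\s` and `.`) follows; several other rows, such as `\K`, `\R`, `\h` and `\Q...\E`, are PCRE features that Python's standard `re` module does not accept. The sample text is made up for illustration.

```python
import re

text = "Room 42, floor 7\tnext_to lift"

digits = re.findall(r"\d+", text)     # runs of digits
words = re.findall(r"\w+", text)      # letters, digits and underscore
parts = re.split(r"\s+", text)        # split on any run of whitespace
dotted = re.findall(r"fl..r", text)   # each "." stands for any one character

print(digits)  # ['42', '7']
print(words)   # ['Room', '42', 'floor', '7', 'next_to', 'lift']
print(parts)   # ['Room', '42,', 'floor', '7', 'next_to', 'lift']
print(dotted)  # ['floor']
```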

Anchors

Pattern	Description
`\G`	Start of match (where the previous match ended)
`^`	Start of string
`$`	End of string
`\A`	Start of string
`\Z`	End of string, or before a final newline
`\z`	Absolute end of string
`\b`	A word boundary
`\B`	Non-word boundary
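Anchor details differ between engines. In Python's `re`, for example, `$` also matches just before a trailing newline, while `\Z` matches only at the very end of the string (the behavior this table lists under `\z`). A minimal sketch with made-up sample strings:

```python
import re

line = "abc\n"

dollar = re.search(r"abc$", line) is not None     # True: $ may match before a final "\n"
big_z = re.search(r"abc\Z", line) is not None     # False: Python's \Z is the absolute end
big_a = re.search(r"\Aabc", line) is not None     # True: \A is the start of the string

# With re.MULTILINE, ^ and $ also match at every line boundary.
lines = re.findall(r"^\w+$", "one\ntwo\nthree", flags=re.MULTILINE)

print(dollar, big_z, big_a)  # True False True
print(lines)                 # ['one', 'two', 'three']
```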

Substitution

Pattern	Description
`\0`	Complete match contents
`\1`	Contents in capture group 1
`$1`	Contents in capture group 1
`${foo}`	Contents in capture group `foo`
`\x20`	Hexadecimal replacement values
`\x{06fa}`	Hexadecimal replacement values
`\t`	Tab
`\r`	Carriage return
`\n`	Newline
`\f`	Form-feed
`\U`	Uppercase Transformation
`\L`	Lowercase Transformation
`\E`	Terminate any Transformation
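Replacement syntax is among the least portable parts of regex. Python's `re.sub`, for instance, accepts `\1` and `\g<name>`, writes the whole match as `\g<0>` (a bare `\0` is not a whole-match reference there), and has no `$1`, `\U` or `\L` forms; a replacement function can stand in for the case transformations. A minimal sketch with illustrative inputs:

```python
import re

# Swap two words using numbered groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# Named groups are referenced as \g<name> in the replacement.
reordered = re.sub(r"(?P<first>\w+) (?P<last>\w+)", r"\g<last>, \g<first>", "Ada Lovelace")

# \g<0> inserts the complete match.
bracketed = re.sub(r"\d+", r"[\g<0>]", "a1b22")

# A callable replacement uppercases the first letter of each word.
titled = re.sub(r"\b(\w)", lambda m: m.group(1).upper(), "make me title case")

print(swapped)    # world hello
print(reordered)  # Lovelace, Ada
print(bracketed)  # a[1]b[22]
print(titled)     # Make Me Title Case
```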

Group Constructs

Pattern	Description
`(...)`	Capture everything enclosed
`(a|b)`	Match either a or b
`(?:...)`	Match everything enclosed, without capturing
`(?>...)`	Atomic group (non-capturing)
`(?|...)`	Duplicate subpattern group number
`(?#...)`	Comment
`(?'name'...)`	Named Capturing Group
`(?<name>...)`	Named Capturing Group
`(?P<name>...)`	Named Capturing Group
`(?imsxXU)`	Inline modifiers
`(?(DEFINE)...)`	Pre-define patterns before using them
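A minimal Python sketch of capturing, non-capturing and named groups. Python spells a named group `(?P<name>...)`; support for the other two named-group spellings varies by engine, so this sketch sticks to the `P` form. The date string is illustrative.

```python
import re

m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})(?:-\d{2})?", "2024-05-17")

whole = m.group(0)       # the complete match
year = m["year"]         # access a named group by name
groups = m.groups()      # the (?:...) day part is matched but not captured
named = m.groupdict()

# (?#...) is an inline comment and matches nothing.
commented = re.fullmatch(r"ab(?# this is ignored)c", "abc") is not None

print(whole)      # 2024-05-17
print(year)       # 2024
print(groups)     # ('2024', '05')
print(named)      # {'year': '2024', 'month': '05'}
print(commented)  # True
```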

Assertions

Pattern	Description
`(?(1)yes|no)`	Conditional statement
`(?(R)yes|no)`	Conditional statement
`(?(R#)yes|no)`	Recursive Conditional statement
`(?(R&name)yes|no)`	Conditional statement
`(?(?=...)yes|no)`	Lookahead conditional
`(?(?<=...)yes|no)`	Lookbehind conditional
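Of the conditionals above, Python's `re` supports the group-number (or group-name) form `(?(1)yes|no)`. The sketch below adapts the example from the Python `re` documentation: an address may be wrapped in angle brackets, but only as a complete pair. The pattern is deliberately loose and is not a real e-mail validator.

```python
import re

# (?(1)>|$): if group 1 (the "<") matched, require ">", otherwise require the end.
pattern = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

candidates = ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]
results = {c: pattern.fullmatch(c) is not None for c in candidates}

print(results)
# {'<user@host.com>': True, 'user@host.com': True, '<user@host.com': False, 'user@host.com>': False}
```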

Lookarounds

Pattern	Description
`(?=...)`	Positive Lookahead
`(?!...)`	Negative Lookahead
`(?<=...)`	Positive Lookbehind
`(?<!...)`	Negative Lookbehind

Lookaround lets you match a group before (lookbehind) or after (lookahead) your main pattern without including it in the result.
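A minimal Python sketch of all four lookarounds on one made-up string. The lookaround checks the text beside the digits, but that text is not part of what `findall` returns. Python, like many engines, requires lookbehind patterns to have a fixed width.

```python
import re

text = "cost $30, tax $4, 12 items"

after_dollar = re.findall(r"(?<=\$)\d+", text)             # positive lookbehind
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)       # negative lookbehind
before_items = re.findall(r"\d+(?= items)", text)          # positive lookahead
not_before_items = re.findall(r"\b\d+\b(?! items)", text)  # negative lookahead

print(after_dollar)      # ['30', '4']
print(not_after_dollar)  # ['12']
print(before_items)      # ['12']
print(not_before_items)  # ['30', '4']
```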

Flags/Modifiers

Pattern	Description
`g`	Global
`m`	Multiline
`i`	Case insensitive
`x`	Ignore whitespace
`s`	Single line (`.` also matches newline)
`u`	Unicode
`X`	eXtended
`U`	Ungreedy
`A`	Anchor
`J`	Duplicate group names

Recurse

Pattern	Description
`(?R)`	Recurse entire pattern
`(?1)`	Recurse first subpattern
`(?+1)`	Recurse first relative subpattern
`(?&name)`	Recurse subpattern `name`
`(?P=name)`	Match subpattern `name`
`(?P>name)`	Recurse subpattern `name`

POSIX Character Classes

Character Class	Same as	Meaning
`[[:alnum:]]`	`[0-9A-Za-z]`	Letters and digits
`[[:alpha:]]`	`[A-Za-z]`	Letters
`[[:ascii:]]`	`[\x00-\x7F]`	ASCII codes 0-127
`[[:blank:]]`	`[\t ]`	Space or tab only
`[[:cntrl:]]`	`[\x00-\x1F\x7F]`	Control characters
`[[:digit:]]`	`[0-9]`	Decimal digits
`[[:graph:]]`	`[[:alnum:][:punct:]]`	Visible characters (not space)
`[[:lower:]]`	`[a-z]`	Lowercase letters
`[[:print:]]`	`[ -~]`, i.e. `[ [:graph:]]`	Visible characters and space
`[[:punct:]]`	[!"#$%&'()*+,-./:;<=>?@[\\\]^_`{|}~]	Visible punctuation characters
`[[:space:]]`	`[\t\n\v\f\r ]`	Whitespace
`[[:upper:]]`	`[A-Z]`	Uppercase letters
`[[:word:]]`	`[0-9A-Za-z_]`	Word characters
`[[:xdigit:]]`	`[0-9A-Fa-f]`	Hexadecimal digits
`[[:<:]]`	`\b(?=\w)`	Start of word
`[[:>:]]`	`\b(?<=\w)`	End of word

Control verb

Pattern	Description
`(*ACCEPT)`	Control verb
`(*FAIL)`	Control verb
`(*MARK:NAME)`	Control verb
`(*COMMIT)`	Control verb
`(*PRUNE)`	Control verb
`(*SKIP)`	Control verb
`(*THEN)`	Control verb
`(*UTF)`	Pattern modifier
`(*UTF8)`	Pattern modifier
`(*UTF16)`	Pattern modifier
`(*UTF32)`	Pattern modifier
`(*UCP)`	Pattern modifier
`(*CR)`	Line break modifier
`(*LF)`	Line break modifier
`(*CRLF)`	Line break modifier
`(*ANYCRLF)`	Line break modifier
`(*ANY)`	Line break modifier
`\R`	Any line break sequence (scope set by `(*BSR_...)`)
`(*BSR_ANYCRLF)`	Line break modifier
`(*BSR_UNICODE)`	Line break modifier
`(*LIMIT_MATCH=x)`	Regex engine modifier
`(*LIMIT_RECURSION=d)`	Regex engine modifier
`(*NO_AUTO_POSSESS)`	Regex engine modifier
`(*NO_START_OPT)`	Regex engine modifier

Regex examples

Characters

Pattern	Matches
`ring`	Match ring, springboard, etc.
`.`	Match a, 9, +, etc.
`h.o`	Match hoo, h2o, h/o, etc.
`ring\?`	Match ring?
`\(quiet\)`	Match (quiet)
`c:\\windows`	Match c:\windows

Use \ to search for these special characters:
[ \ ^ $ . | ? * + ( ) { }
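The rows above, checked with Python's `re`. Raw strings (`r"..."`) keep Python from consuming the backslashes before the regex engine sees them. The sample strings are illustrative.

```python
import re

literal = re.findall(r"ring", "ring springboard")
any_char = re.findall(r"h.o", "hoo h2o h/o ho")
question = re.search(r"ring\?", "is it a ring?").group()
parens = re.search(r"\(quiet\)", "be (quiet) now").group()
path = re.search(r"c:\\windows", r"c:\windows\system32").group()

print(literal)   # ['ring', 'ring']
print(any_char)  # ['hoo', 'h2o', 'h/o']
print(question)  # ring?
print(parens)    # (quiet)
print(path)      # c:\windows
```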

Alternatives

Pattern	Matches
`cat|dog`	Match cat or dog
`id|identity`	Match id or identity
`identity|id`	Match id or identity

Order longer to shorter when alternatives overlap
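Why the order matters, sketched in Python: at a given position the engine takes the first alternative that succeeds, so a shorter alternative listed first can cut a longer match short.

```python
import re

text = "identity"

short_first = re.findall(r"id|identity", text)  # "id" wins; "entity" is left over
long_first = re.findall(r"identity|id", text)

print(short_first)  # ['id']
print(long_first)   # ['identity']
```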

Character classes

Pattern	Matches
`[aeiou]`	Match any vowel
`[^aeiou]`	Match a NON vowel
`r[iau]ng`	Match ring, wrangle, sprung, etc.
`gr[ae]y`	Match gray or grey
`[a-zA-Z0-9]`	Match any letter or digit

In [ ] always escape \ and ], and sometimes ^ (when it comes first) and - (when it could form a range). A . is already literal inside [ ].
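A Python check of the escaping rules inside brackets, on made-up input: `\` and `]` are escaped, `.` needs no escape, a trailing `-` cannot form a range, and `^` only negates when it comes first.

```python
import re

# \] is a literal "]", \\ a literal backslash, "." is literal, the final "-" is literal.
specials = re.findall(r"[\]\\.-]", r"a]b\c.d-e")

# "^" not in first position is just a literal caret.
caret = re.findall(r"[a^]", "a^b")

print(specials)  # [']', '\\', '.', '-']
print(caret)     # ['a', '^']
```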

Shorthand classes

Pattern	Meaning
`\w`	"Word" character (letter, digit, or underscore)
`\d`	Digit
`\s`	Whitespace (space, tab, vtab, newline)
`\W, \D, or \S`	Not word, digit, or whitespace
`[\D\S]`	Not digit OR not whitespace, so it matches any character
`[^\d\s]`	Disallow digit and whitespace
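The difference between the last two rows is easy to miss. In this Python sketch, `[\D\S]` accepts every character, because each one is either a non-digit or a non-space, while `[^\d\s]` rejects both digits and whitespace.

```python
import re

text = "a1 "

either = re.findall(r"[\D\S]", text)    # non-digit OR non-space: every character
neither = re.findall(r"[^\d\s]", text)  # neither a digit nor whitespace

print(either)   # ['a', '1', ' ']
print(neither)  # ['a']
```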

Occurrences

Pattern	Matches
`colou?r`	Match color or colour
`[BW]ill[ieamy's]*`	Match Bill, Willy, William's etc.
`[a-zA-Z]+`	Match 1 or more letters
`\d{3}-\d{2}-\d{4}`	Match an SSN
`[a-z]\w{1,7}`	Match a UW NetID
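A few of these rows run with Python's `re`; the names and the SSN-shaped string are made-up samples.

```python
import re

spellings = re.findall(r"colou?r", "color colour colouur")
names = re.findall(r"[BW]ill[ieamy's]*", "Bill Willy William's Jill")
is_ssn = re.fullmatch(r"\d{3}-\d{2}-\d{4}", "123-45-6789") is not None

print(spellings)  # ['color', 'colour']
print(names)      # ['Bill', 'Willy', "William's"]
print(is_ssn)     # True
```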

Greedy versus lazy

Pattern	Meaning
`* + {n,}` greedy	Match as much as possible
`<.+>`	Finds 1 big match in <b>bold</b>
`*? +? {n,}?` lazy	Match as little as possible
`<.+?>`	Finds 2 matches in <b>bold</b>
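The `<b>bold</b>` rows, sketched in Python: the greedy `.+` runs to the last `>`, while the lazy `.+?` stops at the first one.

```python
import re

html = "<b>bold</b>"

greedy = re.findall(r"<.+>", html)   # one big match
lazy = re.findall(r"<.+?>", html)    # two small matches

print(greedy)  # ['<b>bold</b>']
print(lazy)    # ['<b>', '</b>']
```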

Scope

Pattern	Meaning
`\b`	"Word" edge (next to non "word" character)
`\bring`	Word starts with "ring", ex ringtone
`ring\b`	Word ends with "ring", ex spring
`\b9\b`	Match single digit 9, not 19, 91, 99, etc.
`\b[a-zA-Z]{6}\b`	Match 6-letter words
`\B`	Not word edge
`\Bring\B`	Match springs and wringer
`^\d*$`	Entire string must be digits
`^[a-zA-Z]{4,20}$`	String must have 4-20 letters
`^[A-Z]`	String must begin with capital letter
`[\.!?"')]$`	String must end with terminal punctuation
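Word boundaries and anchors from this table, run with Python's `re` on made-up text. `\w*` is added around `ring` so the whole word is returned rather than only the literal.

```python
import re

lone_nines = re.findall(r"\b9\b", "9 19 91 99 9")
starts = re.findall(r"\bring\w*", "ringtone spring")
ends = re.findall(r"\w*ring\b", "ringtone spring")
inside = re.findall(r"\w*\Bring\B\w*", "springs wringer ring")
letters_only = re.search(r"^[a-zA-Z]{4,20}$", "Regex") is not None

print(lone_nines)    # ['9', '9']
print(starts)        # ['ringtone']
print(ends)          # ['spring']
print(inside)        # ['springs', 'wringer']
print(letters_only)  # True
```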

Modifiers

Pattern	Meaning
`(?i)[a-z]*(?-i)`	Ignore case ON / OFF
`(?s).*(?-s)`	Match multiple lines (causes . to match newline)
`(?m)^.*;$(?-m)`	^ & $ match lines, not the whole string
`(?x)`	#free-spacing mode, this EOL comment ignored
`(?-x)`	free-spacing mode OFF
`/regex/ismx`	Modify mode for entire string
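Inline modifiers are engine-specific. Python's `re`, for instance, only allows global inline flags like `(?i)` at the start of a pattern; to switch a flag on for part of a pattern it uses the scoped form `(?i:...)` instead of an ON/OFF pair. A minimal sketch with illustrative strings:

```python
import re

# Global inline flag at the start of the pattern.
everywhere = re.findall(r"(?i)apple", "Apple APPLE apple")

# Scoped flag: only the "b" is case-insensitive.
scoped = re.findall(r"a(?i:b)c", "abc aBc ABC")

# (?m): ^ and $ match at every line, not just the ends of the string.
statements = re.findall(r"(?m)^.*;$", "x = 1;\ny = 2\nz = 3;")

# (?x): whitespace is ignored and "#" starts a comment.
phone = re.compile(r"""(?x)
    \d{3}   # exchange
    -
    \d{4}   # line number
""")
is_phone = phone.fullmatch("555-1234") is not None

print(everywhere)  # ['Apple', 'APPLE', 'apple']
print(scoped)      # ['abc', 'aBc']
print(statements)  # ['x = 1;', 'z = 3;']
print(is_phone)    # True
```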

Groups

Pattern	Meaning
`(in|out)put`	Match input or output
`\d{5}(-\d{4})?`	US zip code ("+ 4" optional)
Parser tries EACH alternative if match fails after group.

Can lead to catastrophic backtracking.
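The two group rows, sketched in Python with made-up inputs. When an optional group does not take part in the match, Python reports it as `None`; and because `re.findall` returns only the groups when a pattern has any, `finditer` is used here to get whole matches.

```python
import re

zip_code = re.compile(r"\d{5}(-\d{4})?")

for candidate in ["12345", "12345-6789", "1234"]:
    m = zip_code.fullmatch(candidate)
    # group(1) is None when the optional "+4" part was not used
    print(candidate, bool(m), m.group(1) if m else None)
# 12345 True None
# 12345-6789 True -6789
# 1234 False None

words = [m.group(0) for m in re.finditer(r"(in|out)put", "input output throughput")]
print(words)  # ['input', 'output']
```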

Back references

Pattern	Matches
`(to) (be) or not \1 \2`	Match to be or not to be
`([^\s])\1{2}`	Match a non-space character, then the same character twice more: aaa, ...
`\b(\w+)\s+\1\b`	Match doubled words
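The back-reference rows run with Python's `re` on sample text. Note that `re.findall` returns the captured group, not the whole match, when the pattern has one group.

```python
import re

phrase = re.search(r"(to) (be) or not \1 \2", "to be or not to be").group()
tripled = re.findall(r"([^\s])\1{2}", "aaa bbcc ddd")        # returns group 1
doubled = re.findall(r"\b(\w+)\s+\1\b", "this is is a test test")

print(phrase)   # to be or not to be
print(tripled)  # ['a', 'd']
print(doubled)  # ['is', 'test']
```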

Non-capturing group

Pattern	Meaning
`on(?:click|load)`	Faster than: `on(click|load)`

Use non-capturing or atomic groups when possible
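Both patterns in the row match the same text; the difference is what gets stored. A minimal Python sketch (any speed difference depends on the engine and is not measured here):

```python
import re

capturing = re.match(r"on(click|load)", "onload")
non_capturing = re.match(r"on(?:click|load)", "onload")

print(capturing.groups())      # ('load',)
print(non_capturing.groups())  # ()
print(non_capturing.group(0))  # onload
```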

Atomic groups

Pattern	Meaning
`(?>red|green|blue)`	Faster than non-capturing
`(?>id|identity)\b`	Match id, but not identity

"id" matches, but \b fails after atomic group, parser doesn't backtrack into group to retry 'identity'

If alternatives overlap, order longer to shorter.

Lookaround

Pattern	Meaning
`(?= )`	Lookahead, if you can find ahead
`(?! )`	Lookahead, if you can NOT find ahead
`(?<= )`	Lookbehind, if you can find behind
`(?<! )`	Lookbehind, if you can NOT find behind
`\b\w+?(?=ing\b)`	Match warbl in warbling, str in string, fish in fishing, ...
`\b(?!\w+ing\b)\w+\b`	Words NOT ending in "ing"
`(?<=\bpre).*?\b`	Match tend in pretend, sent in present, fix in prefix, ...
`\b\w{3}(?<!pre)\w*?\b`	Words NOT starting with "pre"
`\b\w+(?<!ing)\b`	Match words NOT ending in "ing"
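The `ing` and `pre` rows run with Python's `re` on sample words. The lookaround parts are checked but not returned, which is why the results stop short of `ing` or start after `pre`.

```python
import re

words = "warbling string fish cat"

before_ing = re.findall(r"\b\w+?(?=ing\b)", words)
no_ing_ahead = re.findall(r"\b(?!\w+ing\b)\w+\b", words)
no_ing_behind = re.findall(r"\b\w+(?<!ing)\b", words)
after_pre = re.findall(r"(?<=\bpre).*?\b", "pretend present prefix")

print(before_ing)     # ['warbl', 'str']
print(no_ing_ahead)   # ['fish', 'cat']
print(no_ing_behind)  # ['fish', 'cat']
print(after_pre)      # ['tend', 'sent', 'fix']
```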

If-then-else

Match "Mr." or "Ms." if word "her" is later in string

M(?(?=.*?\bher\b)s|r)\.

Requires lookaround for the IF condition

Regex in Python

Import the regular expressions module

import re

Examples

re.search()

>>> sentence = 'This is a sample string'
>>> bool(re.search(r'this', sentence, flags=re.I))
True
>>> bool(re.search(r'xyz', sentence))
False

re.findall()

>>> re.findall(r'\bs?pare?\b', 'par spar apparent spare part pare')
['par', 'spar', 'spare', 'pare']
>>> re.findall(r'\b0*[1-9]\d{2,}\b', '0501 035 154 12 26 98234')
['0501', '154', '98234']

re.finditer()

>>> m_iter = re.finditer(r'[0-9]+', '45 349 651 593 4 204')
>>> [m[0] for m in m_iter if int(m[0]) < 350]
['45', '349', '4', '204']

re.split()

>>> re.split(r'\d+', 'Sample123string42with777numbers')
['Sample', 'string', 'with', 'numbers']

re.sub()

>>> ip_lines = "catapults\nconcatenate\ncat"
>>> print(re.sub(r'^', r'* ', ip_lines, flags=re.M))
* catapults
* concatenate
* cat

re.compile()

>>> pet = re.compile(r'dog')
>>> type(pet)
<class 're.Pattern'>
>>> bool(pet.search('They bought a dog'))
True
>>> bool(pet.search('A cat crossed their path'))
False

Functions

Function	Description
`re.findall`	Returns a list containing all matches
`re.finditer`	Returns an iterator of match objects (one for each match)
`re.search`	Returns a Match object if there is a match anywhere in the string
`re.split`	Returns a list where the string has been split at each match
`re.sub`	Replaces one or many matches with a string
`re.compile`	Compile a regular expression pattern for later use
`re.escape`	Returns the string with regex special characters backslash-escaped

Flags

Short	Long	Description
`re.I`	`re.IGNORECASE`	Ignore case
`re.M`	`re.MULTILINE`	Multiline
`re.L`	`re.LOCALE`	Make `\w`,`\b`,`\s` locale dependent
`re.S`	`re.DOTALL`	Dot matches all (including newline)
`re.U`	`re.UNICODE`	Make `\w`,`\b`,`\d`,`\s` unicode dependent
`re.X`	`re.VERBOSE`	Verbose: ignore whitespace and allow # comments
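Flags can be combined with `|`, and `re.escape` (from the functions table) turns arbitrary text into a literal pattern. A short Python sketch with made-up strings:

```python
import re

text = "Cat\ncat\ndog"

# Case-insensitive and multiline at once: ^ and $ apply to every line.
cats = re.findall(r"^cat$", text, flags=re.I | re.M)

# Without escaping, "+" acts as a quantifier and the search fails.
needle = "1+1=2"
escaped = re.escape(needle)
found_escaped = re.search(escaped, "is 1+1=2?") is not None
found_raw = re.search(needle, "is 1+1=2?") is not None

print(cats)           # ['Cat', 'cat']
print(escaped)        # 1\+1=2
print(found_escaped)  # True
print(found_raw)      # False
```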

Regex in JavaScript

test()

let textA = 'I like APPles very much';
let textB = 'I like APPles';
let regex = /apples$/i;
 
// Output: false
console.log(regex.test(textA));
 
// Output: true
console.log(regex.test(textB));

search()

let text = 'I like APPles very much';
let regexA = /apples/;
let regexB = /apples/i;
 
// Output: -1
console.log(text.search(regexA));
 
// Output: 7
console.log(text.search(regexB));

exec()

let text = 'Do you like apples?';
let regex = /apples/;
 
// Output: apples
console.log(regex.exec(text)[0]);
 
// Output: Do you like apples?
console.log(regex.exec(text).input);

match()

let text = 'Here are apples and apPleS';
let regex = /apples/gi;
 
// Output: [ "apples", "apPleS" ]
console.log(text.match(regex));

split()

let text = 'This 593 string will be brok294en at places where d1gits are.';
let regex = /\d+/g;

// Output: [ "This ", " string will be brok", "en at places where d", "gits are." ]
console.log(text.split(regex));

matchAll()

let regex = /t(e)(st(\d?))/g;
let text = 'test1test2';
let array = [...text.matchAll(regex)];

// Output: ["test1", "e", "st1", "1"]
console.log(array[0]);

// Output: ["test2", "e", "st2", "2"]
console.log(array[1]);

replace()

let text = 'Do you like aPPles?';
let regex = /apples/i;
 
// Output: Do you like mangoes?
let result = text.replace(regex, 'mangoes');
console.log(result);

replaceAll()

let regex = /apples/gi;
let text = 'Here are apples and apPleS';

// Output: Here are mangoes and mangoes
let result = text.replaceAll(regex, "mangoes");
console.log(result);

Regex in PHP

Functions

Function	Description
`preg_match()`	Performs a regex match
`preg_match_all()`	Perform a global regular expression match
`preg_replace_callback()`	Perform a regular expression search and replace using a callback
`preg_replace()`	Perform a regular expression search and replace
`preg_split()`	Splits a string by regex pattern
`preg_grep()`	Returns array entries that match a pattern

preg_replace

$str = "Visit Microsoft!";
$regex = "/microsoft/i";

// Output: Visit QuickRef!
echo preg_replace($regex, "QuickRef", $str);

preg_match

$str = "Visit example.com";
$regex = "#example#i";

// Output: 1
echo preg_match($regex, $str);

preg_match_all

$regex = "/[a-zA-Z]+ (\d+)/";
$input_str = "June 24, August 13, and December 30";
if (preg_match_all($regex, $input_str, $matches_out)) {

    // Output: 2
    echo count($matches_out);

    // Output: 3
    echo count($matches_out[0]);

    // Output: Array("June 24", "August 13", "December 30")
    print_r($matches_out[0]);

    // Output: Array("24", "13", "30")
    print_r($matches_out[1]);
}

preg_grep

$arr = ["Jane", "jane", "Joan", "JANE"];
$regex = "/Jane/";

// Output: Array([0] => Jane)
print_r(preg_grep($regex, $arr));

preg_split

$str = "Jane\tKate\nLucy Marion";
$regex = "@\s@";

// Output: Array("Jane", "Kate", "Lucy", "Marion")
print_r(preg_split($regex, $str));

Regex in Java

Styles

First way

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile(".s", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("aS");  
boolean s1 = m.matches();  
System.out.println(s1);   // Outputs: true

Second way

boolean s2 = Pattern.compile("[0-9]+").matcher("123").matches();  
System.out.println(s2);   // Outputs: true

Third way

boolean s3 = Pattern.matches(".s", "XXXX");  
System.out.println(s3);   // Outputs: false

Pattern Fields

Field	Description
`CANON_EQ`	Canonical equivalence
`CASE_INSENSITIVE`	Case-insensitive matching
`COMMENTS`	Permits whitespace and comments
`DOTALL`	Dotall mode
`MULTILINE`	Multiline mode
`UNICODE_CASE`	Unicode-aware case folding
`UNIX_LINES`	Unix lines mode

Methods

Pattern

static Pattern compile(String regex [, int flags])
static boolean matches(String regex, CharSequence input)
String[] split(CharSequence input [, int limit])
static String quote(String s)

Matcher

int start([int group | String name])
int end([int group | String name])
boolean find([int start])
String group([int group | String name])
Matcher reset()

String

boolean matches(String regex)
String replaceAll(String regex, String replacement)
String[] split(String regex[, int limit])

There are more methods ...

Examples

Replace sentence:

String regex = "[A-Z\n]{5}$";
String str = "I like APP\nLE";

Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
Matcher m = p.matcher(str);

// Outputs: I like Apple!
System.out.println(m.replaceAll("pple!"));

Array of all matches:

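import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "She sells seashells by the Seashore";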
String regex = "\\w*se\\w*";

Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);

List<String> matches = new ArrayList<>();
while (m.find()) {
    matches.add(m.group());
}

// Outputs: [sells, seashells, Seashore]
System.out.println(matches);

Regex in MySQL

Functions

Name	Description
`REGEXP`	Whether string matches regex
`REGEXP_INSTR()`	Starting index of substring matching regex (NOTE: Only MySQL 8.0+)
`REGEXP_LIKE()`	Whether string matches regex (NOTE: Only MySQL 8.0+)
`REGEXP_REPLACE()`	Replace substrings matching regex (NOTE: Only MySQL 8.0+)
`REGEXP_SUBSTR()`	Return substring matching regex (NOTE: Only MySQL 8.0+)

REGEXP

expr REGEXP pat

Examples

mysql> SELECT 'abc' REGEXP '^[a-d]';
1
mysql> SELECT name FROM cities WHERE name REGEXP '^A';
mysql> SELECT name FROM cities WHERE name NOT REGEXP '^A';
mysql> SELECT name FROM cities WHERE name REGEXP 'A|B|R';
mysql> SELECT 'a' REGEXP 'A', 'a' REGEXP BINARY 'A';
1   0

REGEXP_REPLACE

REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])

Examples

mysql> SELECT REGEXP_REPLACE('a b c', 'b', 'X');
a X c
mysql> SELECT REGEXP_REPLACE('abc ghi', '[a-z]+', 'X', 1, 2);
abc X

REGEXP_SUBSTR

REGEXP_SUBSTR(expr, pat[, pos[, occurrence[, match_type]]])

Examples

mysql> SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+');
abc
mysql> SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3);
ghi

REGEXP_LIKE

REGEXP_LIKE(expr, pat[, match_type])

Examples

mysql> SELECT regexp_like('aba', 'b+');
1
mysql> SELECT regexp_like('aba', 'b{2}');
0
mysql> # i: case-insensitive
mysql> SELECT regexp_like('Abba', 'ABBA', 'i');
1
mysql> # m: multi-line
mysql> SELECT regexp_like('a\nb\nc', '^b$', 'm');
1

REGEXP_INSTR

REGEXP_INSTR(expr, pat[, pos[, occurrence[, return_option[, match_type]]]])

Examples

mysql> SELECT regexp_instr('aa aaa aaaa', 'a{3}');
4
mysql> SELECT regexp_instr('abba', 'b{2}', 2);
2
mysql> SELECT regexp_instr('abbabba', 'b{2}', 1, 2);
5
mysql> SELECT regexp_instr('abbabba', 'b{2}', 1, 2, 1);
7
