A compilation of Regex syntax and resources for the Google DSC Regex Event
Watch a recording of the Regex presentation here!
Regex, or regular expressions, are patterns used to match strings. Regex is commonly used for searching/filtering strings for information, input validation, and web scraping. "Real-world" examples include everything from validating email addresses to formatting class names in a grades app.
Regex is incredibly powerful, but due to its seemingly unintelligible nature, it's also often intimidating to learn and difficult to remember.
For that reason, I've compiled a selection of the most helpful and commonly used regex syntax and some regex resources for your use below!
- "Balderdash" Basics (of Regex)
- "Flapdoodle" Flags
- "Gibberish" Characters
- "Bafflegab" Special Characters
- "Rigmarole" Ranges
- "Jargon" Quantifiers
- "Gobbledygook" Groups
- "Malarkey" Anchors
- "Mumbo Jumbo" Regex Resources
- Contributing
This repo contains a PowerPoint presentation that can be viewed online here.
The lab file in this repo contains real-world practice problems and links to gamified resources for the DSC event. You are welcome to try your hand at some of them.
The Redoku folder of this repo contains the app "Redoku," a simple React Native application created for this event that allows you to hone your Regex skills through sudoku-like puzzles. This was heavily based on redoku, an awesome website with the same name. Thank you to @padolsey for granting permission to use the name "Redoku!" Download it below!
iOS | Android |
---|---|
- Regex has different flavors depending on the language you are using. Different engines support different features and some patterns have different meanings. While this resource attempts to cover as much as possible, there may be slight differences.
- UltimateRegexResource uses JavaScript as the default regex engine. If there are differences between languages, I attempt to note them. For a full review of what regex patterns are legal in each language, check out this awesome gist.
- Anywhere used below, `character` represents either a letter, digit, or symbol.
- Regular expressions start and end with "delimiters." For example, JavaScript regex literals are generally wrapped in "slash" characters `/`, and Python regex usually begins with `r"` and ends with `"`. (While Python doesn't have regex literals per se, regex is written more easily using raw strings to avoid worrying about string escapes.)
- Patterns return the first case-sensitive match they find by default.
Therefore: given the sample string `I scream, you scream, we all SCREAM for ice cream`, `/scream/` matches the first instance of "scream."
This behavior can be modified with flags.
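As a minimal sketch of the default behavior in JavaScript (the variable names are illustrative only), a pattern without flags stops at the first case-sensitive match:

```javascript
// Sample string taken from the text above.
const sample = "I scream, you scream, we all SCREAM for ice cream";

// A regex literal is wrapped in slashes. Without the g flag, match()
// returns only the first (case-sensitive) match.
const firstMatch = sample.match(/scream/);

console.log(firstMatch[0]);    // "scream"
console.log(firstMatch.index); // 2 (the "scream" in "I scream")

// The RegExp constructor builds the same pattern from a string.
const sameMatch = sample.match(new RegExp("scream"));
console.log(sameMatch.index);  // 2
```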
Syntax | Flag | Behavior | Example |
---|---|---|---|
`g` | global | Returns additional matches | `/foo/g` |
`i` | insensitive | Allows case-insensitive matches | `/foo/i` |
`x` | verbose | Ignore whitespace & allow comments (not supported in JS) | `/foo/x` |
`u` | unicode | Expressions are treated as Unicode (UTF-16) | `/foo/u` |
`s` | singleline | Treats entire string as one line (allows `.` to match newline) | `/foo/s` |
`m` | multiline | Start & end anchors now trigger on each line | `/foo/m` |
`n` | nth match | Matches text returned by nth group (not supported in JS) | `/foo/n` |
Regex includes several flags that are appended to the end of the expression to change behavior. Using the string `I scream, you scream, we all SCREAM for ice cream`, the updated regex `/scream/gi` will now return `scream` `scream` `SCREAM`.
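The effect of the `g` and `i` flags can be sketched in JavaScript as follows (variable names are illustrative; `String.prototype.match` returns an array of all matches when `g` is set):

```javascript
const sample = "I scream, you scream, we all SCREAM for ice cream";

// g alone: every case-sensitive match.
const globalOnly = sample.match(/scream/g);

// i alone: still only the first match, but case is ignored,
// so an upper-case pattern finds the lower-case "scream" first.
const insensitiveOnly = sample.match(/SCREAM/i)[0];

// g and i together: every match, regardless of case.
const globalInsensitive = sample.match(/scream/gi);

console.log(globalOnly);        // ["scream", "scream"]
console.log(insensitiveOnly);   // "scream"
console.log(globalInsensitive); // ["scream", "scream", "SCREAM"]
```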
Syntax | Character | Matches | Example String | Example Expression | Example Match |
---|---|---|---|---|---|
`.` | any | Literally any character (except line break) | `a-c1-3` | `a.c` | `a-c` |
`\w` | word | ASCII letter, digit, or underscore (or Unicode word character in Python & C#) | `a-c1-3` | `\w-\w` | `a-c` |
`\d` | digit | Digit 0-9 (or Unicode digit in Python & C#) | `a-c1-3` | `\d-\d` | `1-3` |
`\s` | whitespace | Space, tab, vertical tab, newline, carriage return (or Unicode separator in Python, C#, & JS) | `a b` | `a\sb` | `a b` |
`\W` | NOT word | Anything `\w` does not match | `a-c1-3` | `\W` | `-` |
`\D` | NOT digit | Anything `\d` does not match | `a-c1-3` | `\D-\D` | `a-c` |
`\S` | NOT whitespace | Anything `\s` does not match | `a-c1-3` | `\S-\S` | `a-c` |
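To try these character classes in JavaScript, a small sketch like the one below may help. The `firstMatch` helper is hypothetical and exists only for this example; it returns the first match, or `null` when nothing matches.

```javascript
// Hypothetical helper for this example: first match as a string, or null.
const firstMatch = (text, pattern) => {
  const result = text.match(pattern);
  return result ? result[0] : null;
};

const sample = "a-c1-3";

const results = {
  any: firstMatch(sample, /a.c/),             // "a-c"
  word: firstMatch(sample, /\w-\w/),          // "a-c"
  digit: firstMatch(sample, /\d-\d/),         // "1-3"
  whitespace: firstMatch("a b", /a\sb/),      // "a b"
  notWord: firstMatch(sample, /\W/),          // "-"
  notWordPair: firstMatch(sample, /\W-\W/),   // null: no non-word characters surround a hyphen
  notDigit: firstMatch(sample, /\D-\D/),      // "a-c"
  notWhitespace: firstMatch(sample, /\S-\S/), // "a-c"
};

console.log(results);
```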
Syntax | Special Character | Matches | Example String | Example Expression | Example Match |
---|---|---|---|---|---|
`\` | escape | Makes any of the following literal when placed before it: `[{()}].*+?$^/\` | `)$[]*{` | `\[\]` | `[]` |
Syntax | Substitute | Behavior |
---|---|---|
`\n` | newline | Insert a newline character |
`\t` | tab | Insert a tab character |
`\r` | carriage return | Insert a carriage return character |
`\f` | form-feed | Insert a form feed character |
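A short JavaScript sketch of escaping and of the whitespace escapes above. The price string and variable names are made up for illustration; note that in a JS replacement string, `"\n"` is an ordinary string escape rather than regex syntax.

```javascript
const symbols = ")$[]*{";

// Escaped brackets are matched literally instead of opening a character set.
const brackets = symbols.match(/\[\]/)[0];                // "[]"

// An escaped dollar sign is a literal "$", not the end anchor.
const price = "cost: $5.00".match(/\$\d\.\d\d/)[0];       // "$5.00"

// \n and \t match newline and tab characters inside a pattern.
const lines = "first\nsecond".split(/\n/);                // ["first", "second"]
const spaced = "a\tb".replace(/\t/, " ");                 // "a b"

// Inserting a newline through a replacement string.
const stacked = "a,b".replace(/,/, "\n");                 // "a\nb"

console.log(brackets, price, lines, spaced, JSON.stringify(stacked));
```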
Syntax | Range | Matches | Example String | Example Expression | Example Match |
---|---|---|---|---|---|
`[pog]` | word list | Either `p`, `o`, or `g` | `awesomePOSSUM123` | `[awesum]+` | `awes` |
`[^pog]` | NOT word list | Any character except `p`, `o`, or `g` | `awesomePOSSUM123` | `[^awesum]+` | `o` |
`[a-z]` | word range | Any character between `a` and `z`, inclusive | `awesomePOSSUM123` | `[a-z]+` | `awesome` |
`[^a-z]` | NOT word range | Any character not between `a` and `z`, inclusive | `awesomePOSSUM123` | `[^a-z]+` | `POSSUM123` |
`[0-9]` | digit range | Any character between `0` and `9`, inclusive | `awesomePOSSUM123` | `[0-9]+` | `123` |
`[^0-9]` | NOT digit range | Any character not between `0` and `9`, inclusive | `awesomePOSSUM123` | `[^0-9]+` | `awesomePOSSUM` |
`[a-zA-Z]` | word range | Any character between `a` and `z` or `A` and `Z`, inclusive | `awesomePOSSUM123` | `[a-zA-Z]+` | `awesomePOSSUM` |
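The range examples can be checked with a quick JavaScript sketch (the `firstMatch` helper is hypothetical and only used here). Each call returns the first run of characters that satisfies the set:

```javascript
// Hypothetical helper for this example: first match as a string, or null.
const firstMatch = (text, pattern) => {
  const result = text.match(pattern);
  return result ? result[0] : null;
};

const sample = "awesomePOSSUM123";

const ranges = {
  list: firstMatch(sample, /[awesum]+/),      // "awes" (stops at "o", which is not in the set)
  notList: firstMatch(sample, /[^awesum]+/),  // "o"
  lower: firstMatch(sample, /[a-z]+/),        // "awesome"
  notLower: firstMatch(sample, /[^a-z]+/),    // "POSSUM123"
  digits: firstMatch(sample, /[0-9]+/),       // "123"
  notDigits: firstMatch(sample, /[^0-9]+/),   // "awesomePOSSUM"
  letters: firstMatch(sample, /[a-zA-Z]+/),   // "awesomePOSSUM"
};

console.log(ranges);
```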
There are also a few (mostly) semantically identical patterns in Golang and PHP. These do not appear to be supported in JS or Python:
Syntax | Range | Matches | Example String | Example Expression | Example Match |
---|---|---|---|---|---|
`[[:alpha:]]` | alpha class | Any character between `a` and `z`, inclusive, not case sensitive | `Woodchuck could chuck 33 wood logs.` | `[[:alpha:]]+` | `Woodchuck` |
`[[:digit:]]` | digit class | Any digit 0-9 | `Woodchuck could chuck 33 wood logs.` | `[[:digit:]]+` | `33` |
`[[:alnum:]]` | alphanumeric class | Any character between `a` and `z`, inclusive, not case sensitive, and any digit 0-9 | `Woodchuck could chuck 33 wood logs.` | `[[:alnum:]]+` | `Woodchuck` |
`[[:punct:]]` | punctuation class | Any punctuation character, such as `?!.,:;` | `Woodchuck could chuck 33 wood logs.` | `[[:punct:]]+` | `.` |
In some flavors of regex, the above are also called "Character Classes."
Syntax | Quantifier | Matches | Example String | Example Expression | Example Match |
---|---|---|---|---|---|
`?` | optional | 0 or 1 of the preceding expression | `ccc` | `c?` | `c` |
`{X}` | X | X of the preceding expression | `ccc` | `c{2}` | `cc` |
`{X,}` | X+ | X or more of the preceding expression | `ccc` | `c{2,}` | `ccc` |
`{X,Y}` | range | Between X and Y of the preceding expression | `ccc` | `c{1,3}` | `ccc` |
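A brief JavaScript sketch of these quantifiers. The `colou?r` pattern is an extra illustration that is not from the table; it shows that a quantifier applies only to the token directly before it.

```javascript
const sample = "ccc";

const optional = sample.match(/c?/)[0];      // "c"
const exactlyTwo = sample.match(/c{2}/)[0];  // "cc"
const twoOrMore = sample.match(/c{2,}/)[0];  // "ccc"
const oneToThree = sample.match(/c{1,3}/)[0]; // "ccc"

// The ? applies only to the "u", so both spellings are accepted.
const american = /colou?r/.test("color");    // true
const british = /colou?r/.test("colour");    // true

console.log(optional, exactlyTwo, twoOrMore, oneToThree, american, british);
```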
Beyond standard quantifiers, there are a few additional modifiers: greedy, lazy, and possessive.
Syntax | Quantifier | Matches | Example String | Example Expression | Example Match |
---|---|---|---|---|---|
`*` | 0+ greedy | 0 or more of the preceding expression, using as many chars as possible | `abccc` | `bc*` | `bccc` |
`+` | 1+ greedy | 1 or more of the preceding expression, using as many chars as possible | `abccc` | `c+` | `ccc` |
`*?` | 0+ lazy | 0 or more of the preceding expression, using as few chars as possible | `abccc` | `bc*?` | `b` |
`+?` | 1+ lazy | 1 or more of the preceding expression, using as few chars as possible | `abccc` | `c+?` | `c` |
`*+` | 0+ possessive | 0 or more of the preceding expression, using as many chars as possible, without backtracking (not supported in JS, or in Python before 3.11) | `abccc` | `bc*+` | `bccc` |
`++` | 1+ possessive | 1 or more of the preceding expression, using as many chars as possible, without backtracking (not supported in JS, or in Python before 3.11) | `abccc` | `c++` | `ccc` |
Put simply, greedy quantifiers match as much as possible, lazy quantifiers match as little as possible, and possessive quantifiers match as much as possible without backtracking.
In practice, a possessive quantifier returns either the same match as its greedy counterpart or, if backtracking would be required, no match. Therefore, possessive quantifiers should be used when you know backtracking is not necessary, which can improve performance.
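The greedy/lazy difference can be sketched in JavaScript as below. Possessive quantifiers are left out because JS does not support them, and the quoted-text string is an invented example:

```javascript
const sample = "abccc";

const greedy = sample.match(/c+/)[0];  // "ccc" (takes as many as it can)
const lazy = sample.match(/c+?/)[0];   // "c"   (stops as soon as it can)

// The difference matters most when a later part of the pattern must still match.
const quoted = '"one" and "two"';

// Greedy .+ runs to the end, then backs up to the LAST quote.
const greedyQuote = quoted.match(/".+"/)[0];  // '"one" and "two"'

// Lazy .+? expands one character at a time and stops at the FIRST closing quote.
const lazyQuote = quoted.match(/".+?"/)[0];   // '"one"'

console.log(greedy, lazy, greedyQuote, lazyQuote);
```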
Groups allow you to pull out specific parts of a match. For example, given the string `Peter Piper picked a peck of pickled peppers` and the regex literal `/[peck]+ of (\w+)/`, an additional "capturing group," group 1, is returned containing `pickled`.
By default, the whole match is group 0, and each capturing group after that is numbered n, where n is 1 + the number of the previous capturing group.
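In JavaScript, the array returned by `match()` exposes these groups by index. A minimal sketch (variable names are illustrative):

```javascript
const sample = "Peter Piper picked a peck of pickled peppers";

const result = sample.match(/[peck]+ of (\w+)/);

const wholeMatch = result[0]; // group 0: the entire match
const groupOne = result[1];   // group 1: what (\w+) captured

console.log(wholeMatch);   // "peck of pickled"
console.log(groupOne);     // "pickled"
console.log(result.index); // 21 (where "peck" starts)
```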
Syntax | Group | Matches | Example String | Example Expression | Example Match |
---|---|---|---|---|---|
`\|` | alternate | Either the preceding or following expression | `truly rural` | `truly\|rural` | `truly` |
`(...)` | isolate | Everything enclosed; treats as separate capture group | `truly rural` | `truly (rural)` | `truly rural` (group 1: `rural`) |
`(?:...)` | include | Everything enclosed; enables using quantifiers on part of regex | `truly ruralrural` | `truly (?:rural)+` | `truly ruralrural` |
`(?\|...)` | combine | Everything enclosed; treats all matches as same group | `truly rural` | `(?\|(rural)\|(truly))` | `truly` |
`(?>...)` | atomic | Longest possible string without backtracking | `truly rural` | `(?>rur)` | `rur` |
`(?#...)` | comment | Everything enclosed; treats as comment and ignores | `truly #rural` | `truly (?#rural)` | `truly ` |
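The sketch below covers alternation, capturing groups, and non-capturing groups in JavaScript. As far as I know, the combine `(?|...)`, atomic `(?>...)`, and comment `(?#...)` forms are not accepted by the JS engine, so they are omitted here.

```javascript
// Alternation: the leftmost alternative that matches wins.
const alternate = "truly rural".match(/truly|rural/)[0];    // "truly"

// Capturing group: the enclosed text is returned as group 1.
const captured = "truly rural".match(/truly (rural)/);
const capturedWhole = captured[0];                          // "truly rural"
const capturedGroup = captured[1];                          // "rural"

// Non-capturing group: lets + repeat "rural" without creating a group.
const nonCapturing = "truly ruralrural".match(/truly (?:rural)+/);
const nonCapturingWhole = nonCapturing[0];                  // "truly ruralrural"
const nonCapturingGroups = nonCapturing.length - 1;         // 0 capturing groups

console.log(alternate, capturedWhole, capturedGroup, nonCapturingWhole, nonCapturingGroups);
```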
Syntax | Anchor | Matches | Example String | Example Expression | Example Match |
---|---|---|---|---|---|
`^` | start | Start of string | `she sells seashells` | `^\w+` | `she` |
`$` | end | End of string | `she sells seashells` | `\w+$` | `seashells` |
`\b` | word boundary | Between a character matched and not matched by `\w` | `she sells seashells` | `s\b` | `s` |
`\B` | NOT word boundary | Any position that is not a word boundary, such as between two characters matched by `\w` | `she sells seashells` | `\B\w+` | `he` |
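A JavaScript sketch of the anchors above, including how the `m` flag changes `^`. The two-line string is an invented variation on the sample:

```javascript
const sample = "she sells seashells";

const start = sample.match(/^\w+/)[0];          // "she"
const end = sample.match(/\w+$/)[0];            // "seashells"

// The first "s" followed by a word boundary is the last letter of "sells".
const boundaryIndex = sample.match(/s\b/).index; // 8

// \B fails at index 0 (start of a word), so the match begins at "h".
const notBoundary = sample.match(/\B\w+/)[0];   // "he"

// With m, ^ also matches at the start of each line.
const twoLines = "she sells\nseashells";
const withoutM = twoLines.match(/^\w+/g);       // ["she"]
const withM = twoLines.match(/^\w+/gm);         // ["she", "seashells"]

console.log(start, end, boundaryIndex, notBoundary, withoutM, withM);
```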
There are additional anchors available that are unaffected by multiline mode m.
Syntax | Anchor | Matches | Example String | Example Expression | Example Match |
---|---|---|---|---|---|
`\A` | multi-start | Start of string | `she sees cheese` | `\A\w+` | `she` |
`\Z` | multi-end | End of string, or just before a final trailing newline | `she sees cheese` | `\w+\Z` | `cheese` |
`\z` | absolute end | Absolute end of string, after any trailing newline | `she sees cheese` | `\w+\z` | `cheese` |
- Regex101, an incredible testing utility for all flavors of Regex
- RegexOne, a great way to learn Regex through brief lessons
- Regexr, another way to test your expressions
- Rubular, a Ruby-based regex tester w/ quick reference
- Regex.Info, a plain but detailed guide to regex
- RexEgg, the self-proclaimed "world's most tyrannisaurical regex tutorial"
- Codecademy Regex Tutorial, a 1-hour course w/ certification
- SitePoint Learn Regex, a great tutorial of the fundamental concepts
- Regex Basics, as stated, with no deep knowledge required
- @ziishaned's Learn Regex, a repo that contains more info on Regex to learn it "the easy way"
- RegexHub, a collection of commonly used Regex
- "Greedy" vs "Lazy", a SO post that acts as a deep dive into their differences
- Difference between [] and () in Regex, a SO post that hopefully helps
- MIT Regex Printable PDF Cheatsheet, for those who need the physical copy
- Stanford Regex Printable PDF Cheatsheet, for those who prefer a pinker physical copy
- When/How Not To Use Regex, an article by the founder of SO that discusses what you'd expect from the title
- Awesome Regex Resources, a comprehensive list of Regex books, articles, and utilities far larger than this
- Regex Cheat Sheet, a visual Regex cheat sheet with samples and examples
- Fork UltimateRegexResource here
- Create a branch with your improvements (`git checkout -b improvement/fooBar`)
- Commit your changes (`git commit -am 'Add some fooBar'`)
- Push to the branch (`git push origin improvement/fooBar`)
- Create a new Pull Request
Created by @GoldinGuy for the FAU Google DSC Regex Event.