Support free-spacing mode in regular expressions (x mode) #8940
Labels: area-core-library, core-l, P2, type-enhancement
This issue was originally filed by greg...@gmail.com
Support free-spacing mode in regular expressions (x mode), as supported in Java, Perl, Ruby, etc. This allows insignificant whitespace and comments to be added to regexps.
However, it is not supported in JavaScript, so patterns would need to be pre-parsed to strip out the whitespace and comments. For const strings this could even be done at compile time in dart2js.
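A minimal sketch of what such a pre-parse step might look like. The helper name `stripFreeSpacing` is hypothetical and not part of any Dart library, and the rules it applies are assumptions borrowed from Perl-style x mode: unescaped whitespace is dropped, `#` starts a comment that runs to the end of the line, and anything inside a `[...]` character class or after a backslash is kept verbatim.

```dart
/// Hypothetical helper (not an SDK API): removes insignificant whitespace
/// and `#` comments from a free-spacing pattern so that the result can be
/// passed to the ordinary RegExp constructor.
String stripFreeSpacing(String pattern) {
  final out = StringBuffer();
  var inClass = false; // true while inside a [...] character class
  var i = 0;
  while (i < pattern.length) {
    final c = pattern[i];
    if (c == '\\') {
      // Keep an escape together with the character it escapes,
      // so "\ " and "\#" survive as a literal space and a literal '#'.
      out.write(c);
      if (i + 1 < pattern.length) out.write(pattern[i + 1]);
      i += 2;
      continue;
    }
    if (inClass) {
      // Assumption: whitespace and '#' are literal inside a class.
      if (c == ']') inClass = false;
      out.write(c);
    } else if (c == '[') {
      inClass = true;
      out.write(c);
    } else if (c == '#') {
      // Comment: skip everything up to (not including) the newline.
      while (i < pattern.length && pattern[i] != '\n') {
        i++;
      }
      continue;
    } else if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
      // Insignificant whitespace: drop it.
    } else {
      out.write(c);
    }
    i++;
  }
  return out.toString();
}

void main() {
  final source = stripFreeSpacing(r'''
    (\d{4})  # year
    -
    (\d{2})  # month
    [ #]     # a space or a hash, kept because it is inside a class
  ''');
  final re = RegExp(source);
  print(source);
  print(re.hasMatch('2013-03 '));
}
```

Doing this in a helper keeps the existing RegExp engine untouched. A built-in mode could apply the same transformation once, when the pattern is compiled.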
Using this mode means that you get to write this:
var re = new RegExp(r'''
(                           # Capture 1: entire matched URL
  (?:
    [a-z][\w-]+:                # URL protocol and colon
    (?:
      /{1,3}                        # 1-3 slashes
      |                             #   or
      [a-z0-9%]                     # Single letter or digit or '%'
                                    # (Trying not to match e.g. "URI::Escape")
    )
    |                           #   or
    www\d{0,3}[.]               # "www.", "www1.", "www2." … "www999."
    |                           #   or
    [a-z0-9.-]+[.][a-z]{2,4}/   # looks like domain name followed by a slash
  )
  (?:                           # One or more:
    [^\s()<>]+                      # Run of non-space, non-()<>
    |                               #   or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
  )+
  (?:                           # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
    |                                   #   or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]      # not a space or one of these punct chars
  )
)
''');
Instead of:
var re = RegExp(r'''\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))''', caseSensitive: false);
Or a slightly better but still ugly option:
var re = new RegExp(
    r'\b'
    r'(' // Capture 1: entire matched URL
      r'(?:'
        r'[a-z][\w-]+:' // URL protocol and colon
        r'(?:'
          r'/{1,3}' // 1-3 slashes
          r'|' // or
          r'[a-z0-9%]' // Single letter or digit or '%'
          // (Trying not to match e.g. "URI::Escape")
        r')'
        r'|' // or
        r'www\d{0,3}[.]' // "www.", "www1.", "www2." … "www999."
        r'|' // or
        r'[a-z0-9.-]+[.][a-z]{2,4}/' // looks like domain name followed by a slash
      r')'
      r'(?:' // One or more:
        r'[^\s()<>]+' // Run of non-space, non-()<>
        r'|' // or
        r'\(([^\s()<>]+|(\([^\s()<>]+\)))*\)' // balanced parens, up to 2 levels
      r')+'
      r'(?:' // End with:
        r'\(([^\s()<>]+|(\([^\s()<>]+\)))*\)' // balanced parens, up to 2 levels
        r'|' // or
        r'''[^\s`!()\[\]{};:'".,<>?«»“”‘’]''' // not a space or one of these punct chars
      r')'
    r')');
http://www.regular-expressions.info/freespacing.html
http://daringfireball.net/2010/07/improved_regex_for_matching_urls