Support free-spacing mode in regular expressions (x mode) #8940

Open
DartBot opened this issue Mar 6, 2013 · 1 comment
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. core-l P2 A bug or feature request we're likely to work on type-enhancement A request for a change that isn't a bug

Comments


DartBot commented Mar 6, 2013

This issue was originally filed by greg...@gmail.com


Support free-spacing mode in regular expressions (x mode), as supported in Java, Perl, Ruby, etc. This allows insignificant whitespace and comments to be added to regexps.

However, it is not supported in JavaScript, so patterns would need to be pre-parsed to strip out the insignificant whitespace and comments. For const strings, this could even be done at compile time in dart2js.
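As a rough sketch of that pre-parsing step: the helper below, `stripFreeSpacing`, is a hypothetical name and not part of dart:core. It assumes JavaScript-style pattern syntax, drops unescaped whitespace and `#` comments outside character classes, and leaves escapes and the contents of `[...]` untouched. It is only meant to illustrate the idea, not to cover every corner of the syntax.

```dart
/// Hypothetical helper (an assumption, not an SDK API): removes free-spacing
/// whitespace and `#` comments from [pattern] so that the result can be
/// passed to the existing RegExp constructor.
String stripFreeSpacing(String pattern) {
  final out = StringBuffer();
  var inClass = false; // true while inside a [...] character class
  for (var i = 0; i < pattern.length; i++) {
    final c = pattern[i];
    if (c == '\\') {
      // Keep an escape and the character it escapes exactly as written.
      out.write(c);
      if (i + 1 < pattern.length) out.write(pattern[++i]);
    } else if (inClass) {
      // Whitespace and '#' are literal inside a character class.
      if (c == ']') inClass = false;
      out.write(c);
    } else if (c == '[') {
      inClass = true;
      out.write(c);
    } else if (c == '#') {
      // Comment: skip everything up to (not including) the end of the line.
      while (i + 1 < pattern.length && pattern[i + 1] != '\n') {
        i++;
      }
    } else if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
      // Insignificant whitespace: drop it.
    } else {
      out.write(c);
    }
  }
  return out.toString();
}

void main() {
  final re = RegExp(stripFreeSpacing(r'''
    (\d{4}) # year
    -
    (\d{2}) # month
  '''));
  print(re.pattern); // (\d{4})-(\d{2})
  print(re.hasMatch('2013-03')); // true
}
```

With this approach, a literal space or `#` has to be written as an escape or inside a character class, which is the same convention other free-spacing implementations use.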

Using this mode means that you get to write this:

var re = new RegExp(r'''
( # Capture 1: entire matched URL
  (?:
    [a-z][\w-]+: # URL protocol and colon
    (?:
      /{1,3} # 1-3 slashes
      | # or
      [a-z0-9%] # Single letter or digit or '%'
                                           # (Trying not to match e.g. "URI::Escape")
    )
    | # or
    www\d{0,3}[.] # "www.", "www1.", "www2." … "www999."
    | # or
    [a-z0-9.-]+[.][a-z]{2,4}/ # looks like domain name followed by a slash
  )
  (?: # One or more:
    [^\s()<>]+ # Run of non-space, non-()<>
    | # or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
  )+
  (?: # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
    | # or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’] # not a space or one of these punct chars
  )
)
''');

Instead of:

var re = RegExp(r'''\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))''', caseSensitive: false);

Or a slightly better but still ugly option:

var re = RegExp(
r'\b'
r'(' // Capture 1: entire matched URL
  r'(?:'
    r'[a-z][\w-]+:' // URL protocol and colon
    r'(?:'
      r'/{1,3}' // 1-3 slashes
      r'|' // or
      r'[a-z0-9%]' // Single letter or digit or '%'
                   // (Trying not to match e.g. "URI::Escape")
    r')'
    r'|' // or
    r'www\d{0,3}[.]' // "www.", "www1.", "www2." … "www999."
    r'|' // or
    r'[a-z0-9.-]+[.][a-z]{2,4}/' // looks like domain name followed by a slash
  r')'
  r'(?:' // One or more:
    r'[^\s()<>]+' // Run of non-space, non-()<>
    r'|' // or
    r'\(([^\s()<>]+|(\([^\s()<>]+\)))*\)' // balanced parens, up to 2 levels
  r')+'
  r'(?:' // End with:
    r'\(([^\s()<>]+|(\([^\s()<>]+\)))*\)' // balanced parens, up to 2 levels
    r'|' // or
    r'''[^\s`!()\[\]{};:'".,<>?«»“”‘’]''' // not a space or one of these punct chars
  r')'
r')', caseSensitive: false);

http://www.regular-expressions.info/freespacing.html
http://daringfireball.net/2010/07/improved_regex_for_matching_urls

Contributor

sethladd commented Mar 6, 2013

Removed Type-Defect label.
Added Type-Enhancement, Area-Library, Triaged labels.

@DartBot DartBot added Type-Enhancement area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. labels Mar 6, 2013
@kevmoo kevmoo added P2 A bug or feature request we're likely to work on type-enhancement A request for a change that isn't a bug and removed triaged labels Feb 29, 2016
@lrhn lrhn added the core-m label Aug 11, 2017
@floitschG floitschG added core-l and removed core-m labels Aug 29, 2017

5 participants