
Supported Modifier Flags #1

Closed
@rbuckton

At the October 2021 plenary, @michaelficarra asked that we outline each flag we are considering as a supported modifier and provide motivating examples for it.

The flags currently under consideration are:

  • i — ignore-case
    • Rationale — Toggling ignore-case is useful when different parts of a pattern need different case sensitivity, or when patterns are provided via JSON configuration. It is especially useful when working with complex Unicode character ranges, which are tedious to write out in both cases by hand.
    • Example — Match an uppercase ASCII letter followed by one or more ASCII letters (either case) or apostrophes
      const re = /^[A-Z](?i)[a-z']+$/;
      re.test("O'Neill"); // true
      re.test("o'neill"); // false
      
      // alternatively (defaulting to ignore-case):
      const re2 = /^(?-i:[A-Z])[a-z']+$/i;
    • Example — Match a word starting with D, followed by whitespace and a word starting with D or d (from the .NET documentation, see [1])
      const re = /\b(D\w+)(?ix)\s(d\w+)\b/g;
      const input = "double dare double Double a Drooling dog The Dreaded Deep";
      re.exec(input); // ["Drooling dog", "Drooling", "dog"]
      re.exec(input); // ["Dreaded Deep", "Dreaded", "Deep"]
  • m — multiline
    • Rationale — Flexibility in matching beginning-of-buffer vs. beginning-of-line or end-of-buffer vs. end-of-line in a complex pattern.
    • Example — Match a frontmatter block at the start of a file
      const re = /^---(?m)$\n((?:^(?!---$).*$\n)*)^---$/;
      re.test("---a"); // false
      re.test("---\n---"); // true
      re.test("---\na: b\n---"); // true
  • s — dot-all (i.e., "single line")
    • Rationale — Control over whether . matches line terminators within part of a pattern.
    • Example — Allow . to match line terminators only between c and x
      const re = /a.c(?s:.)*x.z/;
      re.test("a\ncx\nz"); // false
      re.test("abcdxyz"); // true
      re.test("aBc\nxYz"); // true
  • x — Extended Mode. This flag is proposed by https://github.com/tc39/proposal-regexp-x-mode
    • Rationale — Would allow control over whether whitespace is significant in part of a pattern, for example when embedding a subpattern that contains a required literal space.
    • Example — Disabling x mode when composing a complex pattern:
      const idPattern = String.raw`[a-z]{2} \d{4}`; // space required; String.raw keeps \d intact
      const re = new RegExp(String.raw`
        # match the id
        (?<id>(?-x:${idPattern}))
        
        # match a separator
        :\s
        
        # match the value
        (?<value>\w+)
      `, "x");
      
      re.exec("aa0123: foo")?.groups; // undefined
      re.exec("aa 0123: foo")?.groups; // { id: "aa 0123", value: "foo" }
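For comparison, here is a sketch of how the i, m, and s examples above can be approximated in current JavaScript, which has no modifiers. Each workaround assumes ASCII input and "\n" line endings, and the m workaround relies on lookbehind (ES2018). In every case part of the pattern has to be rewritten by hand, which is part of the motivation for modifiers.

```javascript
// Pre-modifier workarounds (assumes ASCII input and "\n" line endings).

// i: spell out the case-insensitive part with explicit character classes.
// Approximates /^[A-Z](?i)[a-z']+$/
const nameRe = /^[A-Z][A-Za-z']+$/;
// Approximates /\b(D\w+)(?ix)\s(d\w+)\b/g (\w already covers both cases)
const dWordsRe = /\b(D\w+)\s([Dd]\w+)\b/g;

// m: turn on m for the whole pattern and use (?<![\s\S]) as a
// beginning-of-buffer assertion, since it only succeeds at index 0.
// Approximates the frontmatter pattern above.
const frontmatterRe = /(?<![\s\S])---$\n((?:^(?!---$).*$\n)*)^---$/m;

// s: replace only the dot that should match line terminators with [\s\S].
// Approximates /a.c(?s:.)*x.z/
const dotRe = /a.c[\s\S]*x.z/;

console.log(nameRe.test("O'Neill"), nameRe.test("o'neill")); // true false

const dInput = "double dare double Double a Drooling dog The Dreaded Deep";
console.log([...dInput.matchAll(dWordsRe)].map((m) => m[0]));
// [ 'Drooling dog', 'Dreaded Deep' ]

console.log(frontmatterRe.test("---\na: b\n---"));     // true
console.log(frontmatterRe.test("x\n---\na: b\n---")); // false: fence not at start

console.log(dotRe.test("a\ncx\nz"), dotRe.test("aBc\nxYz")); // false true
```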

Flags likely too complex to support:

  • u — Unicode. This flag affects how a pattern is parsed, not how it is matched. Supporting it would likely require a cover grammar and additional static semantics.
  • v — Extended Unicode. This flag is proposed by https://github.com/tc39/proposal-regexp-set-notation as an extension of the u flag and would have the same difficulties.
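To illustrate the claim that u affects parsing rather than matching: in current JavaScript (no modifiers involved), the same pattern source is read as a different pattern depending on whether u is set. A small sketch:

```javascript
// Without u, "\u{61}" is an identity escape for "u" quantified by {61}.
const withoutU = /^\u{61}$/;
// With u, "\u{61}" is a code point escape for "a".
const withU = /^\u{61}$/u;

console.log(withoutU.test("a"));            // false
console.log(withoutU.test("u".repeat(61))); // true
console.log(withU.test("a"));               // true

// "." consumes one UTF-16 code unit without u, one code point with u.
const emoji = "\u{1F600}"; // one code point stored as two code units
console.log(/^.$/.test(emoji));  // false
console.log(/^.$/u.test(emoji)); // true
```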

Flags that will never be supported:

  • g — Global. This flag affects the index at which matching starts, not the matching behavior itself. Changing it mid-pattern would have no effect.
  • y — Sticky. This flag affects the index at which matching starts, not the matching behavior itself. Changing it mid-pattern would have no effect.
  • d — Indices. This flag only adds match indices to the result; it does not affect matching. Changing it mid-pattern would have no effect.
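A minimal sketch in current JavaScript of why these flags have no meaning inside a pattern: g and y only decide where a match attempt begins (through lastIndex), and d only adds an indices array to the result. The pattern's atoms match the same way in every case.

```javascript
const text = "a1 b2";

// g: the search scans forward from lastIndex until a match is found.
const globalRe = /[a-z]\d/g;
globalRe.lastIndex = 1;
const globalMatch = globalRe.exec(text);
console.log(globalMatch[0], globalMatch.index); // b2 3

// y: only the position at lastIndex is tried; "1" there is not [a-z].
const stickyRe = /[a-z]\d/y;
stickyRe.lastIndex = 1;
const stickyMatch = stickyRe.exec(text);
console.log(stickyMatch); // null

// d: the same match, plus start/end offsets for each capture group.
const plain = /b(\d)/.exec(text);
const withIndices = /b(\d)/d.exec(text);
console.log(plain[0] === withIndices[0]); // true
console.log(withIndices.indices[1]);      // [ 4, 5 ]
console.log(plain.indices);               // undefined
```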

Footnotes

  1. https://docs.microsoft.com/en-us/dotnet/standard/base-types/miscellaneous-constructs-in-regular-expressions#inline-options
