This library provides a human-friendly way to write complex regular expression patterns.
The generated patterns always operate on Unicode code points, even on targets whose native strings are UTF-16, such as JavaScript or C#.

var pattern1 = (RxPattern.Char("a") | RxPattern.Char("b")).many();
var rx1 : EReg = pattern1.build();
rx1.match("abaab"); // => true

var pattern2 = RxPattern.String("gr")
               >> (RxPattern.Char("a") | RxPattern.Char("e"))
               >> RxPattern.String("y");
var rx2 = pattern2.build();
rx2.match("grey"); // => true
rx2.match("gray"); // => true

var pattern3 = RxPattern.AtStart
               >> RxPattern.String("colo")
               >> RxPattern.Char("u").option()
               >> RxPattern.String("r")
               >> RxPattern.AtEnd;
var rx3 = pattern3.build();
rx3.match("color"); // => true
rx3.match("colour"); // => true
rx3.match("color\n"); // => false
rx3.match("\ncolour"); // => false

var wordStart = GeneralCategory.Letter | RxPattern.Char("_");
var wordChar = wordStart | GeneralCategory.Number;
var word = wordStart >> wordChar.many();
var pattern4 = RxPattern.AtStart >> word >> RxPattern.AtEnd;
var rx4 = pattern4.build();
rx4.match("function"); // => true
rx4.match("int32_t"); // => true
rx4.match("\u3042"); // => true
rx4.match("24hours"); // => false

- Neko VM (UTF-8, PCRE)
- PHP (UTF-8, PCRE)
- C++ (UTF-8, PCRE)
- Lua (UTF-8, PCRE)
- JavaScript (UTF-16, native RegExp)
- C# (UTF-16, System.Text.RegularExpressions.Regex)
- Java (UTF-16, java.util.regex)
- Python (UTF-32, re)
This library provides the following classes:
- rxpattern.RxPattern
- rxpattern.CharSet
- rxpattern.GeneralCategory

RxPattern.AnyCodePoint : RxPattern
- Matches any Unicode code point, i.e. U+0000 to U+10FFFF (surrogates may or may not be excluded).

RxPattern.Char(c : String) : RxPattern
- Matches the Unicode code point represented by the string c, which must consist of a single code point.

RxPattern.String(s : String) : RxPattern
- Matches the string s.
- Special characters are escaped.

RxPattern.LineTerminator : RxPattern
- Matches a line terminator.
- The following sequences and characters are treated as line terminators:
  - CR LF
  - CR
  - LF
  - U+2028 LINE SEPARATOR
  - U+2029 PARAGRAPH SEPARATOR
- TODO: Also include U+0085 NEL?

RxPattern.Empty : RxPattern
- Matches an empty string.
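
The basic patterns above can be tried out with a small program. The following is a minimal sketch rather than an official example: the class name BasicPatternsExample and its helper functions are made up for illustration, and only calls documented in this README are used.

```haxe
import rxpattern.RxPattern;

class BasicPatternsExample {
    // Anchored pattern for the literal text "1+1=2".
    // RxPattern.String escapes the "+", so it is not a quantifier here.
    public static function literal() : EReg {
        var pattern = RxPattern.AtStart
            >> RxPattern.String("1+1=2")
            >> RxPattern.AtEnd;
        return pattern.build();
    }

    // "a" and "b" separated by exactly one line terminator.
    public static function twoLines() : EReg {
        var pattern = RxPattern.AtStart
            >> RxPattern.Char("a")
            >> RxPattern.LineTerminator
            >> RxPattern.Char("b")
            >> RxPattern.AtEnd;
        return pattern.build();
    }

    static function main() {
        trace(literal().match("1+1=2"));   // true
        trace(literal().match("11=2"));    // false: "+" is matched literally
        trace(twoLines().match("a\r\nb")); // true: CR LF counts as one terminator
        trace(twoLines().match("a\nb"));   // true
        trace(twoLines().match("ab"));     // false: no terminator present
    }
}
```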

The variables pattern1 and pattern2 are of type RxPattern.

pattern1 >> pattern2
- Matches the two patterns in sequence: pattern1 followed by pattern2.

pattern1 | pattern2
- Matches pattern1 or pattern2.

pattern1.then(pattern2) : RxPattern
- Same as pattern1 >> pattern2.

pattern1.or(pattern2) : RxPattern
- Same as pattern1 | pattern2.

RxPattern.sequence(patterns : Iterable<RxPattern>) : RxPattern
- Applies >> to the elements of patterns.
- Returns RxPattern.Empty if patterns is empty.

RxPattern.choice(patterns : Iterable<RxPattern>) : RxPattern
- Applies | to the elements of patterns.
- Returns RxPattern.Never if patterns is empty.
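
When the parts of a pattern are already collected in a list, sequence and choice can stand in for a chain of operators. The sketch below assumes that an Array<RxPattern> is accepted where Iterable<RxPattern> is expected; the class name CombinatorExample and its helpers are hypothetical.

```haxe
import rxpattern.RxPattern;

class CombinatorExample {
    // Any one of the listed animal names.
    public static function animal() : RxPattern {
        var names : Array<RxPattern> = [
            RxPattern.String("cat"),
            RxPattern.String("dog"),
            RxPattern.String("bird")
        ];
        return RxPattern.choice(names);
    }

    // A whole string made of an animal name plus a plural "s".
    public static function plural() : EReg {
        var parts : Array<RxPattern> = [
            RxPattern.AtStart,
            animal(),
            RxPattern.Char("s"),
            RxPattern.AtEnd
        ];
        return RxPattern.sequence(parts).build();
    }

    static function main() {
        var rx = plural();
        trace(rx.match("cats"));  // true
        trace(rx.match("birds")); // true
        trace(rx.match("cat"));   // false: the trailing "s" is required
        trace(rx.match("cows"));  // false: "cow" is not one of the choices
    }
}
```

Note that the choice is embedded in the sequence without any explicit grouping; the library is documented to add non-capturing groups where they are needed.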

The variable pattern is of type RxPattern.

pattern.option() : RxPattern
- Matches pattern or an empty string.
- Equivalent to pattern | RxPattern.Empty.

pattern.many() : RxPattern
- Matches zero or more repetitions of pattern.

pattern.many1() : RxPattern
- Matches one or more repetitions of pattern.

TODO: Add methods for the quantifiers {m}, {m,} and {m,n}.
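
Until such methods exist, a bounded repetition can be assembled from >> and option(). The helper repeat below is hypothetical and not part of the library; it is a sketch of one possible workaround that expands {min,max} into a mandatory prefix followed by nested optional copies.

```haxe
import rxpattern.RxPattern;

class RepeatExample {
    // Hypothetical helper, not part of the library: matches `pattern`
    // at least `min` and at most `max` times (assumes 0 <= min <= max).
    public static function repeat(pattern : RxPattern, min : Int, max : Int) : RxPattern {
        var result : RxPattern = RxPattern.Empty;
        // Optional tail, nested from the inside out: (p(p)?)?
        for (i in min...max) {
            result = (pattern >> result).option();
        }
        // Mandatory prefix: `min` copies of the pattern.
        for (i in 0...min) {
            result = pattern >> result;
        }
        return result;
    }

    // Whole string of two to four "a" characters.
    public static function twoToFour() : EReg {
        var a = RxPattern.Char("a");
        return (RxPattern.AtStart >> repeat(a, 2, 4) >> RxPattern.AtEnd).build();
    }

    static function main() {
        var rx = twoToFour();
        trace(rx.match("a"));     // false
        trace(rx.match("aa"));    // true
        trace(rx.match("aaaa"));  // true
        trace(rx.match("aaaaa")); // false
    }
}
```

The generated pattern grows linearly with max, so this approach is only reasonable for small bounds.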

RxPattern.AtStart : RxPattern
- Matches at the start of the string.

RxPattern.AtEnd : RxPattern
- Matches at the end of the string.

RxPattern.LookAhead(pattern : RxPattern) : RxPattern
- Positive look ahead.

RxPattern.NotFollowedBy(pattern : RxPattern) : RxPattern
- Negative look ahead.

RxPattern.Never : RxPattern
- Never matches anything.
- Equivalent to RxPattern.NotFollowedBy(RxPattern.Empty).

RxPattern.Group(pattern : RxPattern) : RxPattern
- Creates a capture group.

Since non-capturing groups are automatically created when necessary, there is no function to explicitly create them.
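
As an illustration of assertions and capture groups, the following sketch combines NotFollowedBy and Group with the standard EReg.matched method. The class name AssertionExample and its helpers are made up for this example, and it assumes that Group produces the first (and only) capture group of the resulting EReg.

```haxe
import rxpattern.RxPattern;

class AssertionExample {
    // "q" that is not immediately followed by "u".
    public static function qWithoutU() : EReg {
        var pattern = RxPattern.Char("q")
            >> RxPattern.NotFollowedBy(RxPattern.Char("u"));
        return pattern.build();
    }

    // One or more "a" (captured as group 1) followed by "b".
    public static function capturedRun() : EReg {
        var pattern = RxPattern.Group(RxPattern.Char("a").many1())
            >> RxPattern.Char("b");
        return pattern.build();
    }

    static function main() {
        trace(qWithoutU().match("Iraq"));  // true: the "q" ends the string
        trace(qWithoutU().match("queue")); // false: the only "q" precedes "u"

        var rx = capturedRun();
        if (rx.match("xaaab")) {
            trace(rx.matched(1)); // aaa
        }
    }
}
```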

The variable pattern is of type RxPattern.

pattern.build(options = "u") : EReg
- Builds an EReg object from pattern.

pattern.get() : String
- Gets the pattern string.

RxPattern.buildEReg(pattern : RxPattern, options = "u") : EReg
- Same as pattern.build(options).

RxPattern.getPattern(pattern : RxPattern) : String
- Same as pattern.get().
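
The method and static forms can be used interchangeably. The sketch below only relies on the signatures listed above and keeps the default options; the class name BuildExample is hypothetical, and the text of the pattern string is deliberately not shown because it may differ between targets.

```haxe
import rxpattern.RxPattern;

class BuildExample {
    // "gray" or "grey", as in the introductory example.
    public static function grey() : RxPattern {
        return RxPattern.String("gr")
            >> (RxPattern.Char("a") | RxPattern.Char("e"))
            >> RxPattern.String("y");
    }

    static function main() {
        var pattern = grey();

        // Inspect the generated pattern string (target dependent).
        trace(pattern.get());
        // The static form is documented to return the same string.
        trace(RxPattern.getPattern(pattern) == pattern.get());

        var rx1 = pattern.build();              // default options "u"
        var rx2 = RxPattern.buildEReg(pattern); // same as pattern.build()
        trace(rx1.match("grey"));
        trace(rx2.match("gray"));
    }
}
```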

The variable charset is of type CharSet.

RxPattern.CharSet(set : CharSet) : RxPattern
RxPattern.NotInSet(set : CharSet) : RxPattern

CharSet.empty() : CharSet
- Returns an empty character set.

CharSet.singleton(c : String) : CharSet
- Returns a character set with one element c.

CharSet.fromString(s : String) : CharSet
- Returns a character set with elements from the string s.

CharSet.intersection(a : CharSet, b : CharSet) : CharSet
CharSet.union(a : CharSet, b : CharSet) : CharSet
CharSet.difference(a : CharSet, b : CharSet) : CharSet

charset.has(c : String) : Bool
- The string c must consist of a single code point.

charset.add(c : String) : Void
- The string c must consist of a single code point.

charset.remove(c : String) : Void
- The string c must consist of a single code point.

charset.hasCodePoint(x : Int) : Bool
charset.addCodePoint(x : Int) : Void
charset.removeCodePoint(x : Int) : Void
charset.codePointIterator() : Iterator<Int>
charset.length : Int
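
A character set is turned into a pattern with RxPattern.CharSet or RxPattern.NotInSet. The following sketch assumes that RxPattern.CharSet matches one code point contained in the set and that RxPattern.NotInSet matches one code point outside it, which the names suggest but the list above does not spell out. The class name CharSetExample is hypothetical.

```haxe
import rxpattern.RxPattern;
import rxpattern.CharSet;

class CharSetExample {
    // Hexadecimal digits in either case.
    public static function hexDigits() : CharSet {
        var lower = CharSet.fromString("0123456789abcdef");
        var upper = CharSet.fromString("ABCDEF");
        return CharSet.union(lower, upper);
    }

    // Whole string consisting of `unit` repeated one or more times.
    public static function whole(unit : RxPattern) : EReg {
        return (RxPattern.AtStart >> unit.many1() >> RxPattern.AtEnd).build();
    }

    static function main() {
        var hex = hexDigits();
        trace(hex.has("f")); // true
        trace(hex.has("g")); // false

        var allHex = whole(RxPattern.CharSet(hex));
        trace(allHex.match("DEADbeef")); // true
        trace(allHex.match("xyz"));      // false

        var noHex = whole(RxPattern.NotInSet(hex));
        trace(noHex.match("xyz")); // true
        trace(noHex.match("xaz")); // false: "a" is a hex digit
    }
}
```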

This library provides RxPattern values corresponding to the Unicode general categories.
If the target supports Unicode properties (that is, \p{} patterns), they are used.
Otherwise, patterns generated from the Unicode 8.0.0 data are used.

GeneralCategory.Letter : RxPattern
GeneralCategory.Uppercase_Letter : RxPattern
GeneralCategory.Lowercase_Letter : RxPattern
GeneralCategory.Titlecase_Letter : RxPattern
GeneralCategory.Cased_Letter : RxPattern
GeneralCategory.Modifier_Letter : RxPattern
GeneralCategory.Other_Letter : RxPattern
GeneralCategory.Mark : RxPattern
GeneralCategory.Nonspacing_Mark : RxPattern
GeneralCategory.Spacing_Mark : RxPattern
GeneralCategory.Enclosing_Mark : RxPattern
GeneralCategory.Number : RxPattern
GeneralCategory.Decimal_Number : RxPattern
GeneralCategory.Letter_Number : RxPattern
GeneralCategory.Other_Number : RxPattern
GeneralCategory.Punctuation : RxPattern
GeneralCategory.Connector_Punctuation : RxPattern
GeneralCategory.Dash_Punctuation : RxPattern
GeneralCategory.Open_Punctuation : RxPattern
GeneralCategory.Close_Punctuation : RxPattern
GeneralCategory.Initial_Punctuation : RxPattern
GeneralCategory.Final_Punctuation : RxPattern
GeneralCategory.Other_Punctuation : RxPattern
GeneralCategory.Symbol : RxPattern
GeneralCategory.Math_Symbol : RxPattern
GeneralCategory.Currency_Symbol : RxPattern
GeneralCategory.Modifier_Symbol : RxPattern
GeneralCategory.Other_Symbol : RxPattern
GeneralCategory.Separator : RxPattern
GeneralCategory.Space_Separator : RxPattern
GeneralCategory.Line_Separator : RxPattern
GeneralCategory.Paragraph_Separator : RxPattern
GeneralCategory.Other : RxPattern
GeneralCategory.Control : RxPattern
GeneralCategory.Format : RxPattern
GeneralCategory.Private_Use : RxPattern

The terms "Disjunction", "Alternative", "Term" and "Atom" correspond to the rules in Typical Regular Expression Syntax.
The variable pattern is of type RxPattern.

RxPattern.Disjunction(s : String) : RxPattern
- Returns an RxPattern value with the given pattern string.

RxPattern.Alternative(s : String) : RxPattern
- Returns an RxPattern value with the given pattern string.
- The string s must be usable as an Alternative: that is, the pattern s + "a" matches s followed by the character a.

RxPattern.Term(s : String) : RxPattern
- Returns an RxPattern value with the given pattern string.
- The string s must be usable as a Term.

RxPattern.Atom(s : String) : RxPattern
- Returns an RxPattern value with the given pattern string.
- The string s must be usable as an Atom: that is, the pattern s + "*" means zero or more repetitions of s.

pattern.toDisjunction() : String
- Returns a pattern string that can be used as a Disjunction. This is the same as pattern.get().

pattern.toAlternative() : String
- Returns a pattern string that can be used as an Alternative.
- The string is surrounded by a non-capturing group if necessary.

pattern.toTerm() : String
- Returns a pattern string that can be used as a Term.
- The string is surrounded by a non-capturing group if necessary.

pattern.toAtom() : String
- Returns a pattern string that can be used as an Atom.
- The string is surrounded by a non-capturing group if necessary.
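
The low-level constructors make it possible to mix hand-written pattern fragments with the combinators. The sketch below is an illustration under the rules stated above: "[0-9]" is a character class and therefore an Atom, while "cat|dog" is only a Disjunction and is expected to be grouped automatically before a quantifier is applied. The class name LowLevelExample and its helpers are hypothetical.

```haxe
import rxpattern.RxPattern;

class LowLevelExample {
    // A character class is an Atom, so a quantifier may follow it directly.
    public static function digits() : EReg {
        var digit = RxPattern.Atom("[0-9]");
        return (RxPattern.AtStart >> digit.many1() >> RxPattern.AtEnd).build();
    }

    // "cat|dog" is only a Disjunction; many1() must treat it as one unit.
    public static function pets() : EReg {
        var pet = RxPattern.Disjunction("cat|dog");
        return (RxPattern.AtStart >> pet.many1() >> RxPattern.AtEnd).build();
    }

    static function main() {
        trace(digits().match("2024"));    // true
        trace(digits().match("20x4"));    // false
        trace(pets().match("catdogcat")); // true
        trace(pets().match("catd"));      // false

        // Pattern string usable as an Atom; presumably wrapped in "(?:...)".
        trace(RxPattern.Disjunction("cat|dog").toAtom());
    }
}
```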

Typical Regular Expression Syntax:

Pattern     ::= Disjunction
Disjunction ::= Alternative
              | Alternative "|" Disjunction
Alternative ::= ""
              | Alternative Term
Term        ::= Assertion
              | Atom
              | Atom Quantifier
Assertion   ::= "^" | "$"
              | "(?=" Disjunction ")"
              | "(?!" Disjunction ")"
Quantifier  ::= "*" | "+" | "?"
Atom        ::= PatternCharacter
              | "\" AtomEscape
              | CharacterClass
              | "(" Disjunction ")"
              | "(?:" Disjunction ")"