This library provides a human-friendly way to write complex regular expression patterns.
The patterns generated by this library always work with Unicode code points, even if the target is JavaScript or C#.
| Php | Python | Java{6/7}/JVM | C# | Js/Node/ES{5/6} | Interp | Neko | HashLink | Lua | CPP | Flash |
|---|---|---|---|---|---|---|---|---|---|---|
| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ➖ | ➖ | ➖ |
var pattern1 = (RxPattern.Char("a") | RxPattern.Char("b")).many();
var rx1:EReg = pattern1.build();
rx1.match("abaab"); // => truevar pattern2 = RxPattern.String("gr")
>> (RxPattern.Char("a") | RxPattern.Char("e"))
>> RxPattern.String("y");
var rx2 = pattern2.build();
rx2.match("grey"); // => true
rx2.match("gray"); // => truevar pattern3 = RxPattern.AtStart
>> RxPattern.String("colo")
>> RxPattern.Char("u").option()
>> RxPattern.String("r")
>> RxPattern.AtEnd;
var rx3 = pattern3.build();
rx3.match("color"); // => true
rx3.match("colour"); // => true
rx3.match("color\n"); // => false
rx3.match("\ncolour"); // => falsevar wordStart = Category.L | RxPattern.Char("_");
var wordChar = wordStart | Category.N;
var word = wordStart >> wordChar.many();
var pattern4 = RxPattern.AtStart >> word >> RxPattern.AtEnd;
var rx4 = pattern4.build();
rx4.match("function"); // => true
rx4.match("int32_t"); // => true
rx4.match("\u3042"); // => true
rx4.match("24hours"); // => falseThis library provides the following classes:
rxpattern.RxPatternrxpattern.CharSetrxpattern.Category
RxPattern.AnyCodePoint:RxPattern- Matches any Unicode code point, i.e. U+0000 to U+10FFFF (may or may not excluding surrogates).
RxPattern.Char(c:String):RxPattern- Matches a Unicode code point represented by the string
c. cmust consist of a single code point.
- Matches a Unicode code point represented by the string
RxPattern.String(s:String):RxPattern- Matches a string.
- Special characters are escaped.
RxPattern.LineTerminator:RxPattern- Matches a line terminator.
- The following sequence / characters are treated as a line terminator:
- CR LF
- CR
- LF
- U+2028 LINE SEPARATOR
- U+2029 PARAGRAPH SEPARATOR
- TODO: Also include U+0085 NEL?
RxPattern.Empty:RxPattern- Matches an empty string.
The variables pattern1 and pattern2 are of type RxPattern.
pattern1 >> pattern2- Matches the sequence of
patterns.
- Matches the sequence of
pattern1 | pattern2- Matches
pattern1orpattern2.
- Matches
pattern1.then(pattern2):RxPattern- Same as
pattern1 >> pattern2.
- Same as
pattern1.or(pattern2):RxPattern- Same as
pattern1 | pattern2.
- Same as
RxPattern.sequence(patterns:Iterable<RxPattern>):RxPattern- Applies
>>to the elements ofpatterns. - Returns
RxPattern.Emptyifpatternsis empty.
- Applies
RxPattern.choice(patterns:Iterable<RxPattern>):RxPattern- Applies
|to the elements ofpatterns. - Returns
RxPattern.Neverifpatternsis empty.
- Applies
The variable pattern is of type RxPattern.
pattern.option():RxPattern- Matches
patternor an empty string. - Equivalent to
pattern | RxPattern.Empty.
- Matches
pattern.many():RxPattern- Matches zero or more repetition of
pattern.
- Matches zero or more repetition of
pattern.many1():RxPattern- Matches one or more repetition of
pattern.
- Matches one or more repetition of
TODO: Add methods for the quantifiers {m}, {m,} and {m,n}.
RxPattern.AtStart:RxPattern- Matches at the start of the string.
RxPattern.AtEnd:RxPattern- Matches at the end of the string.
RxPattern.LookAhead(pattern:RxPattern):RxPattern- Positive look ahead.
RxPattern.NotFollowedBy(pattern:RxPattern):RxPattern- Negative look ahead.
RxPattern.Never:RxPattern- Never matches anything.
- Equivalent to
RxPattern.NotFollowedBy(RxPattern.Empty).
RxPattern.Group(pattern:RxPattern):RxPattern- Creates a capture group.
Since non-capturing groups are automatically created when necessary, there is no function to explicitly create them.
The variable pattern is of type RxPattern.
pattern.build(options = "u"):EReg- Build an
ERegobject withpattern.
- Build an
pattern.get():String- Get the pattern string.
RxPattern.buildEReg(pattern:RxPattern, options = "u"):EReg- Same as
pattern.build(options)
- Same as
RxPattern.getPattern(pattern:RxPattern):String- Same as
pattern.get()
- Same as
The variable charset is of type CharSet.
RxPattern.CharSet(set:CharSet):RxPatternRxPattern.NotInSet(set:CharSet):RxPatternCharSet.empty():CharSet- Returns an empty character set.
CharSet.singleton(c:String):CharSet- Returns a character set with one element
c.
- Returns a character set with one element
CharSet.fromString(s:String):CharSet- Returns a character set with elements from the string
s.
- Returns a character set with elements from the string
CharSet.intersection(a:CharSet, b:CharSet):CharSetCharSet.union(a:CharSet, b:CharSet):CharSetCharSet.difference(a:CharSet, b:CharSet):CharSetcharset.has(c:String):Bool- The string
cmust consist of a single code point.
- The string
charset.add(c:String):Void- The string
cmust consist of a single code point.
- The string
charset.remove(c:String):Void- The string
cmust consist of a single code point.
- The string
charset.hasCodePoint(x:Int):Boolcharset.addCodePoint(x:Int):Voidcharset.removeCodePoint(x:Int):Voidcharset.codePointIterator():Iterator<Int>charset.length:Int
This library provides RxPattern values corresponding Unicode general categories.
If Unicode properties (or, \p{} patterns) are available, they are used.
Otherwise, patterns generated from the latest supported Unicode data in skial/seri will be used.
abstract Category {
var Cc;
var Cf;
var Co;
var Cs;
var C;
var Ll;
var Lm;
var Lo;
var Lt;
var Lu;
var L;
var Mc;
var Me;
var Mn;
var M;
var Nd;
var Nl;
var No;
var N;
var Pc;
var Pd;
var Pe;
var Pf;
var Pi;
var Po;
var Ps;
var P;
var Sc;
var Sk;
var Sm;
var So;
var S;
var Zl;
var Zp;
var Zs;
var Z;
var Letter_UpperCase;
var Letter_LowerCase;
var Letter_TitleCase;
var Letter_Modifier;
var Mark_NonSpacing;
var Mark_SpacingCombining;
var Mark_Enclosing;
var Number_DecimalDigit;
var Number_Letter;
var Number_Other;
var Punctuation_Connector;
var Punctuation_Dash;
var Punctuation_Open;
var Punctuation_Close;
var Punctuation_InitialQuote;
var Punctuation_FinalQuote;
var Punctuation_Other;
var Symbol_Math;
var Symbol_Currency;
var Symbol_Modifier;
var Symbol_Other;
var Separator_Space;
var Separator_Line;
var Separator_Paragraph;
var Other_Control;
var Other_Format;
var Other_Surrogate;
var Other_PrivateUse;
}The terms "Disjunction", "Alternative", "Term" and "Atom" correspond to the rules in Typical Regular Expression Syntax.
The variable pattern is of type RxPattern.
RxPattern.Disjunction(s:String):RxPattern- Returns a
RxPatternvalue with given pattern string.
- Returns a
RxPattern.Alternative(s:String):RxPattern- Returns a
RxPatternvalue with given pattern string. - The string
smust be able to be used as an Alternative: that is, the patterns + "a"matchessand the charactera.
- Returns a
RxPattern.Term(s:String):RxPattern- Returns a
RxPatternvalue with given pattern string. - The string
smust be able to be used as a Term.
- Returns a
RxPattern.Atom(s:String):RxPattern- Returns a
RxPatternvalue with given pattern string. - The string
smust be able to be used as an Atom: that is, the patterns + "*"does mean zero or more repetition ofs.
- Returns a
pattern.toDisjunction():String- Returns a pattern string that can be used as a Disjunction. This is same as
pattern.get().
- Returns a pattern string that can be used as a Disjunction. This is same as
pattern.toAlternative():String- Return a pattern string that can be used as an Alternative.
- The string is surrounded by a non-capturing group if necessary.
pattern.toTerm():String- Return a pattern string that can be used as a Term.
- The string is surrounded by a non-capturing group if necessary.
pattern.toAtom():String- Return a pattern string that can be used as an Atom.
- The string is surrounded by a non-capturing group if necessary.
Pattern ::= Disjunction
Disjunction ::= Alternative
| Alternative "|" Disjunction
Alternative ::= ""
| Alternative Term
Term ::= Assertion
| Atom
| Atom Quantifier
Assertion ::= "^" | "$"
| "(?=" Disjunction ")"
| "(?!" Disjunction ")"
Quantifier ::= "*" | "+" | "?"
Atom ::= PatternCharacter
| "\" AtomEscape
| CharacterClass
| "(" Disjunction ")"
| "(?:" Disjunction ")"
The file/library regenerate.js is created by @mathiasbynens. Core functionality was ported to Haxe, see utf16/RangeUtil.hx.