Lexer and parser generator and grammar toolkit written in Java
- accepts regex-like grammars (EBNF)
- epsilon removal
- left recursion removal (direct and indirect)
- left factoring
- EBNF to BNF conversion
- LR(0), LR(1), LALR(1) parser generator
- Table-based parser & State->Method based parser
- Outputs AST/CST
- LL(1) recursive descent parser generator
- dot graph of NFA, DFA, LR(0), LR(1), LALR(1)
- DFA minimization
- lexer generator
- precedence tool (removes any precedence conflicts)
Examples are in the examples folder.
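As an illustration of one of the transformations listed above, here is a minimal Java sketch of direct left recursion removal using the textbook rewrite (A: A α | β becomes A: β A' and A': α A' | ε). The class name, method name and the list-of-symbols representation are hypothetical and are not the toolkit's API; the toolkit's own implementation may differ, and this sketch does not cover indirect recursion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only, not part of the toolkit.
final class LeftRecursionSketch {
    // Each alternative is a list of symbols; an empty list stands for epsilon.
    static Map<String, List<List<String>>> removeDirect(String name, List<List<String>> alts) {
        List<List<String>> recursive = new ArrayList<>();    // the alpha parts of "A: A alpha"
        List<List<String>> nonRecursive = new ArrayList<>(); // the beta alternatives
        for (List<String> alt : alts) {
            if (!alt.isEmpty() && alt.get(0).equals(name)) {
                recursive.add(new ArrayList<>(alt.subList(1, alt.size())));
            } else {
                nonRecursive.add(new ArrayList<>(alt));
            }
        }
        Map<String, List<List<String>>> result = new LinkedHashMap<>();
        if (recursive.isEmpty()) {
            // Nothing to rewrite: the rule is not directly left recursive.
            result.put(name, nonRecursive);
            return result;
        }
        String tail = name + "'"; // name of the new helper rule (assumed naming scheme)
        List<List<String>> head = new ArrayList<>();
        for (List<String> beta : nonRecursive) {
            List<String> alt = new ArrayList<>(beta);
            alt.add(tail); // A: beta A'
            head.add(alt);
        }
        List<List<String>> tailAlts = new ArrayList<>();
        for (List<String> alpha : recursive) {
            List<String> alt = new ArrayList<>(alpha);
            alt.add(tail); // A': alpha A'
            tailAlts.add(alt);
        }
        tailAlts.add(new ArrayList<>()); // A': epsilon
        result.put(name, head);
        result.put(tail, tailAlts);
        return result;
    }
}
```

With this sketch, the rule E: E "+" T | T would be rewritten to E: T E' and E': "+" T E' | ε.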
//this is a line comment
/* this is a
multiline comment */
To include another grammar, use:
include "<grammar_name>"
e.g. include "lexer.g"
options{
<option_name> = <value>
...
}
token{
<TOKEN_NAME> <separator> <regex> <SEMICOLON>
//where separator is one of ':' , '=' , '::=' , ':=' , '->'
}
e.g.
token{
NUMBER: [0-9]+;
IDENT: [a-zA-Z_] [a-zA-Z0-9_]*;
}
Prefixing a token name with '#' makes that token a fragment: it can be referenced from other tokens, but no DFA is generated for it.
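To give a feel for what the NUMBER and IDENT definitions match, here is a rough Java sketch that uses java.util.regex as a stand-in for the generated DFA. It is not the generated lexer: the class name is hypothetical, the longest-match rule (with ties going to the earlier declaration) is an assumption, and the DIGIT constant only imitates a fragment by being inlined into NUMBER without becoming a token itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a generated lexer, for illustration only.
final class TokenSketch {
    // Plays the role of a '#' fragment: reusable, but never emitted as a token.
    static final String DIGIT = "[0-9]";

    // Token definitions in declaration order.
    static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("NUMBER", Pattern.compile(DIGIT + "+"));
        TOKENS.put("IDENT", Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*"));
    }

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> e : TOKENS.entrySet()) {
                Matcher m = e.getValue().matcher(input);
                m.region(pos, input.length());
                // Assumed policy: longest match wins, earlier declaration wins ties.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = e.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no token matches at offset " + pos);
            }
            out.add(bestName + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return out;
    }
}
```

Calling TokenSketch.tokenize("42abc") yields a NUMBER token for "42" followed by an IDENT token for "abc".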
<RULE_NAME> <separator> <regex> <SEMICOLON>
e.g.
assign: left "=" right;
left: ident;
right: ident | literal;
r1 | r2 | r3 = alternation (one of r1, r2 or r3)
r1 r2 r3 = sequence (r1 followed by r2, then r3)
r* = zero or more times (Kleene star)
r+ = one or more times (Kleene plus)
r? = zero or one time (optional)
(r) = grouping; you can group complex regexes in both tokens and rules
e.g. a (b | c+)
Use %empty, %epsilon or ε for epsilon
e.g. rule: a (b | c | %epsilon);
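The operators above and the epsilon symbol are what an EBNF to BNF conversion has to work with. The following Java sketch shows the usual textbook desugaring of r*, r+ and r? into plain rules. The helper rule names (the _star, _plus and _opt suffixes) and the class itself are assumptions made for illustration; the rules the toolkit actually generates may look different.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only, not part of the toolkit.
final class EbnfDesugarSketch {
    // Returns the helper rule that replaces "symbol op", where op is '*', '+' or '?'.
    // An empty alternative stands for epsilon.
    static Map<String, List<List<String>>> desugar(String symbol, char op) {
        Map<String, List<List<String>>> rules = new LinkedHashMap<>();
        switch (op) {
            case '*': // r* becomes R: r R | epsilon
                rules.put(symbol + "_star", List.of(List.of(symbol, symbol + "_star"), List.of()));
                break;
            case '+': // r+ becomes R: r R | r
                rules.put(symbol + "_plus", List.of(List.of(symbol, symbol + "_plus"), List.of(symbol)));
                break;
            case '?': // r? becomes R: r | epsilon
                rules.put(symbol + "_opt", List.of(List.of(symbol), List.of()));
                break;
            default:
                throw new IllegalArgumentException("unknown operator: " + op);
        }
        return rules;
    }

    // Prints the rules in the rule syntax used in this document.
    static String render(Map<String, List<List<String>>> rules) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<List<String>>> e : rules.entrySet()) {
            List<String> alts = new ArrayList<>();
            for (List<String> alt : e.getValue()) {
                alts.add(alt.isEmpty() ? "%epsilon" : String.join(" ", alt));
            }
            sb.append(e.getKey()).append(": ").append(String.join(" | ", alts)).append(";\n");
        }
        return sb.toString();
    }
}
```

For example, the c+ inside a (b | c+) could be replaced by a reference to c_plus, where render(desugar("c", '+')) gives the rule c_plus: c c_plus | c;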
Place ranges or single chars inside brackets (without quotes)
[start-end single]
e.g. id: [a-zA-Z0-9_];
Escape sequences are also supported
e.g. ws: [\u00A0\u000A\t];
Negation: a leading ^ matches any char not listed, e.g. lc: "//" [^\n]*;
Use double quotes for strings
e.g. stmt: "if" "(" expr ")" stmt;
Strings in rules are replaced with references to the matching tokens declared in the token block,
so in the example above the strings would need to be declared like this:
token{
IF: "if";
LP: "(";
RP: ")";
}
In LR parsing you have to specify the start rule with %start
e.g. %start: expr;
%left <TOKEN_LIST>
%right <TOKEN_LIST>
Precedence is handled by picking the production that was declared earlier
e.g. E: E "*" E | E "+" E | NUM;
Multiplication takes precedence over addition in the example above
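To show the grouping that this declaration order produces, here is a small hand-written Java evaluator based on precedence climbing. It is only an illustration of the effect: the generated LR parser resolves the conflict in its tables instead, the class is hypothetical, and left associativity for both operators is an assumption (as a %left declaration would give).

```java
import java.util.List;

// Hypothetical illustration, not code produced by the toolkit.
final class PrecedenceSketch {
    // Operators in the order their productions are declared: earlier binds tighter.
    static final List<Character> DECLARED = List.of('*', '+');

    private final String src;
    private int pos = 0;

    private PrecedenceSketch(String src) {
        this.src = src;
    }

    static int eval(String text) {
        PrecedenceSketch p = new PrecedenceSketch(text.replace(" ", ""));
        int value = p.expr(0);
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at offset " + p.pos);
        }
        return value;
    }

    // Higher number means tighter binding; -1 means "not an operator".
    private static int prec(char c) {
        int index = DECLARED.indexOf(c);
        return index < 0 ? -1 : DECLARED.size() - index;
    }

    private int expr(int minPrec) {
        int left = num();
        while (pos < src.length()) {
            char op = src.charAt(pos);
            int p = prec(op);
            if (p < 0 || p < minPrec) {
                break;
            }
            pos++;
            // p + 1 makes the operator left associative (assumed).
            int right = expr(p + 1);
            left = (op == '*') ? left * right : left + right;
        }
        return left;
    }

    private int num() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("number expected at offset " + start);
        }
        return Integer.parseInt(src.substring(start, pos));
    }
}
```

Because "*" is declared before "+", PrecedenceSketch.eval("2+3*4") groups the input as 2+(3*4) and returns 14, while eval("2*3+4") returns 10.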
Skip tokens are ignored by the parser, so you can use them for comments and whitespace
skip{
comment: "//" [^\n]*;
}