parserx

A lexer and parser generator and grammar toolkit written in Java

Features

accepts regex-like grammars (EBNF)
epsilon removal
left recursion removal (direct and indirect)
left factoring
EBNF to BNF conversion
LR(0), LR(1), LALR(1) parser generator
table-based parser and State->Method based parser
outputs AST/CST
LL(1) recursive descent parser generator
dot graph of NFA, DFA, LR(0), LR(1), LALR(1)
DFA minimization
lexer generator
precedence tool (removes any precedence conflict)
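
As an illustration of the left recursion removal item above, here is a minimal sketch of the textbook transformation for direct left recursion, where A: A α | β becomes A: β A' and A': α A' | ε. The class name, the method name, and the representation of productions as lists of strings are assumptions made for this example; they are not parserx's API, and the sketch does not handle indirect recursion or a rule of the form A: A.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of parserx: removes direct left recursion from one rule.
class LeftRecursionSketch {

    // Each alternative is a list of symbol names; an empty list stands for epsilon.
    static Map<String, List<List<String>>> removeDirect(String name, List<List<String>> alts) {
        List<List<String>> recursive = new ArrayList<>(); // the alpha parts of "name alpha"
        List<List<String>> others = new ArrayList<>();    // the beta alternatives
        for (List<String> alt : alts) {
            if (!alt.isEmpty() && alt.get(0).equals(name)) {
                recursive.add(alt.subList(1, alt.size()));
            } else {
                others.add(alt);
            }
        }
        Map<String, List<List<String>>> result = new LinkedHashMap<>();
        if (recursive.isEmpty()) {
            result.put(name, alts); // nothing to rewrite
            return result;
        }
        String tail = name + "'";
        List<List<String>> head = new ArrayList<>();
        for (List<String> beta : others) {
            List<String> alt = new ArrayList<>(beta);
            alt.add(tail); // A: beta A'
            head.add(alt);
        }
        List<List<String>> tailAlts = new ArrayList<>();
        for (List<String> alpha : recursive) {
            List<String> alt = new ArrayList<>(alpha);
            alt.add(tail); // A': alpha A'
            tailAlts.add(alt);
        }
        tailAlts.add(new ArrayList<>()); // A': epsilon
        result.put(name, head);
        result.put(tail, tailAlts);
        return result;
    }

    public static void main(String[] args) {
        // E: E "+" T | T;
        List<List<String>> alts = List.of(List.of("E", "+", "T"), List.of("T"));
        System.out.println(removeDirect("E", alts));
    }
}
```

For the rule E: E "+" T | T; the sketch yields E: T E' and E': "+" T E' | ε.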

Examples are in the examples folder

Grammar Format

comments

//this is a line comment

/* this is a
multiline comment */

top level

to include another grammar, use:

include "<grammar_name>"

e.g include "lexer.g"

options

options{
  <option_name> = <value>
  ...
}

token definitions

token{

  <TOKEN_NAME> <separator> <regex> <SEMICOLON>
  //where separator is one of ':' , '=' , '::=' , ':=' , '->'
}

e.g

token{
  NUMBER: [0-9]+;
  IDENT: [a-zA-Z_] [a-zA-Z0-9_]*;
}
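
To make the two token definitions above concrete, the sketch below matches the same patterns with java.util.regex. It only illustrates which inputs the regexes accept. It is not how parserx builds its lexer, the class name is hypothetical, and skipping whitespace between tokens is an assumption added for the example. The space between `[a-zA-Z_]` and `[a-zA-Z0-9_]*` in the grammar denotes a sequence, so it is written as plain concatenation here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not generated by parserx.
class TokenSketch {
    // Named groups mirror the token names NUMBER and IDENT from the example.
    private static final Pattern TOKENS =
            Pattern.compile("(?<NUMBER>[0-9]+)|(?<IDENT>[a-zA-Z_][a-zA-Z0-9_]*)");

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKENS.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Assumption: whitespace separates tokens and is dropped.
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at " + pos);
            }
            String type = m.group("NUMBER") != null ? "NUMBER" : "IDENT";
            out.add(type + "(" + m.group() + ")");
            pos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x1 42 _tmp"));
    }
}
```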

prefixing a token name with '#' makes that token a fragment, so it can be used as a reference but no actual DFA is generated for it

rule definitions

<RULE_NAME> <separator> <regex> <SEMICOLON>

e.g

assign: left "=" right;
left: ident;
right: ident | literal;

regex types

alternation

r1 | r2 | r3

sequence

r1 r2 r3

repetition

r* = zero or more times (Kleene star)
r+ = one or more times (Kleene plus)
r? = zero or one time (optional)
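
The three operators above can each be expressed in plain BNF with a helper rule, which is the usual idea behind an EBNF to BNF conversion. The sketch below prints one common encoding in the grammar format of this README. The class and the helper-rule naming are hypothetical, and the encoding parserx actually emits may differ.

```java
// Hypothetical illustration of EBNF-to-BNF desugaring; not parserx's implementation.
class RepetitionSketch {

    // Returns a BNF rule named `helper` that is equivalent to applying `op` to symbol `r`.
    static String desugar(String helper, String r, char op) {
        switch (op) {
            case '*': // zero or more: right-recursive with an epsilon alternative
                return helper + ": " + r + " " + helper + " | %epsilon;";
            case '+': // one or more: right-recursive, ends with a single r
                return helper + ": " + r + " " + helper + " | " + r + ";";
            case '?': // zero or one
                return helper + ": " + r + " | %epsilon;";
            default:
                throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(desugar("args", "arg", '*'));
        System.out.println(desugar("digits", "digit", '+'));
        System.out.println(desugar("sign_opt", "sign", '?'));
    }
}
```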

grouping

(r) you can group complex regexes in tokens and rules
e.g a (b | c+)

epsilon

use %empty, %epsilon or ε for epsilon
e.g rule: a (b | c | %epsilon);

ranges (token only)

place ranges or single chars inside brackets (without quotes)
[start-end single]

e.g id: [a-zA-Z0-9_];

escape sequences are also supported
e.g ws: [\u00A0\u000A\t];

negation e.g lc: "//" [^\n]*;

strings

use double quotes for your strings
e.g stmt: "if" "(" expr ")" stmt;

strings in rules are replaced with references to the tokens declared in the token block
so in the example above the strings would need to be declared like this:

token{
  IF: "if";
  LP: "(";
  RP: ")";
}
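
The replacement described above can be pictured as a lookup from each quoted string to the name of the token that declares it. The following is a minimal sketch under that assumption. The class, the method, and the error raised for an undeclared string are hypothetical and do not describe parserx's internals.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: swaps string literals in a rule for declared token names.
class StringRefSketch {
    private static final Pattern LITERAL = Pattern.compile("\"([^\"]*)\"");

    static String replaceLiterals(String rule, Map<String, String> tokenByLiteral) {
        Matcher m = LITERAL.matcher(rule);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String name = tokenByLiteral.get(m.group(1));
            if (name == null) {
                // Assumption: an undeclared string is reported as an error.
                throw new IllegalStateException("no token declared for " + m.group());
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(name));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tokens = new LinkedHashMap<>();
        tokens.put("if", "IF");
        tokens.put("(", "LP");
        tokens.put(")", "RP");
        System.out.println(replaceLiterals("stmt: \"if\" \"(\" expr \")\" stmt;", tokens));
    }
}
```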

start directive

in LR parsing you have to specify the start rule with %start
e.g %start: expr;

assoc directives

%left <TOKEN_LIST>
%right <TOKEN_LIST>

precedence

precedence is handled by preferring the production that is declared earlier e.g E: E "*" E | E "+" E | NUM;
multiplication takes precedence over addition in the example above
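
To show the effect of declaration order on the example above, here is a small precedence climbing sketch in which an operator declared earlier binds tighter. It is only an illustration of the resulting grouping: parserx resolves this in its generated parsers, not with this code. The class name is hypothetical, and treating both operators as left associative is an assumption.

```java
import java.util.List;

// Hypothetical illustration of "earlier production wins" for E: E "*" E | E "+" E | NUM;
class PrecedenceSketch {
    // Operators in declaration order: earlier means higher precedence.
    private static final List<String> OPS = List.of("*", "+");

    private final List<String> tokens;
    private int pos = 0;

    PrecedenceSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Binding power: earlier declaration gives a larger number; -1 for non-operators.
    private static int power(String op) {
        int i = OPS.indexOf(op);
        return i < 0 ? -1 : OPS.size() - i;
    }

    // Precedence climbing; returns a fully parenthesized string showing the grouping.
    private String parseExpr(int minPower) {
        String left = tokens.get(pos++); // NUM
        while (pos < tokens.size() && power(tokens.get(pos)) >= minPower) {
            String op = tokens.get(pos++);
            // power + 1 makes the operator left associative (an assumption)
            String right = parseExpr(power(op) + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    static String group(List<String> tokens) {
        return new PrecedenceSketch(tokens).parseExpr(1);
    }

    public static void main(String[] args) {
        System.out.println(group(List.of("1", "+", "2", "*", "3")));
        System.out.println(group(List.of("1", "*", "2", "+", "3")));
    }
}
```

With this ordering, 1 + 2 * 3 groups as (1 + (2 * 3)) and 1 * 2 + 3 groups as ((1 * 2) + 3).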

skip block

skip tokens are ignored by the parser, so you can use them for comments and whitespace

skip{
  comment: "//" [^\n]*;
}
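
The idea of a skip token can be sketched as a lexer loop that recognizes the token and then drops it instead of handing it to the parser. The example below uses java.util.regex for brevity. The class name, the extra whitespace skip token, and the IDENT token are assumptions added for the illustration and are not taken from parserx.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of skip tokens; not generated by parserx.
class SkipSketch {
    // comment mirrors the skip block above; ws and IDENT are assumed for the example.
    private static final Pattern TOKENS = Pattern.compile(
            "(?<comment>//[^\\n]*)|(?<ws>\\s+)|(?<IDENT>[a-zA-Z_][a-zA-Z0-9_]*)");

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKENS.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at " + pos);
            }
            // Skip tokens are matched like any other token but never emitted.
            boolean skipped = m.group("comment") != null || m.group("ws") != null;
            if (!skipped) {
                out.add(m.group());
            }
            pos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a // note\nb"));
    }
}
```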

mesut146/parserx
