This is an attempt to create a simple scanner (tokenizer) and parser that builds a parse tree from a user specified mathematical expression, where Pratt parsing is used as the main tree building algorithm.
The tokenizer recognizes the following as valid symbols:
- variables (regex
^[A-Za-z_]+[A-Za-z0-9_]$) - integral numeric literals (regex
^[0-9]+$) - operators (one of
{'+', '-', '*', '/', '%', '++', '--'})
Put simply, this section describes "meaningful" symbols to this program using regex
- The following context free grammar defines how the lexical symbols could be ordered to form meaningful mathematical expression.
- Non-terminals are lower-cased words like
expr,mult,unary - Terminals are
TOK_LITandTOK_VAR, which are defined with regex - The precedence of each rule will determine how "tall" they are in the resulting AST, in the following grammar, the rules/non-terminals that are closer to the bottom have higher precedence.
expr := expr '+' mult
| expr '-' mult
| mult
mult := mult '*' unary
| mult '/' unary
| mult '%' unary
| mult atom // implicit multiplication
| unary
unary := '+' unary
| '-' unary
| incdec
incdec := '++' incdec
| '--' incdec
| incdec '++'
| incdec '--'
| atom
atom := TOK_LIT | TOK_VAR | '(' expr ')'
TOK_VAR := ^[A-Za-z]+[_A-Za-z0-9]*$
TOK_LIT := ^[0-9]+$
- Note: implicit multiplication is supported via the syntax
(TOK_LIT|TOK_VAR) <space> (TOK_LIT|TOK_VAR)
This differs from regular programming language grammar for which above syntax could mean (TOK_LIT|TOK_VAR) initialization.
("+"|"-") < ("*"|"/"|"%") < ("++"|"--")
- They are right associative and have the highest precedence, meaning that expression
a b ++ ⇔ a * (b++)and++ a b ⇔ (++a) * b. - Their respective parse trees are:
and
"*" |__"a" |__"++" |__"b""*" |__"++" |__"a" |__"b"
- Mixing prefix
IncDec, postfixIncDec, and implicit multiplication together is strongly discouraged, however, this project did handle them.
The expression a ++ b obeys the grammar rules defined above, and there are 2 ways the ++
operator could be associated with variables a and b.
a * (++b)(a++) * b
This project chooses the second way, which says "any ++ or -- that is preceeded by either a
variable or literal is going to be treated as unary postfix increment/decrement expressions".
Consider the expression a++ --b, there are 2 possible groupings:
a++ --b ⇔ (a++) * (--b)a++ --b ⇔ ((a++)--) * b
-
This project goes with the second grouping when generating ASTs.
-
To reiterate, it is strongly discouraged to write expressions similar to above example, the correct way to do so is either using the conventional
*operator or parentheses to specify associativity.
-
The output is a simple visual display of the AST. Specifically, a pre-order tree-walk of the AST.
-
An ASTNode is either a
TOK_LIT, aTOK_VAR, or an operator surrounded by"". -
A node's child is placed below itself, preceed with a space and the symbol
|__.- notice that 2 children are siblings (on the same level of the entire AST) if their
|__symbols line up with each other. - As an example, the AST for expression
(a + b) / 2will look like
"/" |__"+" |__"a" |__"b" |__"2" - notice that 2 children are siblings (on the same level of the entire AST) if their
-
Implicit mulitiplication will be handled by manually making an extra
*node between the two operands.- Expressions
2 aanda 2will thus become
"*" "*" |__"2" |__"a" |__"a" |__"2" - Expressions
- If an
ExpressionTreeis built successfully, it will be printed onto a file namedparseTree.txtin this directory.
Assume gcc and Make are available on the machine.
- refer to
Makefilefor build, test, and clean instructions. - On successful builds, an executable named
expressionTreewill be present in the working directory. - if
valgrindis installed on the machine,make valgrindwill run the executable with valgrind to check for memory errors.
make
make test
- assume
valgrindis installed on machine
make valgrind
make clean
- see design.md