A Java‑based compiler front‑end for the educational SAS language, featuring:
- Lexer built with JFlex to tokenize keywords, literals, operators, comments, and identifiers.
- Parser generated by CUP (LALR(1)) with support for declarations, classes, functions, control flow (`if`/`else`, loops, `return`), and typed expressions.
- Semantic actions embedded in the grammar to perform on-the-fly symbol-table management, type checking, and error reporting (with ANSI color output).
- Build helper (`Build.java`) to automate JFlex/CUP code generation, compilation, and test execution, with both interactive and one-shot modes.
- Comprehensive test suite under `tests/` covering valid and invalid SAS programs.
Ideal for compiler‑construction coursework or as a foundation for adding AST passes, code generation, and optimizations.
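This document describes the CUP-based parser for the SAS language, detailing every syntactic construct it recognizes. We list the grammar productions, explain their intent, and highlight how the parser handles operator precedence, error recovery in arithmetic expressions, and unsupported features. For readability, productions are written in an EBNF-like shorthand (`?` for optional, `*` for repetition); CUP itself accepts only plain BNF, so the actual `.cup` file expresses these with recursive and empty productions.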
```
program ::= decls EOF

decls ::= /* empty */
        | decls declaration

declaration ::=
      /* Variable declaration (no initializer) */
      type IDENTIFIER (',' IDENTIFIER)* ';'
    | /* Variable with initializer */
      type IDENTIFIER '=' expr ';'
    | /* Function (optional access modifier) */
      modifiers_opt type IDENTIFIER '(' params_opt ')' block
    | /* Class definition */
      modifiers_opt CLASS IDENTIFIER classBody

modifiers_opt ::= /* empty */ | PUBLIC | PRIVATE | PROTECTED
params_opt    ::= /* empty */ | param_list
param_list    ::= type IDENTIFIER (',' type IDENTIFIER)*
classBody     ::= '{' enterScope decls '}'   /* tracks inner variables for the cyclic check */
```

- `type` covers the primitives (`int`, `double`, `bool`, `char`, `string`, `void`) and user-defined classes (by `IDENTIFIER`, checked against the symbol table).
- Scope actions (`enterScope`/`exitScope`) are embedded to manage nested declarations and to detect redefinitions and cyclic class-variable dependencies.
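The scope handling described above can be sketched as a stack of hash maps, one per open scope. This is a minimal illustration, not the project's implementation: the class name `ScopedSymbolTable`, its methods, and the choice to flag a redefinition only when a name repeats within the same scope (so inner scopes may shadow outer ones) are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a scoped symbol table: one map (name -> type) per open scope.
final class ScopedSymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    ScopedSymbolTable() {
        scopes.push(new HashMap<>()); // global scope for top-level decls
    }

    // Called where the grammar embeds enterScope (block, classBody).
    void enterScope() {
        scopes.push(new HashMap<>());
    }

    // Called where the grammar embeds exitScope; the global scope is never popped.
    void exitScope() {
        if (scopes.size() == 1) throw new IllegalStateException("cannot exit the global scope");
        scopes.pop();
    }

    // Declares a name in the innermost scope; returns false on a redefinition in that scope.
    boolean declare(String name, String type) {
        Map<String, String> current = scopes.peek();
        if (current.containsKey(name)) return false;
        current.put(name, type);
        return true;
    }

    // Resolves a name from the innermost scope outwards; returns null if undeclared.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) { // ArrayDeque iterates from the top of the stack
            String type = scope.get(name);
            if (type != null) return type;
        }
        return null;
    }
}
```

Under this model, a `FOR` initializer would be declared right after `enterScope()` and discarded by the matching `exitScope()`.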
```
statement ::=
      IF '(' expr ')' block (ELSE block)?
    | DO block WHILE '(' expr ')' ';'
    | WHILE '(' expr ')' block
    | FOR '(' type IDENTIFIER '=' expr ';' relational_expression ';' assignment_expression ')' block
    | BREAK ';'
    | CONTINUE ';'
    | RETURN expr_opt ';'
    | /* Nested declarations are also allowed here */
      type IDENTIFIER (',' IDENTIFIER)* ';'
    | type IDENTIFIER '=' expr ';'
    | assignment_expression ';'

block     ::= '{' enterScope stmt_list '}' exitScope
stmt_list ::= /* empty */ | stmt_list statement
```

- `IF` accepts an optional `ELSE`.
- `DO…WHILE`, `WHILE`, and `FOR` loops introduce their own scopes.
- `FOR` syntax: the initializer must be a typed declaration and is scoped to the loop.
- `RETURN` without an expression yields a special `void`-typed `ArgInfo`.
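As a rough sketch of the `RETURN` rule, the hypothetical `ReturnCheck` helper below substitutes a `void`-typed `ArgInfo` when `expr_opt` is empty. The `ArgInfo` record only mirrors the value/type pair the text describes; comparing the result against the function's declared return type is an assumed follow-up check, not something this document specifies.

```java
// Hypothetical mirror of the parser's ArgInfo: a value plus its type name.
record ArgInfo(Object value, String type) {}

final class ReturnCheck {
    static final String VOID = "void";

    // RETURN expr_opt ';' : an empty expr_opt (null here) yields a void-typed ArgInfo.
    static ArgInfo returnValue(ArgInfo exprOrNull) {
        return exprOrNull != null ? exprOrNull : new ArgInfo(null, VOID);
    }

    // Assumed follow-up check: the returned type must equal the declared return type.
    static boolean matchesDeclared(String declaredType, ArgInfo returned) {
        return declaredType.equals(returned.type());
    }
}
```

With this shape, a bare `return;` inside a `void` function passes the check, while `return 3;` in the same function does not.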
```
expr_opt ::= /* empty */ | expr
expr     ::= assignment_expression

assignment_expression ::=
      IDENTIFIER '=' assignment_expression
    | equality_expression

equality_expression ::=
      relational_expression ( (EQEQ | NOTEQ) relational_expression )*
    | relational_expression

relational_expression ::=
      additive_expression ( (LT | GT | LTEQ | GTEQ) additive_expression )*
    | additive_expression

additive_expression ::=
      multiplicative_expression ( (PLUS | MINUS) multiplicative_expression )*
    | multiplicative_expression

multiplicative_expression ::=
      unary_expression ( (TIMES | DIV | MOD) unary_expression )*
    | unary_expression

unary_expression ::=
      PLUSPLUS IDENTIFIER
    | MINUSMINUS IDENTIFIER
    | PLUS unary_expression
    | MINUS unary_expression
    | postfix_expression

postfix_expression ::=
      primary_expression
    | IDENTIFIER PLUSPLUS
    | IDENTIFIER MINUSMINUS

primary_expression ::=
      IDENTIFIER
    | INTEGER_LITERAL
    | FLOAT_LITERAL
    | CHAR_LITERAL
    | STRING_LITERAL
    | BOOL_LITERAL
    | IDENTIFIER '(' expr_list_opt ')'   /* function call */
    | '(' expr ')'

expr_list_opt ::= /* empty */ | expr_list
expr_list     ::= expr (',' expr)*
```

- Function calls collect argument types in `tempArgs` and verify them against `SymbolInfo.paramTypes`.
- Literals produce `ArgInfo(value, type)`.
- Operators are strictly typed: `+`/`-` apply only to numeric types, with no overload for `string` concatenation.
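The call check described above might look like the following sketch. `ArgInfo`, `SymbolInfo`, and `CallChecker` are hypothetical stand-ins for the parser's classes, and the exact-match comparison (no implicit `int` to `double` widening) is an assumption consistent with the strict typing noted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the parser's ArgInfo and SymbolInfo.
record ArgInfo(Object value, String type) {}
record SymbolInfo(String name, String returnType, List<String> paramTypes) {}

final class CallChecker {
    // Argument infos collected while reducing expr_list (the text calls this tempArgs).
    private final List<ArgInfo> tempArgs = new ArrayList<>();

    void addArg(ArgInfo arg) {
        tempArgs.add(arg);
    }

    // Returns null if the call matches fn's signature, otherwise a description of the first mismatch.
    String check(SymbolInfo fn) {
        try {
            List<String> expected = fn.paramTypes();
            if (expected.size() != tempArgs.size()) {
                return fn.name() + ": expected " + expected.size()
                        + " argument(s), got " + tempArgs.size();
            }
            for (int i = 0; i < expected.size(); i++) {
                String actual = tempArgs.get(i).type();
                if (!expected.get(i).equals(actual)) { // assumption: exact type match, no widening
                    return fn.name() + ": argument " + (i + 1) + " should be "
                            + expected.get(i) + ", got " + actual;
                }
            }
            return null;
        } finally {
            tempArgs.clear(); // reset for the next call expression
        }
    }
}
```

A single `tempArgs` list works for flat calls, but a nested call such as `f(g(1))` would mix both argument lists; keeping a stack of lists, one per open call, avoids that.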
The grammar does not support the following features:
- Object Creation: There is no `new` keyword or `new ClassName(…)` construct.
- String Concatenation: `+` is arithmetic only; there is no overload for `STRING_LITERAL` operands, so attempting `str1 + str2` yields a type error.
- Ternary Operator: `?:` is not recognized.
- Lambda or Anonymous Functions: Not supported by the grammar.
By nesting productions (`multiplicative_expression` inside `additive_expression` inside `relational_expression`, and so on), the grammar enforces the following precedence, from highest to lowest:
- Highest: prefix/postfix `++`, `--` and unary `+`, `-`
- Next: `*`, `/`, `%`
- Then: `+`, `-`
- Then: `<`, `>`, `<=`, `>=`
- Then: `==`, `!=`
- Lowest: assignment (`=`), right-associative

Because precedence is encoded in the grammar's structure, CUP builds a conflict-free LALR(1) parse table for it without any `precedence` declarations (CUP's counterpart to yacc's `%left`/`%right`).
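To see why nesting alone fixes precedence, here is a hand-written recursive-descent parser (not CUP, and not part of this project) whose methods nest exactly like the productions above. It handles integer literals, identifiers, parentheses, unary `+`/`-`, and the binary and assignment operators (omitting `++`/`--` for brevity), and returns a fully parenthesized string so the grouping is visible. Binary levels loop, which makes them left-associative; assignment recurses on its right operand, which makes it right-associative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Illustration only: a recursive-descent parser (not CUP) whose methods nest like the
// SAS expression productions. It returns a fully parenthesized string showing the grouping.
final class PrecedenceDemo {
    private static final Set<String> TWO_CHAR = Set.of("<=", ">=", "==", "!=");
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    private PrecedenceDemo(String src) {
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetterOrDigit(c)) {              // identifier or integer literal
                int j = i;
                while (j < src.length() && Character.isLetterOrDigit(src.charAt(j))) j++;
                toks.add(src.substring(i, j));
                i = j;
            } else if (i + 1 < src.length() && TWO_CHAR.contains(src.substring(i, i + 2))) {
                toks.add(src.substring(i, i + 2));                 // two-char operators first
                i += 2;
            } else {
                toks.add(String.valueOf(c));                       // + - * / % < > = ( )
                i++;
            }
        }
    }

    static String parse(String src) {
        PrecedenceDemo p = new PrecedenceDemo(src);
        String result = p.assignment();
        if (p.pos != p.toks.size()) throw new IllegalArgumentException("unexpected " + p.peek());
        return result;
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }
    private String peekAhead() { return pos + 1 < toks.size() ? toks.get(pos + 1) : ""; }
    private String next() {
        if (pos >= toks.size()) throw new IllegalArgumentException("unexpected end of input");
        return toks.get(pos++);
    }
    private static boolean isIdentifier(String t) { return !t.isEmpty() && Character.isLetter(t.charAt(0)); }

    // assignment_expression ::= IDENTIFIER '=' assignment_expression | equality_expression
    private String assignment() {
        if (isIdentifier(peek()) && peekAhead().equals("=")) {
            String id = next();
            next();                                                // consume '='
            return "(" + id + " = " + assignment() + ")";          // right recursion: right-associative
        }
        return equality();
    }

    // Each level handles its own operators and delegates operands to the next-tighter level.
    private String equality()       { return binary(Set.of("==", "!="), this::relational); }
    private String relational()     { return binary(Set.of("<", ">", "<=", ">="), this::additive); }
    private String additive()       { return binary(Set.of("+", "-"), this::multiplicative); }
    private String multiplicative() { return binary(Set.of("*", "/", "%"), this::unary); }

    // X ::= Y ( op Y )*  : the loop groups to the left, so these levels are left-associative.
    private String binary(Set<String> ops, Supplier<String> operand) {
        String left = operand.get();
        while (ops.contains(peek())) {
            String op = next();
            left = "(" + left + " " + op + " " + operand.get() + ")";
        }
        return left;
    }

    // unary_expression: prefix + and - only (++/-- omitted for brevity)
    private String unary() {
        if (peek().equals("+") || peek().equals("-")) {
            String op = next();
            return "(" + op + unary() + ")";
        }
        return primary();
    }

    // primary_expression ::= IDENTIFIER | INTEGER_LITERAL | '(' expr ')'
    private String primary() {
        String t = next();
        if (!t.equals("(")) return t;
        String inner = assignment();
        if (!next().equals(")")) throw new IllegalArgumentException("expected )");
        return inner;
    }
}
```

For example, `PrecedenceDemo.parse("a = b = 1 < 2 == 1")` returns `(a = (b = ((1 < 2) == 1)))`: assignment binds loosest and groups to the right, while `<` binds tighter than `==`.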
When an arithmetic expression is ill-typed, the parser recovers instead of aborting:
- Type Mismatch: On invalid arithmetic (e.g. adding `int` + `bool`), the action calls:
  ```java
  Util.expressionNotNumericError(left.type, right.type, "+", leftLine);
  RESULT = new ArgInfo(0.0, Util.DOUBLE_SYM);
  ```
- Result Value: Even after an error, `ArgInfo.value` is set to `0.0` (a `double`) so that subsequent expressions can continue parsing and evaluation without nulls or crashes.
- Error Reporting: The parser prints a clear, colorized message with line context, then proceeds.
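A self-contained sketch of this recovery strategy for `+` follows. The names `ArithmeticRecovery`, `plus`, `errors`, and `useColor` are hypothetical, and a plain `System.err` message stands in for `Util.expressionNotNumericError`. Treating only `int` and `double` as numeric, and promoting mixed `int`/`double` operands to `double`, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the parser's ArgInfo: a computed value plus its type name.
record ArgInfo(Object value, String type) {}

final class ArithmeticRecovery {
    static final String INT = "int";
    static final String DOUBLE = "double";
    private static final String RED = "\u001B[31m";
    private static final String RESET = "\u001B[0m";

    static boolean useColor = true;                     // assumption: cleared by the *_nc modes
    static final List<String> errors = new ArrayList<>();

    // Assumption: only int and double count as numeric.
    static boolean isNumeric(String type) {
        return INT.equals(type) || DOUBLE.equals(type);
    }

    // Sketch of the action for: additive_expression PLUS multiplicative_expression
    static ArgInfo plus(ArgInfo left, ArgInfo right, int leftLine) {
        if (!isNumeric(left.type()) || !isNumeric(right.type())) {
            String msg = "line " + leftLine + ": operator + needs numeric operands, got "
                    + left.type() + " and " + right.type();
            errors.add(msg);
            System.err.println(useColor ? RED + msg + RESET : msg);
            return new ArgInfo(0.0, DOUBLE);            // placeholder keeps later actions null-free
        }
        if (INT.equals(left.type()) && INT.equals(right.type())) {
            int sum = ((Number) left.value()).intValue() + ((Number) right.value()).intValue();
            return new ArgInfo(sum, INT);
        }
        // Assumption: mixed int/double promotes to double.
        double sum = ((Number) left.value()).doubleValue() + ((Number) right.value()).doubleValue();
        return new ArgInfo(sum, DOUBLE);
    }
}
```

Because the fallback is an ordinary `double` `ArgInfo`, an enclosing expression such as `(1 + true) * 2` would still type-check after the first error, so the user sees one diagnostic rather than a cascade.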
This section describes two methods for building and running the SAS parser: manually and through the included `Build.java` helper.

Use JFlex to generate the `SASLexer.java` file from your `.jflex` specification:

```bash
# Same command on Windows and Linux:
java -jar jflex-full-1.9.1.jar SASLexer.jflex
```

This will produce `SASLexer.java` in the current directory.
Use CUP to generate the `SASParser.java` and `sym.java` files from your `.cup` grammar:

```bash
# Same command on Windows and Linux:
java -jar lib/java-cup-11b.jar -parser SASParser SASParser.cup
```

Note: the `-parser SASParser` flag tells CUP to name the generated parser class `SASParser` (written to `SASParser.java`); the token constants go to `sym.java`, CUP's default symbol-class name.
Compile `Main.java`, `SASLexer.java`, `SASParser.java`, and `sym.java`. Include the CUP JAR on your classpath. You can optionally direct the `.class` files into a `bin/` folder with `-d bin`:

```bash
# Windows:
javac -classpath ".;lib/java-cup-11b.jar" -d bin Main.java SASLexer.java SASParser.java sym.java
# Linux:
javac -classpath ".:lib/java-cup-11b.jar" -d bin Main.java SASLexer.java SASParser.java sym.java
```

Note: if you omit `-d bin`, the `.class` files will be generated alongside your `.java` files; in that case, use `.` instead of `bin` on the classpath when running.
Run `Main` on your `.SAS` test files. Again, include both `bin/` and the CUP JAR on the classpath:

```bash
# Windows:
java -classpath "bin;lib/java-cup-11b.jar" Main tests/*.SAS
# Linux:
java -classpath "bin:lib/java-cup-11b.jar" Main tests/*.SAS
```

Important: all file paths in these commands are relative to the project root. If your terminal is elsewhere, use absolute paths or `cd` into the project folder first.
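The only difference between the Windows and Linux commands above is the classpath separator. If you would rather script the manual steps in Java, `java.io.File.pathSeparator` supplies the right separator for the host. The `BuildCommands` class below is a hypothetical sketch, not the bundled `Build.java`. Note that `ProcessBuilder` does not expand shell wildcards such as `tests/*.SAS`, so the sketch lists test files explicitly with `Files.newDirectoryStream`.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not the project's Build.java): assembles the manual-build
// commands with the host's classpath separator. Paths are relative to the project root.
final class BuildCommands {
    static final String CUP_JAR = "lib/java-cup-11b.jar";

    // File.pathSeparator is ";" on Windows and ":" on Linux/macOS.
    static String classpath(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    static List<String> compile() {
        return List.of("javac", "-classpath", classpath(".", CUP_JAR), "-d", "bin",
                "Main.java", "SASLexer.java", "SASParser.java", "sym.java");
    }

    static List<String> run(List<String> testFiles) {
        List<String> cmd = new ArrayList<>(List.of("java", "-classpath", classpath("bin", CUP_JAR), "Main"));
        cmd.addAll(testFiles);
        return cmd;
    }

    // Replaces the shell's expansion of tests/*.SAS.
    static List<String> sasFiles(Path dir) throws IOException {
        List<String> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.SAS")) {
            for (Path p : stream) files.add(p.toString());
        }
        files.sort(null); // directory order is unspecified; sort for repeatable runs
        return files;
    }

    // Runs one command, streaming its output to this console; returns the exit code.
    static int exec(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

A driver would call `exec(compile())` and, if that returns `0`, `exec(run(sasFiles(Path.of("tests"))))`.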
An interactive builder script (`Build.java`) automates cleaning, lexer and parser generation, compilation, and test execution. It assumes the project structure and file names shown here; modify `Build.java` if you need to change paths or filenames.

```bash
# Same commands on Windows and Linux:
javac -d bin Build.java
java -classpath bin Build
```

This launches an interactive prompt with the following commands:
- `all` → clean + gen_lexer + gen_parser + compile + run (with ANSI colors)
- `all_nc` → same as `all`, but without ANSI colors
- `run` → compile + run tests (with colors)
- `run_nc` → compile + run tests (no colors)
- `clean` → delete generated sources and binaries
Tip: if your terminal doesn't support ANSI colors (or you prefer plain text), use the `_nc` variants for non-colored output.
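The command table above maps naturally onto a lookup from a command name to an ordered list of steps plus a color flag. The sketch below is a hypothetical model of that behavior, not the code in `Build.java`; `BuildPlan`, `Plan`, and `colorize` are illustrative names, and the ANSI sequence `ESC[32m` (green) is just an example.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of the Build.java prompt (not its actual code): each command
// expands to an ordered list of steps plus a flag saying whether to emit ANSI colors.
final class BuildPlan {
    record Plan(List<String> steps, boolean color) {}

    private static final List<String> FULL = List.of("clean", "gen_lexer", "gen_parser", "compile", "run");
    private static final List<String> RUN_ONLY = List.of("compile", "run");

    private static final Map<String, Plan> COMMANDS = Map.of(
            "all", new Plan(FULL, true),
            "all_nc", new Plan(FULL, false),
            "run", new Plan(RUN_ONLY, true),
            "run_nc", new Plan(RUN_ONLY, false),
            "clean", new Plan(List.of("clean"), false)); // clean has no test output to colorize

    static Plan plan(String command) {
        Plan p = COMMANDS.get(command.trim());
        if (p == null) throw new IllegalArgumentException("unknown command: " + command);
        return p;
    }

    // With color off (the _nc variants), text is emitted without escape sequences.
    static String colorize(String text, boolean color) {
        return color ? "\u001B[32m" + text + "\u001B[0m" : text;
    }
}
```

Keeping the `_nc` variants as the same step lists with only the flag flipped means the color choice never changes which steps run.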