MohammadHosseinKv/Compiler-Design-Parser-w-JFlex-and-JavaCup

SAS Parser Grammar & Implementation

Project Overview

A Java‑based compiler front‑end for the educational SAS language, featuring:

  • Lexer built with JFlex to tokenize keywords, literals, operators, comments, and identifiers.
  • Parser generated by CUP (LALR(1)) with full support for declarations, classes, functions, control flow (if/else, loops, return), and typed expressions.
  • Semantic actions embedded in the grammar to perform on‑the‑fly symbol‑table management, type checking, and error reporting (with ANSI color output).
  • Build helper (Build.java) to automate JFlex/CUP code generation, compilation, and test execution, with both interactive and one‑shot modes.
  • Comprehensive test suite under tests/ covering valid and invalid SAS programs.

Ideal for compiler‑construction coursework or as a foundation for adding AST passes, code generation, and optimizations.

This document describes the CUP‑based parser for the SAS language, detailing every syntactic construct it recognizes. We enumerate the full grammar productions, explain their intent, and highlight how the parser handles operator precedence, error recovery in arithmetic expressions, and unsupported features.


Supported Constructs and Grammar

Declarations & Scopes

program          ::= decls EOF

decls            ::= /* empty */
                   | decls declaration

declaration      ::=
    /* Variable declaration (no init) */
    type IDENTIFIER (‘,’ IDENTIFIER)* ‘;’
  | /* Variable w/ init */
    type IDENTIFIER ‘=’ expr ‘;’
  | /* Function (opt. access modifier) */
    modifiers_opt type IDENTIFIER ‘(’ params_opt ‘)’ block
  | /* Class definition */
    modifiers_opt CLASS IDENTIFIER classBody

modifiers_opt    ::= /* empty */ | PUBLIC | PRIVATE | PROTECTED

params_opt       ::= /* empty */ | param_list
param_list       ::= type IDENTIFIER (‘,’ type IDENTIFIER)*

classBody        ::= ‘{’ enterScope decls ‘}’  /* tracks inner variables for cyclic-check */
  • type covers primitives (int, double, bool, char, string, void) and user-defined classes (by IDENTIFIER, checked against symbol table).
  • Scope actions (enterScope / exitScope) are embedded to manage nested declarations and detect redefinitions and cyclic class‑variable dependencies.
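
The scope handling described above can be pictured as a stack of maps, one per nesting level. The following is a minimal sketch under stated assumptions: the class name ScopedSymbolTable, its methods, and the use of plain strings for types are hypothetical and are not taken from the project's sources, and it leaves out the cyclic class-variable check.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a stack of scopes, one name-to-type map per nesting level.
class ScopedSymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    ScopedSymbolTable() {
        scopes.push(new HashMap<>()); // global scope
    }

    // Called where the grammar embeds enterScope.
    void enterScope() {
        scopes.push(new HashMap<>());
    }

    // Called where the grammar embeds exitScope.
    void exitScope() {
        if (scopes.size() == 1) {
            throw new IllegalStateException("cannot exit the global scope");
        }
        scopes.pop();
    }

    // Returns false when the name is already declared in the current scope
    // (a redefinition); shadowing an outer scope is allowed in this sketch.
    boolean declare(String name, String type) {
        return scopes.peek().putIfAbsent(name, type) == null;
    }

    // Searches from the innermost scope outwards; returns null if undeclared.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) {
                return type;
            }
        }
        return null;
    }
}
```

Whether the real parser allows an inner declaration to shadow an outer one is not stated in this README; the sketch permits it only to keep the example short.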

Control Flow

statement       ::=
    IF ‘(’ expr ‘)’ block (ELSE block)?
  | DO block WHILE ‘(’ expr ‘)’ ‘;’
  | WHILE ‘(’ expr ‘)’ block
  | FOR ‘(’ type IDENTIFIER ‘=’ expr ‘;’ relational_expression ‘;’ assignment_expression ‘)’ block
  | BREAK ‘;’
  | CONTINUE ‘;’
  | RETURN expr_opt ‘;’
  | /* Nested declarations are also allowed here */
    type IDENTIFIER (‘,’ IDENTIFIER)* ‘;’
  | type IDENTIFIER ‘=’ expr ‘;’
  | assignment_expression ‘;’

block           ::= ‘{’ enterScope stmt_list ‘}’ exitScope
stmt_list       ::= /* empty */ | stmt_list statement
  • IF accepts optional ELSE.
  • DO…WHILE, WHILE, FOR loops introduce their own scopes.
  • FOR syntax: initializer must be a typed declaration and is scoped to the loop.
  • RETURN without an expression yields a special void‑typed ArgInfo.

Expressions

expr_opt             ::= /* empty */ | expr

expr                 ::= assignment_expression

assignment_expression::=
    IDENTIFIER ‘=’ assignment_expression
  | equality_expression

equality_expression  ::=
    relational_expression ( (EQEQ | NOTEQ) relational_expression )*
  | relational_expression

relational_expression::=
    additive_expression ( (LT | GT | LTEQ | GTEQ) additive_expression )*
  | additive_expression

additive_expression  ::=
    multiplicative_expression ( (PLUS | MINUS) multiplicative_expression )*
  | multiplicative_expression

multiplicative_expression ::=
    unary_expression ( (TIMES | DIV | MOD) unary_expression )*
  | unary_expression

unary_expression     ::=
    PLUSPLUS IDENTIFIER
  | MINUSMINUS IDENTIFIER
  | PLUS unary_expression
  | MINUS unary_expression
  | postfix_expression

postfix_expression   ::=
    primary_expression
  | IDENTIFIER PLUSPLUS
  | IDENTIFIER MINUSMINUS

primary_expression   ::=
    IDENTIFIER
  | INTEGER_LITERAL
  | FLOAT_LITERAL
  | CHAR_LITERAL
  | STRING_LITERAL
  | BOOL_LITERAL
  | IDENTIFIER ‘(’ expr_list_opt ‘)’        /* function call */
  | ‘(’ expr ‘)’

expr_list_opt        ::= /* empty */ | expr_list
expr_list            ::= expr (‘,’ expr)*
  • Function calls collect argument types in tempArgs and verify against SymbolInfo.paramTypes.
  • Literals produce ArgInfo(value, type).
  • Operators are strictly typed: +/- only on numeric types, no overload for string concatenation.
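
To make the argument check concrete, here is a minimal sketch of comparing collected argument types against a function's parameter types. Only the names tempArgs and SymbolInfo.paramTypes come from this README; the FunctionSignature class, the string type names, and the message wording are assumptions made for illustration.

```java
import java.util.List;

// Hypothetical stand-in for the project's SymbolInfo; only the idea of a
// paramTypes list is taken from the README.
class FunctionSignature {
    final String name;
    final List<String> paramTypes;

    FunctionSignature(String name, List<String> paramTypes) {
        this.name = name;
        this.paramTypes = paramTypes;
    }

    // argTypes plays the role of tempArgs. Returns null when the call matches,
    // otherwise a description of the first mismatch.
    String checkCall(List<String> argTypes) {
        if (argTypes.size() != paramTypes.size()) {
            return name + ": expected " + paramTypes.size()
                    + " argument(s) but got " + argTypes.size();
        }
        for (int i = 0; i < paramTypes.size(); i++) {
            if (!paramTypes.get(i).equals(argTypes.get(i))) {
                return name + ": argument " + (i + 1) + " should be "
                        + paramTypes.get(i) + " but is " + argTypes.get(i);
            }
        }
        return null;
    }
}
```

The sketch requires exact type equality; whether the real checker permits implicit conversions such as int to double is not stated here.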

Some Examples of Unsupported Constructs

  • Object Creation: The grammar does not include a new keyword or new ClassName(…) construct.
  • String Concatenation: + is defined only for numeric operands and has no overload for strings, so attempting str1 + str2 yields a type error.
  • Ternary Operator: ?: is not recognized.
  • Lambda or Anonymous Functions: Not supported by the grammar.

Arithmetic Precedence & Error Handling

Operator Precedence

By nesting productions (multiplicative_expression inside additive_expression inside relational_expression, and so on), we enforce the following precedence:

  1. Highest: prefix/postfix ++, -- and unary +, -
  2. Next: *, /, %
  3. Then: +, -
  4. Then: <, >, <=, >=
  5. Then: ==, !=
  6. Lowest: assignment (=), right‑associative

CUP generates an LALR(1) parser that, given this unambiguous grammar, produces correct shift/reduce actions without any additional precedence declarations.
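
The effect of layering productions can be seen in a small stand-alone example. The sketch below is a hand-written recursive-descent evaluator for integer arithmetic, not the table-driven LALR(1) parser that CUP generates, and the class PrecedenceDemo is hypothetical. It only illustrates that placing the multiplicative level below the additive level makes *, / and % bind tighter than + and -.

```java
// Hypothetical illustration only: recursive descent over a layered grammar,
// not the parser that CUP generates for SAS.
class PrecedenceDemo {
    private final String src;
    private int pos = 0;

    private PrecedenceDemo(String src) {
        this.src = src.replace(" ", "");
    }

    static int eval(String src) {
        PrecedenceDemo p = new PrecedenceDemo(src);
        int value = p.additive();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return value;
    }

    // True when the next character is one of the given operator characters.
    private boolean atAny(String chars) {
        return pos < src.length() && chars.indexOf(src.charAt(pos)) >= 0;
    }

    // additive ::= multiplicative ((+|-) multiplicative)*
    private int additive() {
        int left = multiplicative();
        while (atAny("+-")) {
            char op = src.charAt(pos++);
            int right = multiplicative();
            left = (op == '+') ? left + right : left - right;
        }
        return left;
    }

    // multiplicative ::= primary ((*|/|%) primary)*
    private int multiplicative() {
        int left = primary();
        while (atAny("*/%")) {
            char op = src.charAt(pos++);
            int right = primary();
            if (op == '*') left *= right;
            else if (op == '/') left /= right;
            else left %= right;
        }
        return left;
    }

    // primary ::= INTEGER | '(' additive ')'
    private int primary() {
        if (atAny("(")) {
            pos++;
            int value = additive();
            if (!atAny(")")) {
                throw new IllegalArgumentException("expected ')' at " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("expected a number at " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }
}
```

With this layering, PrecedenceDemo.eval("1 + 2 * 3") evaluates to 7 and PrecedenceDemo.eval("(1 + 2) * 3") to 9, and the loops make both levels left-associative.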

Arithmetic Error Recovery

  • Type Mismatch: On invalid arithmetic (e.g. adding int + bool), the action calls:

    Util.expressionNotNumericError(left.type, right.type, "+", leftLine);
    RESULT = new ArgInfo(0.0, Util.DOUBLE_SYM);
  • Result Value: Even after an error, ArgInfo.value is set to 0.0 (double) so that subsequent expressions can continue parsing and evaluation without nulls or crashes.

  • Error Reporting: The parser prints a clear, colorized message with line context, then proceeds.
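
The recovery strategy can be sketched in isolation as follows. ArgInfo(value, type) and the 0.0 double placeholder follow the description above; everything else (the AddChecker class, string type names, and collecting messages in a list instead of printing them through Util) is an assumption made so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the project's ArgInfo: a value paired with its type.
class ArgInfo {
    final Object value;
    final String type;

    ArgInfo(Object value, String type) {
        this.value = value;
        this.type = type;
    }
}

// Hypothetical checker for '+'; the real semantic action lives in the CUP grammar.
class AddChecker {
    final List<String> errors = new ArrayList<>();

    private static boolean isNumeric(String type) {
        return type.equals("int") || type.equals("double");
    }

    ArgInfo add(ArgInfo left, ArgInfo right, int line) {
        if (!isNumeric(left.type) || !isNumeric(right.type)) {
            errors.add("line " + line + ": '+' needs numeric operands, got "
                    + left.type + " and " + right.type);
            // Placeholder result so that enclosing expressions never see null.
            return new ArgInfo(0.0, "double");
        }
        if (left.type.equals("int") && right.type.equals("int")) {
            int sum = ((Number) left.value).intValue() + ((Number) right.value).intValue();
            return new ArgInfo(sum, "int");
        }
        double sum = ((Number) left.value).doubleValue() + ((Number) right.value).doubleValue();
        return new ArgInfo(sum, "double");
    }
}
```

Because the placeholder is itself numeric, an enclosing expression that uses the failed result type-checks normally, so a single mistake produces one message rather than a cascade.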

How to Run the Parser

This section describes two methods for building and running the SAS parser: manually and through the included Build.java helper.

1. Manually

1.1 Generate the Lexer

Use JFlex to generate the SASLexer.java file from your .jflex specification:

# Windows / Linux (the command is identical):
java -jar jflex-full-1.9.1.jar SASLexer.jflex

This will produce SASLexer.java in the current directory.


1.2 Generate the Parser

Use CUP to generate the SASParser.java and sym.java files from your .cup grammar:

# Windows / Linux (the command is identical):
java -jar lib/java-cup-11b.jar -parser SASParser SASParser.cup

Note

The -parser SASParser flag tells CUP to name the generated parser class SASParser, which is written to SASParser.java.


1.3 Compile the Sources

Compile Main.java, SASLexer.java, SASParser.java, and sym.java. Include the CUP JAR on your classpath. You can optionally direct the .class files into a bin/ folder with -d bin:

# Windows:
javac -classpath ".;lib/java-cup-11b.jar" -d bin Main.java SASLexer.java SASParser.java sym.java

# Linux:
javac -classpath ".:lib/java-cup-11b.jar" -d bin Main.java SASLexer.java SASParser.java sym.java

Note

If you omit -d bin, the .class files will be generated alongside your .java files.


1.4 Run the Parser

Run Main on your .SAS test files. Again, include both bin/ and the CUP JAR on the classpath:

# Windows:
java -classpath "bin;lib/java-cup-11b.jar" Main tests/*.SAS

# Linux:
java -classpath "bin:lib/java-cup-11b.jar" Main tests/*.SAS

Important

All file paths in these commands are relative to the project root. If your terminal is elsewhere, use absolute paths or cd into the project folder first.
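
The only difference between the Windows and Linux commands above is the classpath separator (; versus :). A tool that launches these commands from Java can avoid hard-coding it by using File.pathSeparator. The following is a minimal sketch; the ClasspathDemo class is hypothetical and is not part of the project.

```java
import java.io.File;

// Hypothetical helper: joins classpath entries with the platform's separator
// (";" on Windows, ":" on Linux and macOS).
class ClasspathDemo {
    static String join(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Prints the classpath string to pass to java -classpath on this platform.
        System.out.println(join("bin", "lib/java-cup-11b.jar"));
    }
}
```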


2. Run with the Build.java Helper

An interactive builder script (Build.java) automates cleaning, lexer & parser generation, compilation, and test execution. It assumes the project structure and file names shown here; modify Build.java if you need to change paths or filenames.

2.1 Compile the Builder

# Windows / Linux (the command is identical):
javac -d bin Build.java

2.2 Run the Builder

# Windows / Linux (the command is identical):
java -classpath bin Build

This launches an interactive prompt with commands:

  • all → clean + gen_lexer + gen_parser + compile + run (with ANSI colors)
  • all_nc → same as all, but without ANSI colors
  • run → compile + run tests (with colors)
  • run_nc → compile + run tests (no colors)
  • clean → delete generated sources & binaries

Tip

If your terminal doesn’t support ANSI colors (or you prefer plain text), use the _nc variants for non‑colored output.
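
One way to picture the command list is as a table that expands each command into its steps, with the _nc suffix acting purely as a color switch. The sketch below follows the step names listed above but is otherwise hypothetical; the real Build.java may be organised quite differently.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical command table; not taken from Build.java.
class BuildCommands {
    private static final Map<String, List<String>> STEPS = new LinkedHashMap<>();

    static {
        STEPS.put("all", Arrays.asList("clean", "gen_lexer", "gen_parser", "compile", "run"));
        STEPS.put("run", Arrays.asList("compile", "run"));
        STEPS.put("clean", Arrays.asList("clean"));
    }

    // A "_nc" suffix selects the same steps with ANSI colors turned off.
    static boolean useColors(String command) {
        return !command.endsWith("_nc");
    }

    // Expands a command such as "all" or "run_nc" into its ordered steps.
    static List<String> stepsFor(String command) {
        String base = useColors(command)
                ? command
                : command.substring(0, command.length() - "_nc".length());
        List<String> steps = STEPS.get(base);
        if (steps == null) {
            throw new IllegalArgumentException("unknown command: " + command);
        }
        return steps;
    }
}
```

Keeping the color choice separate from the step list means the colored and plain variants cannot drift apart.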

[Screenshot: colored output]

[Screenshot: non-colored output]


About

This project is a Java‑based compiler front end for SAS, an educational Java/C++‑style language, combining a JFlex lexer, a CUP‑generated LALR(1) parser with built‑in symbol‑table and type checking, and a Build.java helper to automate lexer/parser generation and compilation.
