- Overview
- Project Components
- Lexing in General
- Parsing in General
- Syntax Analysis in General
- Code Generation in General
- Usage
- Documentation
- License
This repository hosts the source code and documentation for the MicroJava Compiler, a Java-based compiler for the MicroJava programming language. MicroJava is a small, educational programming language, and this compiler is designed to convert MicroJava source code into bytecode that can be executed on the MicroJava Runtime Machine.
The MicroJava Compiler consists of four primary components:
-
Lexer:
- Implemented using jFlex, the lexer analyzes the source code and tokenizes it based on the MicroJava language specification. It identifies and classifies language constructs, such as keywords, operators, and identifiers. This lexer is a crucial initial step in the compilation process.
-
Parser:
- Implemented using CUP, the parser takes the tokenized output from the lexer and constructs an Abstract Syntax Tree (AST) following the MicroJava language specification. It enforces the language's syntax rules, ensuring that the source code adheres to the correct structure and order of statements and expressions.
-
Syntax Analysis:
- In this phase, the AST generated by the parser is analyzed to enforce the constraints of the MicroJava language. Syntax analysis ensures that the code follows the correct grammar, handles operator precedence, type checking, and validates the program's structure.
-
Code Generation:
- The final phase of the compiler is responsible for generating executable bytecode for the MicroJava Runtime Machine. This process translates the validated AST into instructions that can be executed on the MicroJava virtual machine, enabling the execution of MicroJava programs.
Lexing, also known as tokenization, is the process of breaking down the source code into a stream of tokens. Tokens are the smallest units of a programming language and include keywords, identifiers, operators, and literals. The lexer's role is to recognize and classify these tokens based on the language's grammar and rules.
Parsing is a crucial step in the compilation process that follows lexing. It involves the analysis of the tokenized source code to determine its structure and relationships between different elements. In this phase, a parser constructs an Abstract Syntax Tree (AST) that represents the program's syntax according to the language's grammar. It enforces the correct order of statements, expressions, and the overall program structure.
Syntax analysis, also known as parsing, is an essential part of any compiler. It ensures that the source code adheres to the rules and grammar of the target programming language. During this phase, the compiler constructs a structured representation of the code (such as an AST) and performs various checks to ensure correctness. Common tasks in syntax analysis include identifying and reporting syntax errors, handling operator precedence, and building a representation of the program that can be used for further processing and code generation.
Code generation is the final phase of a compiler, responsible for producing executable code from the parsed and analyzed source code. During this phase, the compiler generates code that can be executed on the target platform, such as machine code or bytecode. The generated code should adhere to the rules and specifications of the target platform and be semantically equivalent to the original source code.
The MicroJava Compiler can be used by students and educators as a learning tool for understanding the principles of compiler construction. To build and use the compiler, please refer to the instructions provided in the repository's documentation.
This appendix describes the MicroJava programming language. Microjava is similar to Java but much simpler.
- A MicroJava program starts with a keyword
program
and has static fields, static methods, and inner classes that can be used as (user) data types. - The main method of a MicroJava program is always called
main()
. When a MicroJava program is called, that method is executed. - Since:
- Integer, sign, and logical constants (
int
,char
,bool
). - Basic types:
int
,bool
,char
(ASCII). - Variables: global (static), local, class (fields).
- Variables of basic types contain values.
- Structured/reference types: one-dimensional arrays as in Java and inner classes with fields and methods.
- Variables of reference types represent references (they contain addresses that cannot be changed explicitly).
- Integer, sign, and logical constants (
- Static methods in the program.
- There is no garbage collector (allocated objects are only deallocated after the end of the program).
- There is class inheritance and polymorphism.
- There is a redefinition of methods.
- Methods of inner classes are bound to the instance and have an implicit parameter
this
(a reference to the instance of the class for which the method was called).- The reference "this" is implicitly declared in the methods of inner classes as the first formal argument of the reference type to the class to which the method belongs.
- Within instance methods, the field name refers to the instance field of the current object, assuming the field is not hidden by a method parameter. If it is hidden, we can access the instance field via
this.fieldName
.
- Pre-declared procedures:
ord
,chr
,len
. - Method
print
prints the values of all basic types. - Control structures include conditional branching (
if-else
) and cycle (do-while
).
## Program Structure
Program = "program" ident {ConstDecl | VarDecl | ClassDecl } "{" {MethodDecl} "}".
## Constant Declaration
ConstDecl = "const" Type ident "=" (numConst | charConst | boolConst) {, ident "=" (numConst | charConst | boolConst)} ";".
## Variable Declaration
VarDecl = Type ident ["[" "]"] {"," ident ["[" "]"]} ";".
## Class Declaration
ClassDecl = "class" ident ["extends" Type] "{" {VarDecl} ["{" {ConstructorDecl} {MethodDecl} "}"] "}".
## Constructor Declaration
ConstructorDecl = ident "(" [FormPars] ")" {VarDecl} "{" {Statement} "}. * za C nivo
## Method Declaration
MethodDecl = (Type | "void") ident "(" [FormPars] ")" {VarDecl} "{" {Statement} "}".
## Formal Parameters
FormPars = Type ident ["[" "]"] {"," Type ident ["[" "]"]}.
## Type Definition
Type = ident.
## Statements
Statement = DesignatorStatement ";" | "if" "(" Condition ")" Statement ["else" Statement] | "while" "(" Condition ")" Statement | "break" ";" | "continue" ";" | "return" [Expr] ";" | "read" "(" Designator ")" ";" | "print" "(" Expr ["," numConst] ")" ";" | Designator "." "foreach" "(" ident "=>" Statement ")" ";" | "{" {Statement} "}".
## Designator Statement
DesignatorStatement = Designator (Assignop Expr | "(" [ActPars] ")" | "++" | "‐‐") | "[" [Designator] {"," [Designator]}"]" "=" Designator.
## Actual Parameters
ActPars = Expr {"," Expr}.
## Conditions
Condition = CondTerm {"||" CondTerm}.
## Conditional Term
CondTerm = CondFact {"&&" CondFact}.
## Conditional Factor
CondFact = Expr [Relop Expr].
## Expressions
Expr = ["‐"] Term {Addop Term}.
## Terms
Term = Factor {Mulop Factor}.
## Factors
Factor = Designator ["(" [ActPars] ")"] | numConst | charConst | boolConst | "new" Type ( "[" Expr "]" | "(" [ActPars] ")" ) | "(" Expr ")".
## Designators
Designator = ident {"." ident | "[" Expr]"}.
## Labels
Label = ident.
## Assignment Operator
Assignop = "=".
## Relational Operators
Relop = "==" | "!=" | ">" | ">=" | "<" | "<=".
## Addition Operators
Addop = "+" | "‐".
## Multiplication Operators
Mulop = "*" | "/" | "%".
All defined terms in this document are underlined to emphasize their particular meaning. Definitions of those terms are given below.
Arrays and classes are of reference type.
- The type of an integer constant (e.g., 17) is int.
- The type of a character constant (e.g., 'x') is char.
- The type of a logical constant (e.g., true) is bool.
Two data types are equivalent
- if they have the same name, or
- if both are strings, and the types of their elements are equivalent.
The two data types are compatible
- if they are equivalent, or
- if one of them is a reference type, and the other is of type null.
Type heart
is compatible when assigned to type dst
- if they are
heart
anddst
equivalent, - if
dst
is reference type, andheart
is of the type null. - if
dst
is a reference to a base class andsrc
is a reference to a derived class
int
: type of all integer valueschar
: type of all character valuesbool
: boolean typenull
: the null value of a variable of type class or (character) string symbolically indicates a reference that does not point to any dataaeolian
: end of character line (corresponds to the character '\n');print(eol)
makes a transition to a new linechr
: standard method;chr(i)
performs the conversion of an integer expression and to character (char)ord
: standard method;ord(ch)
performs character conversion ch to an integer value (int)
Validity range (scope) represents the textual reach of a method or class. It extends from the beginning of the method or class definition to the closing brace at the end of that definition. A scope does not include names declared in scopes that are lexically nested within it. In a scope, the names declared within it and all scopes outside it are "seen". The assumption is that there is an artificial global scope (universe), for which the main program is local and which contains all pre-declared names.
- Name declaration in inner scope hides the declaration of the same name in the outer scope.
- Indirect recursion is not allowed, and each name must be declared before its first use.
- Predeclared names (e.g., int or char) can be redeclared in the inner scope (but this is not recommended).
- Each name in the program must be declared before first use.
- A name must not be declared multiple times within the same scope.
- There must be a named method in the program
main
. It must be declared as a void method with no arguments.
chr(s)
: must be an expression of type int.ord(c)
: must be of type char.len(s)
: must be a string or character string.
##Program Structure program = "program" + " ident {ConstDecl | VarDecl | ClassDecl | RecordDecl } {" + "{MethodDecl} }.";
constDecl = "const Type ident = (numConst | charConst | boolConst);";
- Terminal types numConst, charConst, or boolConst must be equivalent to Type.
- varDecl = "Type ident [][, ident [][]];";
classDecl = "class ident extends Type {" + "VarDecl {" + "ConstructorDecl MethodDecl}}";
- Type when deriving a class from another class, it must be an inner class of the main program.
methodDecl = "(Type | void) ident (FormPars) VarDecl {Statement}";
- If the method is not of type void, it must have a return statement within its body.
constructorDecl = "ident (FormPars) VarDecl {Statement}";
- A class constructor must have the same name as the class for which it is defined.
- There cannot be two constructors within the same class with the same formal parameters.
formPars = "Type ident [][] [, Type ident [][]]";
type = "ident";
- ident must indicate a data type.
designatorStatement = "Designator Assignop Expr;";
- Designator must denote a variable, array element, or field within an object.
- Non-terminal type Expr must be compatible at assignment with non-terminal type Designator.
incrementDecrementStatement = "Designator (++ | --);";
- Designator must denote a variable, array element, or field of an inner class object.
- Designator must be of type int.
methodCallStatement = "Designator (ActPars);";
- Designator must denote a non-static method of the inner class or a global function.
arrayElementAssignment = "[Designator, Designator, ...] = Designator";
- All Designator non-terminals to the left must denote a variable, array element, or field within an object.
- Designator to the right must represent a string.
- Array element types must be compatible at assignment.
breakStatement = "break;";
- The break statement can only be used inside while or foreach loops.
- Aborts execution immediately surrounding loops.
continueStatement = "continue;";
- The continue statement can only be used inside while or foreach loops.
- Terminates the current iteration immediately surrounding loops.
readStatement = "read(Designator);";
- Designator must denote a variable, array element, or field within an object.
- Designator must be of type int, char, or bool.
printStatement = "print(Expr [, numConst]);";
- Expr must be of type int, char, or bool.
returnStatement = "return [Expr];";
- Non-terminal type Expr must be equivalent to the return type of the current method/global function.
- If non-terminal Expr is missing, the current method must be declared void.
- It must not exist outside the body of (static) methods, i.e., global functions.
##If Statement ifStatement = "if (Condition) Statement [else Statement];";
- Conditional expression type Condition must be bool.
whileLoop = "while (Condition) Statement;";
- Conditional expression Condition must be of type bool.
- Checks the specified condition when entering and at the end of the loop body.
foreachLoop = "Designator.foreach(ident => Statement);";
- Designator must denote a string of arbitrary type.
- ident must be a local or global variable of the same type as the elements of the array described by the Designator.
- In each iteration of the loop, ident indicates the current element of the array.
actPars = "Expr [, Expr];";
- The number of formal and actual arguments of a method or constructor must be the same.
- The type of each actual argument must be compatible at assignment with the type of each formal argument.
condition = "CondTerm || CondTerm;";
condTerm = "CondFact && CondFact;";
condFact = "Expr Relop Expr;";
- The types of both expressions must be compatible.
- With variables of class or array type, only != and == can be used from the relational operators.
expr = "Term;";
negationExpr = "- Term;";
- Term must be of type int.
additionExpr = "Expr Addop Term;";
- Expr and Term must be of type int and compatible.
multiplicationExpr = "Term Mullop Factor;";
- Term and Factor must be of type int.
designatorFactor = "Designator | numConst | charConst | boolConst | (Expr);";
methodCallFactor = "Designator (ActPars);";
- Designator must denote a non-static method, a constructor, or a global function.
arrayCreationFactor = "new Type [Expr];";
- Expr must be of type int.
objectCreationFactor = "new Type (ActPars);";
- Type must denote an inner class (user-defined type).
designator1 = "Designator . ident;";
- Non-terminal type Designator must be an inner class (ident must be either a field or a method of an object marked with a non-terminal Designator).
designator2 = "Designator [Expr];";
- Non-terminal type Designator must be a string.
- Non-terminal type Expr must be an int.
assignop = "=";
- The assignment operator is right-associative.
relop = "== | != | > | >= | < | <="; addop = "+ | -";
- Operators are left-associative.
mulop = "* | / | %";
- Operators are left-associative.
- No more than 256 local variables may be used.
- No more than 65536 global variables may be used.
- A class cannot have more than 65536 fields.
This project is open source and available under the MIT License. Feel free to use, modify, and distribute the code, respecting the terms and conditions of the license.
For any questions or feedback, please don't hesitate to contact me.
Thank you for your interest in the MicroJava Compiler project. We hope it serves as a valuable resource for understanding compiler construction and the MicroJava programming language.