The MiniJava Compiler is a lightweight toolchain comprising four modules (a lexer, a parser, a semantic analyzer, and a code generator) that compiles a subset of MiniJava into MIPS assembly code.
The lexer converts the input MiniJava source code into a stream of tokens, each representing a fundamental syntactic unit such as keywords, identifiers, literals, and operators.
The parser processes the token stream, constructing an Abstract Syntax Tree (AST) that represents the program's structure.
The semantic analyzer traverses the AST to ensure the program adheres to semantic rules such as type compatibility, scoping, and declaration usage.
The code generator translates the AST into MIPS assembly instructions for execution, handling register allocation, control flow, output optimization, and runtime integration.
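To make the AST stage concrete, here is a minimal sketch of a binary tree node of the kind the parser could build and the later stages could traverse. All names here (`Tree`, `make_leaf`, `make_tree`, `count_nodes`, and the enum values) are hypothetical and chosen for illustration; the actual declarations in tree.c may differ.

```c
#include <stdlib.h>

/* Hypothetical node kinds; the real tree.c may use different names. */
typedef enum { NODE_DUMMY, NODE_NUM, NODE_ID, NODE_OP } NodeKind;
typedef enum { OP_NONE, OP_GE, OP_VAR } OpKind;

typedef struct Tree {
    NodeKind kind;
    OpKind op;            /* meaningful only when kind == NODE_OP */
    int value;            /* NUM: literal value; ID: symbol table index */
    struct Tree *left;
    struct Tree *right;
} Tree;

/* Allocate a leaf: a number, an identifier, or a dummy placeholder. */
Tree *make_leaf(NodeKind kind, int value) {
    Tree *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->kind = kind;
    t->op = OP_NONE;
    t->value = value;
    t->left = NULL;
    t->right = NULL;
    return t;
}

/* Allocate an interior operator node with two children. */
Tree *make_tree(OpKind op, Tree *left, Tree *right) {
    Tree *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->kind = NODE_OP;
    t->op = op;
    t->value = 0;
    t->left = left;
    t->right = right;
    return t;
}

/* Count nodes: a trivial example of the recursive traversal that
   a semantic analyzer or code generator would also perform. */
int count_nodes(const Tree *t) {
    if (t == NULL) return 0;
    return 1 + count_nodes(t->left) + count_nodes(t->right);
}

/* Release a whole subtree (post-order). */
void free_tree(Tree *t) {
    if (t == NULL) return;
    free_tree(t->left);
    free_tree(t->right);
    free(t);
}
```

Under these assumptions, an expression such as `x >= 0` could be built as `make_tree(OP_GE, make_tree(OP_VAR, make_leaf(NODE_ID, 6), make_leaf(NODE_DUMMY, 0)), make_leaf(NODE_NUM, 0))`, where 6 stands for the symbol table index of `x`.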
The project is organized as follows:
minijava-compiler/
├── include/ # Header files for declarations and definitions
│ ├── symbol_table.h
│ ├── shell-ast.h
│
├── src/
│ ├── codegen.c # Implementation for translating the AST into MIPS instructions
│ ├── grammar.y # YACC parser including the grammar rules for building the AST
│ ├── lex.l # Flex scanner for tokenizing the MiniJava code
│ ├── seman.c # Semantic analyzer
│ ├── string_hash_table.c # Hash table for storage and retrieval of identifiers and string constants
│ ├── symbol_table.c # Symbol table for tracking identifiers and their associated attributes
│ ├── tree.c # Base data structure for building the AST
│
├── test/ # MiniJava language code for testing the custom compiler
│
├── test.sh # Script comparing the custom compiler's output against the ground truth
├── codeGen.linux # A sample MiniJava compiler that serves as the ground truth
├── spim.linux # MIPS assembly simulator that executes the assembly code generated by the compiler
├── trap.handler # Handles exceptions and interrupts that occur while the MIPS code executes
├── Makefile # Build script for compiling and linking the project
└── README.md
To compile the project and generate the codegen executable, run:
make
This command compiles all project modules and links them into the codegen binary.
MiniJava source files can be placed in the test/ directory. To compile a file and generate MIPS assembly output, run:
make test
This command processes the selected MiniJava source and generates several outputs: the MIPS assembly code is saved as codegen.s, the runtime output (as executed by the SPIM simulator) is stored in codegen.out, and the Abstract Syntax Tree (AST) along with the symbol table is recorded in ast_symbol_table.txt.
For instance, consider a MiniJava program src1:
/* Ex1: Assignment statement */
program ex1;
class c1
{
declarations
int x=-1;
enddeclarations
method void main()
declarations
int x=4;
enddeclarations
{
if (x>=0)
{
System.println('x>=0');
};
}
}
Run the MiniJava compiler and save the outputs using the command:
./codegen < ./test/src1 > ast_symbol_table_1.txt
The symbol table and AST generated from this program will be saved in ast_symbol_table_1.txt. This file has two parts.
The first part is the symbol table, which lists all identifiers discovered during compilation along with attributes such as nesting level, kind (e.g., class, variable, procedure), and type information. For instance, predefined identifiers such as system and println are flagged as predefined, while user-defined identifiers such as c1, x, and main are recorded with their corresponding nesting levels.
The second part is the AST printout, which represents the hierarchical structure of the MiniJava source code. This tree view shows how each component (class definitions, method calls, control structures, etc.) is organized and linked, which makes it a useful reference for debugging and for understanding the compiler's internal representation.
********************************Symbol Table************************************
Name Nest-Level Tree-Node Predefined Kind Type Value Offset Dimension Argnum
1 system 0 yes class
2 println 1 yes procedure
3 c1 0 class
4 x 1 variable 1480276208
5 main 1 procedure
6 x 2 variable 1480276560
************* SYNTAX TREE PRINTOUT ***********
+-[DUMMYnode]
R-[ProgramOp]
| +-[STNode,3,"c1"]
| +-[ClassDefOp]
| | | +-[DUMMYnode]
| | | +-[CommaOp]
| | | | +-[STRINGNode,29,"'x>=0'"]
| | | +-[RoutineCallOp]
| | | | | +-[DUMMYnode]
| | | | | +-[SelectOp]
| | | | | | | +-[DUMMYnode]
| | | | | | +-[FieldOp]
| | | | | | +-[STNode,2,"println"]
| | | | +-[VarOp]
| | | | +-[STNode,1,"system"]
| | | +-[StmtOp]
| | | | +-[DUMMYnode]
| | | +-[CommaOp]
| | | | | +-[NUMNode,0]
| | | | +-[GEOp]
| | | | | +-[DUMMYnode]
| | | | +-[VarOp]
| | | | +-[STNode,6,"x"]
| | | +-[IfElseOp]
| | | | +-[DUMMYnode]
| | | +-[StmtOp]
| | | | +-[DUMMYnode]
| | | +-[BodyOp]
| | | | | +-[NUMNode,4]
| | | | | +-[CommaOp]
| | | | | | | +-[DUMMYnode]
| | | | | | +-[TypeIdOp]
| | | | | | +-[INTEGERTNode]
| | | | | +-[CommaOp]
| | | | | | +-[STNode,6,"x"]
| | | | | +-[DeclOp]
| | | | | | +-[DUMMYnode]
| | | | +-[BodyOp]
| | | | +-[DUMMYnode]
| | | +-[MethodOp]
| | | | | +-[DUMMYnode]
| | | | | +-[SpecOp]
| | | | | | +-[DUMMYnode]
| | | | +-[HeadOp]
| | | | +-[STNode,5,"main"]
| | +-[BodyOp]
| | | +-[DUMMYnode]
| | | +-[UnaryNegOp]
| | | | +-[NUMNode,1]
| | | +-[CommaOp]
| | | | | +-[DUMMYnode]
| | | | +-[TypeIdOp]
| | | | +-[INTEGERTNode]
| | | +-[CommaOp]
| | | | +-[STNode,4,"x"]
| | | +-[DeclOp]
| | | | +-[DUMMYnode]
| | +-[BodyOp]
| | +-[DUMMYnode]
+-[ClassOp]
+-[DUMMYnode]
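The symbol table above records two entries named x, at nesting levels 1 and 2. A minimal sketch of how scoped lookup could resolve such shadowing follows. It assumes a flat array searched from the newest entry backward; the types and functions (`SymbolTable`, `st_insert`, `st_lookup`, `st_open_scope`, `st_close_scope`) are hypothetical and are not taken from symbol_table.c.

```c
#include <string.h>

#define MAX_SYMBOLS 64

typedef enum { KIND_CLASS, KIND_VARIABLE, KIND_PROCEDURE } SymKind;

typedef struct {
    const char *name;   /* not copied; the caller keeps the string alive */
    int nest_level;
    SymKind kind;
    int predefined;
    int active;         /* cleared when the enclosing scope closes */
} Symbol;

typedef struct {
    Symbol entries[MAX_SYMBOLS + 1];  /* slot 0 unused so indices start at 1 */
    int count;
    int level;                        /* current nesting level */
} SymbolTable;

void st_init(SymbolTable *st) {
    st->count = 0;
    st->level = 0;
}

void st_open_scope(SymbolTable *st) {
    st->level++;
}

/* Deactivate every entry declared at the current level, then leave it. */
void st_close_scope(SymbolTable *st) {
    for (int i = 1; i <= st->count; i++) {
        if (st->entries[i].nest_level == st->level) {
            st->entries[i].active = 0;
        }
    }
    st->level--;
}

/* Insert at the current nesting level; returns the new index, or 0 if full. */
int st_insert(SymbolTable *st, const char *name, SymKind kind, int predefined) {
    if (st->count >= MAX_SYMBOLS) return 0;
    Symbol *s = &st->entries[++st->count];
    s->name = name;
    s->nest_level = st->level;
    s->kind = kind;
    s->predefined = predefined;
    s->active = 1;
    return st->count;
}

/* Search newest-first so that an inner declaration shadows an outer one.
   Returns the entry index, or 0 when the name is not visible. */
int st_lookup(const SymbolTable *st, const char *name) {
    for (int i = st->count; i >= 1; i--) {
        const Symbol *s = &st->entries[i];
        if (s->active && strcmp(s->name, name) == 0) return i;
    }
    return 0;
}
```

With this sketch, inserting the six identifiers in the order listed in the table and looking up x inside main returns index 6; after the method scope is closed, the same lookup returns index 4, the class-level x. A hash table would make lookups faster, but the linear scan keeps the shadowing rule easy to see.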
The MIPS instructions generated from this MiniJava program are saved in code.s. The file begins with a data segment that defines static data such as strings and memory allocation areas, followed by a text segment where the executable code resides. The assembly code includes sections for class initialization, variable setup, and control flow structures such as conditionals. Each operation from the source code, whether it is a variable assignment, a conditional branch, or a method call, is translated into a sequence of MIPS instructions.
.data
Enter: .asciiz "\n"
.text
j main
# class c1
c1.init:
addi $sp, $sp, -12
sw $ra, 0($sp)
sw $s1, 4($sp)
sw $fp, 8($sp)
move $fp, $sp
# init x
# c1.x
# init scalar
li $t1, 1
neg $t1, $t1
# init class var x
sw $t1, 0($s0)
move $sp, $fp
lw $ra, 0($sp)
lw $s1, 4($sp)
lw $fp, 8($sp)
add $sp, $sp, 12
jr $ra
.data
.align 4
c1.singleton: .space 4
.align 4
c1.addr: .word c1.singleton
.text
c1.main:
addi $sp, $sp, -12
sw $ra, 0($sp)
sw $s1, 4($sp)
sw $fp, 8($sp)
add $fp, $sp, $0
# init x
# init scalar
li $t1, 4
# init local var x 0
addi $sp, $sp, -4
sw $t1, 0($sp)
# if
# access local variable x
addi $t1, $fp, -4
lw $t1, 0($t1) #1
addi $sp, $sp, -4
sw $t1, 0($sp)
li $t1, 0
lw $t2, 0($sp)
addi $sp, $sp, 4
sge $t1, $t2, $t1
beq $t1, $0, L_1
.data
S_2: .asciiz "x>=0"
.text
li $v0, 4
la $a0, S_2
syscall
j L_1
L_1:
move $sp, $fp
lw $ra, 0($sp)
lw $s1, 4($sp)
lw $fp, 8($sp)
add $sp, $sp, 12
jr $ra
main:
la $s0, c1.singleton
jal c1.init
la $s0, c1.singleton
jal c1.main
li $v0, 10
syscall
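The comparison `x>=0` in the listing above is evaluated in a stack-machine style: the left operand is loaded and pushed, the right operand is loaded into $t1, the left operand is popped into $t2, and `sge` leaves the result in $t1. The following is a minimal sketch of an emitter that reproduces that instruction sequence. The buffer type and function names (`CodeBuf`, `emit`, `gen_local_ge_const`) are hypothetical and are not taken from codegen.c.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Fixed-size text buffer that collects the generated assembly. */
typedef struct {
    char text[1024];
    size_t len;
} CodeBuf;

void codebuf_init(CodeBuf *b) {
    b->text[0] = '\0';
    b->len = 0;
}

/* Append formatted assembly text; output is truncated when the buffer is full. */
void emit(CodeBuf *b, const char *fmt, ...) {
    va_list ap;
    int n;
    size_t room;

    if (b->len >= sizeof b->text - 1) return;
    room = sizeof b->text - b->len - 1;
    va_start(ap, fmt);
    n = vsnprintf(b->text + b->len, room + 1, fmt, ap);
    va_end(ap);
    if (n > 0) {
        b->len += ((size_t)n < room) ? (size_t)n : room;
    }
}

/* Push $t1 onto the runtime stack. */
void gen_push_t1(CodeBuf *b) {
    emit(b, "addi $sp, $sp, -4\n");
    emit(b, "sw $t1, 0($sp)\n");
}

/* Emit `local >= constant`, leaving 1 or 0 in $t1.
   fp_offset is the local's offset from $fp (-4 for x in the listing). */
void gen_local_ge_const(CodeBuf *b, int fp_offset, int constant) {
    emit(b, "addi $t1, $fp, %d\n", fp_offset);  /* address of the local */
    emit(b, "lw $t1, 0($t1)\n");                /* load its value */
    gen_push_t1(b);                             /* save the left operand */
    emit(b, "li $t1, %d\n", constant);          /* right operand */
    emit(b, "lw $t2, 0($sp)\n");                /* pop the left operand */
    emit(b, "addi $sp, $sp, 4\n");
    emit(b, "sge $t1, $t2, $t1\n");             /* $t1 = (left >= right) */
}

/* Emit the branch that skips an if-body when the condition in $t1 is 0. */
void gen_branch_if_false(CodeBuf *b, int label) {
    emit(b, "beq $t1, $0, L_%d\n", label);
}
```

Calling `gen_local_ge_const(&buf, -4, 0)` followed by `gen_branch_if_false(&buf, 1)` on an initialized buffer produces the same instructions as the `# if` portion of c1.main above, without the comments. Spilling every intermediate value to the stack is simple to generate but costs extra loads and stores, which a register allocator could avoid.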
This project is licensed under the MIT License. See the LICENSE file for details.