IMP Language Compiler
Goal: Compile the IMP language to the machine code of the virtual machine used in classes. The project covers the full pipeline: lexical, syntax, semantic analysis, and code generation.
Input/Output:
- Input: source file in the IMP language
- Output: machine code file for the virtual machine
Build: make
Run: ./kompilator <input_file> <output_file>
Some ref books used in process of building this project:
- Dragon book
- A Practical Approach to Compiler Construction - Des Watson
- Modern Compiler Implementation in C - Andrew Appel
Project structure:
- Makefile - build system
- lexer.l - lexical analyzer (Flex): tokenization of input
- parser.y - syntax analyzer (Bison): AST construction
- ast.h/ast.c - AST structures and functions
- symtab.h/symtab.c - symbol table and scopes
- semantic.h/semantic.c - semantic analysis
- codegen.h/codegen.c - machine code generator
- main.c - compilation flow control
Compilation flow:
- Lexer recognizes tokens.
- Parser builds the AST.
- Semantic analysis checks program correctness.
- Code generator emits machine code.
Optimizations:
- Fast constant generation (RST/SHL/INC).
- Multiplication/division by powers of 2 via shifts.
- O(log n) algorithms for
*,/,%(Russian multiplication, binary division/modulo). - Precomputing addresses for constant array indices.
Virtual machine (target):
- Model: 8 registers
ra..rh(natural numbers), program counterk, and memory cellsp_ifor i >= 0 (spec notes a technical limit i <= 2^62). - Execution: instructions are implicitly numbered from 0; the machine executes instruction
kuntilHALT. - Initial state: registers and memory are undefined;
kstarts at 0. - Lexical rules: whitespace is ignored;
#starts a comment that runs to end of line. - Errors: jumping outside the program or referring to a non-existent register is an error.
Instruction set (x in {a..h}, j in N):
| Instruction | Effect (summary) | Cost |
|---|---|---|
| READ | read input into ra |
100 |
| WRITE | output ra |
100 |
| LOAD j | ra = p_j |
50 |
| STORE j | p_j = ra |
50 |
| RLOAD x | ra = p_{r_x} |
50 |
| RSTORE x | p_{r_x} = ra |
50 |
| ADD x | ra = ra + r_x |
5 |
| SUB x | ra = max(ra - r_x, 0) |
5 |
| SWP x | swap ra and r_x |
5 |
| RST x | r_x = 0 |
1 |
| INC x | r_x = r_x + 1 |
1 |
| DEC x | r_x = max(r_x - 1, 0) |
1 |
| SHL x | r_x = 2 * r_x |
1 |
| SHR x | r_x = floor(r_x / 2) |
1 |
| JUMP j | k = j |
1 |
| JPOS j | if ra > 0 then k = j else k = k + 1 |
1 |
| JZERO j | if ra == 0 then k = j else k = k + 1 |
1 |
| CALL j | ra = k + 1, then k = j |
1 |
| RTRN | k = ra |
1 |
| HALT | stop execution | 0 |
Example (IMP -> VM code):
IMP program:
PROGRAM IS
n, m
IN
READ n;
m := n + 5;
WRITE m;
END
Generated VM code (output of this compiler for the program above):
JUMP 1
READ
STORE 0
LOAD 0
SWP c
RST a
INC a
SHL a
SHL a
INC a
ADD c
STORE 1
LOAD 1
WRITE
HALT