Skip to content

pumarogie/simple-compiler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Simple Compiler in Go

A feature-rich interpreter implemented in Go that supports dynamic typing, functions with closures, arrays, hash maps, strings, and more. This project demonstrates the complete architecture of a modern programming language implementation, from lexical analysis to evaluation.

Features

Data Types

  • Integers: 42, 100, -5
  • Strings: "Hello", "World", "Escaped \"quotes\""
  • Booleans: true, false
  • Arrays: [1, 2, 3], ["a", "b", "c"]
  • Hash Maps: {"key": "value", "age": 25}
  • Functions: First-class functions with closures

Operators

Arithmetic Operators

  • Addition: + (also string concatenation)
  • Subtraction: -
  • Multiplication: *
  • Division: /

Comparison Operators

  • Less than: < (works with numbers and strings)
  • Greater than: > (works with numbers and strings)
  • Less than or equal: <=
  • Greater than or equal: >=
  • Equal: ==
  • Not equal: !=

Logical Operators

  • AND: &&
  • OR: ||
  • NOT: !

Unary Operators

  • Negation: -x
  • Logical NOT: !x

Index Operator

  • Array indexing: array[0]
  • Hash map access: hash["key"]

Language Constructs

Variable Assignment

x = 10;
name = "Alice";
numbers = [1, 2, 3];
person = {"name": "Bob", "age": 30};

Functions

// Function declaration
function add(x, y) {
    return x + y
}

// Anonymous functions
multiply = function(x, y) { return x * y };

// Higher-order functions
function makeAdder(x) {
    return function(y) { return x + y }
}
addFive = makeAdder(5);
result = addFive(10); // Returns 15

Closures

Functions capture their lexical environment:

function counter() {
    count = 0;
    return function() {
        count = count + 1;
        return count
    }
}
c = counter();
print(c()); // 1
print(c()); // 2

Recursion

function factorial(n) {
    if (n <= 1) {
        return 1
    } else {
        return n * factorial(n - 1)
    }
}
print(factorial(5)); // 120

Arrays

numbers = [1, 2, 3, 4, 5];
print(numbers[0]);      // 1
print(length(numbers)); // 5

// Array iteration
sum = 0;
i = 0;
while (i < length(numbers)) {
    sum = sum + numbers[i];
    i = i + 1
};

Hash Maps

person = {
    "name": "Alice",
    "age": 30,
    "city": "New York"
};
print(person["name"]); // Alice

// Nested structures
data = {
    "users": [
        {"name": "Bob", "id": 1},
        {"name": "Carol", "id": 2}
    ],
    "count": 2
};

Strings

greeting = "Hello, " + "World!";
print(greeting);                    // Hello, World!
print(length(greeting));            // 13

// String escape sequences
message = "Line 1\nLine 2\tTabbed";
quoted = "She said \"Hello\"";

If/Else Statements

if (x > 10) {
    print("Greater than 10")
} else {
    print("Less than or equal to 10")
}

While Loops

counter = 0;
while (counter < 10) {
    print(counter);
    counter = counter + 1
}

Print Statement

print(42);
print("Hello World");
print([1, 2, 3]);
print({"key": "value"});

Built-in Functions

length()

Returns the length of strings, arrays, or hash maps:

print(length("hello"));           // 5
print(length([1, 2, 3]));         // 3
print(length({"a": 1, "b": 2})); // 2

Project Structure

simple-compiler/
├── cmd/
│   └── compiler/
│       └── main.go           # Entry point with comprehensive demo
├── internal/
│   ├── token/
│   │   └── token.go          # Token definitions and keywords
│   ├── lexer/
│   │   └── lexer.go          # Lexical analysis with string support
│   ├── ast/
│   │   └── ast.go            # AST nodes for all language features
│   ├── parser/
│   │   └── parser.go         # Recursive descent parser with Pratt parsing
│   └── evaluator/
│       ├── environment.go    # Lexically scoped variable storage
│       └── evaluator.go      # Dynamic tree-walking interpreter
├── test/
│   └── integration_test.go   # Comprehensive test suite
├── .gitignore
├── go.mod
└── README.md

Getting Started

Prerequisites

  • Go 1.22.4 or higher

Installation

Clone the repository:

git clone https://github.com/phillips/simple-compiler.git
cd simple-compiler

Usage

Run the demo program:

go run ./cmd/compiler

This runs a comprehensive demo showcasing all language features including functions, closures, recursion, arrays, hash maps, and string operations.

Build the compiler:

go build -o compiler ./cmd/compiler
./compiler

Run tests:

go test ./test/...

Example Programs

Basic Operations

// Variables and arithmetic
x = 10;
y = 20;
sum = x + y;
print(sum);           // 30

// String operations
greeting = "Hello, " + "World!";
print(greeting);      // Hello, World!

Functions and Closures

// Simple function
function add(a, b) {
    return a + b
}
print(add(5, 3));     // 8

// Closure example
function makeMultiplier(factor) {
    return function(x) {
        return x * factor
    }
}
double = makeMultiplier(2);
triple = makeMultiplier(3);
print(double(5));     // 10
print(triple(5));     // 15

Data Structures

// Arrays
numbers = [1, 2, 3, 4, 5];
squares = [];
i = 0;
while (i < length(numbers)) {
    squares = squares + [numbers[i] * numbers[i]];
    i = i + 1
};
print(squares);       // [1, 4, 9, 16, 25]

// Hash maps
person = {
    "name": "Alice",
    "age": 30,
    "hobbies": ["reading", "coding"]
};
print(person["name"]);           // Alice
print(person["hobbies"][0]);     // reading

Recursive Fibonacci

function fibonacci(n) {
    if (n <= 1) {
        return n
    } else {
        return fibonacci(n - 1) + fibonacci(n - 2)
    }
}

i = 0;
while (i <= 10) {
    print(fibonacci(i));
    i = i + 1
}

Higher-Order Functions

// Map-like functionality
function applyToAll(arr, f) {
    result = [];
    i = 0;
    while (i < length(arr)) {
        result = result + [f(arr[i])];
        i = i + 1
    };
    return result
}

numbers = [1, 2, 3, 4, 5];
doubled = applyToAll(numbers, function(x) { return x * 2 });
print(doubled);       // [2, 4, 6, 8, 10]

Architecture

1. Lexer (Tokenization)

The lexer (internal/lexer/lexer.go) converts source code into tokens:

  • Recognizes keywords (function, return, if, else, while, print, true, false)
  • Handles string literals with escape sequences (\n, \t, \", \\)
  • Identifies all operators and delimiters
  • Tracks line and column positions for error reporting

2. Parser (Syntax Analysis)

The parser (internal/parser/parser.go) builds an AST from tokens:

  • Uses recursive descent with Pratt parsing for expressions
  • Handles function definitions and calls
  • Parses array and hash map literals
  • Supports index expressions for both arrays and hash maps
  • Distinguishes between code blocks and hash literals

3. AST (Abstract Syntax Tree)

The AST (internal/ast/ast.go) defines nodes for:

  • Literals: IntegerLiteral, StringLiteral, BooleanLiteral, ArrayLiteral, HashLiteral, FunctionLiteral
  • Expressions: InfixExpression, PrefixExpression, CallExpression, IndexExpression
  • Statements: Assignment, IfStatement, WhileStatement, PrintStatement, ReturnStatement, BlockStatement

4. Evaluator (Execution)

The evaluator (internal/evaluator/evaluator.go) executes the AST:

  • Dynamic typing with interface{}
  • Function values with captured environments (closures)
  • Array and hash map operations
  • Built-in function support
  • Recursive function calls
  • Short-circuit evaluation for logical operators

5. Environment (Symbol Table)

The environment (internal/evaluator/environment.go) manages variable scope:

  • Lexical scoping with environment chaining
  • Closure support through captured environments
  • Nested scope management for functions and blocks

Language Grammar

Program        → Statement*
Statement      → FunctionDecl | IfStmt | WhileStmt | PrintStmt |
                 ReturnStmt | Assignment | ExprStmt | Block
FunctionDecl   → "function" IDENTIFIER "(" Parameters? ")" Block
Parameters     → IDENTIFIER ("," IDENTIFIER)*
IfStmt         → "if" "(" Expression ")" Block ("else" Block)?
WhileStmt      → "while" "(" Expression ")" Block
PrintStmt      → "print" "(" Expression ")"
ReturnStmt     → "return" Expression
Assignment     → IDENTIFIER "=" Expression ";"?
ExprStmt       → Expression ";"?
Block          → "{" Statement* "}"

Expression     → Logical
Logical        → Comparison (("&&" | "||") Comparison)*
Comparison     → Term (("==" | "!=" | "<" | ">" | "<=" | ">=") Term)*
Term           → Factor (("+" | "-") Factor)*
Factor         → Unary (("*" | "/") Unary)*
Unary          → ("!" | "-") Unary | Postfix
Postfix        → Primary (CallExpr | IndexExpr)*
CallExpr       → "(" Arguments? ")"
IndexExpr      → "[" Expression "]"
Arguments      → Expression ("," Expression)*
Primary        → INTEGER | STRING | "true" | "false" | IDENTIFIER |
                 FunctionLit | ArrayLit | HashLit | "(" Expression ")"
FunctionLit    → "function" "(" Parameters? ")" Block
ArrayLit       → "[" (Expression ("," Expression)*)? "]"
HashLit        → "{" (HashPair ("," HashPair)*)? "}"
HashPair       → Expression ":" Expression

Operator Precedence (Lowest to Highest)

  1. OR (||)
  2. AND (&&)
  3. Equality (==, !=)
  4. Comparison (<, >, <=, >=)
  5. Addition/Subtraction (+, -)
  6. Multiplication/Division (*, /)
  7. Prefix (!, -)
  8. Call/Index ((), [])

Type System

The language uses dynamic typing with automatic type coercion where appropriate:

  • Integers: 64-bit signed integers
  • Strings: UTF-8 strings with escape sequence support
  • Booleans: true or false
  • Arrays: Heterogeneous collections (can mix types)
  • Hash Maps: String, integer, or boolean keys with any value type
  • Functions: First-class values with closure support
  • Null: Represented as nil internally

Truthiness

Values are considered "falsy" in boolean contexts:

  • false (boolean)
  • 0 (integer)
  • "" (empty string)
  • nil (null/undefined)

All other values are "truthy".

Error Handling

The compiler provides detailed error messages:

Parse Errors

[Line 5, Column 3] expected ')' after function parameters
[Line 10, Column 15] unexpected token in array literal

Runtime Errors

error: division by zero
error: undefined variable: x
error: not a function: INTEGER
error: wrong number of arguments: expected 2, got 3
error: array index out of bounds: 10
error: unusable as hash key: ARRAY

Testing

The project includes comprehensive tests covering:

  • Lexer: tokenization of all language features
  • Parser: AST construction and error cases
  • Evaluator: all data types, operations, and control flow
  • Integration: complete programs with complex features

Run tests:

go test -v ./test/...

Performance Characteristics

  • Lexer: O(n) where n is source code length
  • Parser: O(n) for most constructs, recursive descent
  • Evaluator: Tree-walking interpreter, no optimization
  • Environment: Hash map lookups, O(1) average case
  • Closures: Environments are referenced, not copied

Future Enhancements

Potential features to add:

  • For loops and iterators
  • Break/continue statements
  • Object-oriented features (structs/classes)
  • Module system and imports
  • Standard library (file I/O, networking, etc.)
  • Type annotations and checking
  • Pattern matching
  • REPL (Read-Eval-Print Loop)
  • Bytecode compilation and VM
  • JIT compilation
  • Debugger support

License

This project is licensed under the MIT License.

Motivation

This project demonstrates modern interpreter implementation techniques including:

  • Lexical analysis with proper string handling
  • Pratt parsing for operator precedence
  • Dynamic typing with heterogeneous collections
  • Lexical scoping and closure implementation
  • First-class functions
  • Built-in function integration
  • Tree-walking interpretation

The implementation serves as a foundation for understanding programming language design and can be extended with additional features like type systems, optimization passes, or compilation to bytecode/native code.

About

🤓 Simple Compiler made with GO

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages