This is project 6 in the Nand to Tetris course: an assembler that translates the Hack assembly language into the Hack machine language. Try for yourself here.
This assembler works in three main stages.
- First, it splits the file line-by-line and removes comments (both line and inline), returning an array.
- Next, it looks through and finds symbols, adding them to a symbol table, and then substitutes the symbols with their values in the code, returning an array.
- Finally, the cleaned-up code is passed to the assembler, which assembles each line according to the Hack specification.
These stages are implemented in parser.js, handleSymbols.js, and assemble.js respectively.
It is worth noting that each of these phases is written in a functional style. Assembly as a process is conceptually functional, so it makes sense to approach a program that accomplishes it the same way. The only functions that are not pure are `enterLabelsIntoSymbolTable` and `enterVarsIntoSymbolTable` in `handleSymbols.js`, which have no return value and merely mutate the symbol table to include the user-declared symbols.
Especially given that the task at hand is fundamentally procedural, any attempt at approaching it in an object-oriented way would inevitably lead to the creation of utility classes, which, as has been mathematically proven and divinely confirmed, are abominations.
The parser is by far the simplest part of the program. The main function, `parse`, splits the input into lines, passes them to `stripWhitespace`, and returns a filtered array without empty lines. `stripWhitespace` does exactly what you might expect, removing spaces and comments.
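As a rough illustration of the parsing stage, here is a minimal sketch. It reuses the names `parse` and `stripWhitespace` from above, but the bodies are assumptions and not the actual contents of parser.js.

```javascript
// Hypothetical re-implementation of the parser stage (not the project's code).

// Drop everything from the first "//" onward, then remove all whitespace.
function stripWhitespace(line) {
  return line.split('//')[0].replace(/\s+/g, '');
}

// Split the source into lines, clean each one, and discard empty results.
function parse(source) {
  return source
    .split(/\r?\n/)
    .map(stripWhitespace)
    .filter(line => line.length > 0);
}

const example = [
  '// draw something',
  '@SCREEN   // base address',
  'D = A',
  '',
  '(LOOP) // main loop',
].join('\n');

console.log(parse(example));
// [ '@SCREEN', 'D=A', '(LOOP)' ]
```

Note that labels such as `(LOOP)` survive this stage; removing them is the job of the symbol handling described next.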
The function that handles symbols is, as you might expect, `handleSymbols`. It starts with a symbol table, implemented as an object, initialized with the pre-defined symbols in the Hack language. Then, two functions are called, `enterLabelsIntoSymbolTable` and `enterVarsIntoSymbolTable`, which add to the symbol table all labels and variables created by the user.
Once the symbol table has recorded all the symbols in the code, we can safely remove the labels from the code. After that, `replaceVars` looks through each line of code and replaces instances of symbols with the values they represent.
This accomplishes the goal of handling the symbols, so we then return the result as an array.
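The two passes over the code can be sketched roughly as follows. This is a simplified stand-in, not the contents of handleSymbols.js: the helpers `predefinedSymbols`, `isLabel`, and `isSymbolRef` are hypothetical, and the label pass, variable pass, and replacement are inlined here rather than split into `enterLabelsIntoSymbolTable`, `enterVarsIntoSymbolTable`, and `replaceVars`.

```javascript
// Hypothetical sketch of the symbol stage; helper names are assumptions.

// Pre-defined Hack symbols. A prototype-less object avoids clashes with
// inherited keys such as "constructor".
function predefinedSymbols() {
  const table = Object.assign(Object.create(null), {
    SP: 0, LCL: 1, ARG: 2, THIS: 3, THAT: 4, SCREEN: 16384, KBD: 24576,
  });
  for (let i = 0; i < 16; i++) table[`R${i}`] = i;
  return table;
}

const isLabel = line => line.startsWith('(') && line.endsWith(')');
const isSymbolRef = line => line.startsWith('@') && !/^@\d+$/.test(line);

function handleSymbols(code) {
  const table = predefinedSymbols();

  // Pass 1: a label names the address of the next real instruction.
  let address = 0;
  for (const line of code) {
    if (isLabel(line)) table[line.slice(1, -1)] = address;
    else address++;
  }

  // Pass 2: any symbol still unknown is a variable, allocated from RAM[16] up.
  let nextVariable = 16;
  for (const line of code) {
    const name = line.slice(1);
    if (isSymbolRef(line) && !(name in table)) table[name] = nextVariable++;
  }

  // Drop the label lines and substitute every symbol with its value.
  return code
    .filter(line => !isLabel(line))
    .map(line => (isSymbolRef(line) ? `@${table[line.slice(1)]}` : line));
}

console.log(handleSymbols(['@i', 'M=1', '(LOOP)', '@i', 'D=M', '@LOOP', 'D;JGT']));
// [ '@16', 'M=1', '@16', 'D=M', '@2', 'D;JGT' ]
```

Labels have to be recorded before variables are allocated; otherwise a forward reference such as `@INFINITE_LOOP` would be mistaken for a new variable.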
`assemble` is the function that coordinates the assembly of the code. It is implemented as follows:

    function assemble(code) {
      return code.map(line => line[0] === '@'
        ? assembleAinstruction(line)
        : assembleCinstruction(line));
    }

The condition tests whether the line is an A instruction, that is, whether it starts with `@`. If so, we assemble it as an A instruction; otherwise the line must be a C instruction, so we assemble it as one.
`assembleAinstruction` is a relatively straightforward function, as A instructions are relatively straightforward. It takes the number present in the instruction, converts it to binary, and pads it with leading zeroes to make a 16-bit instruction, which it returns.
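A minimal sketch of that conversion, assuming symbols have already been replaced so the instruction always has the form `@<decimal number>` (the body below is illustrative, not the project's actual code):

```javascript
// Hypothetical sketch: "@23" -> "0000000000010111".
function assembleAinstruction(line) {
  const value = Number.parseInt(line.slice(1), 10); // drop the leading "@"
  return value.toString(2).padStart(16, '0');       // binary, padded to 16 bits
}

console.log(assembleAinstruction('@23'));    // 0000000000010111
console.log(assembleAinstruction('@16384')); // 0100000000000000
```

Since a valid A instruction holds a 15-bit value, the padding always leaves the leading bit as 0, which is what marks the word as an A instruction.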
`assembleCinstruction` starts by parsing the C instruction, splitting it into an array [destination, computation, jump], the three fields of any C instruction. Each of these is then translated into its binary equivalent, and the results are combined in the proper order with a leading `111`, as detailed in the specification.
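The C-instruction path might look something like the sketch below. The bit patterns follow the Hack specification, but the helpers `parseCinstruction` and `destBits` and the table names `A_COMP`, `COMP`, and `JUMP` are hypothetical, not taken from assemble.js. The output layout is `111`, then 7 computation bits, then 3 destination bits, then 3 jump bits.

```javascript
// Hypothetical sketch of C-instruction assembly; helper names are assumptions.

// Computation mnemonics that read the A register (the "a" bit is 0).
const A_COMP = {
  '0': '101010', '1': '111111', '-1': '111010',
  'D': '001100', 'A': '110000', '!D': '001101', '!A': '110001',
  '-D': '001111', '-A': '110011', 'D+1': '011111', 'A+1': '110111',
  'D-1': '001110', 'A-1': '110010', 'D+A': '000010', 'D-A': '010011',
  'A-D': '000111', 'D&A': '000000', 'D|A': '010101',
};

// Every mnemonic using A has an M twin with the same bits and the "a" bit set.
const COMP = {};
for (const [mnemonic, bits] of Object.entries(A_COMP)) {
  COMP[mnemonic] = '0' + bits;
  if (mnemonic.includes('A')) COMP[mnemonic.replace('A', 'M')] = '1' + bits;
}

const JUMP = {
  '': '000', JGT: '001', JEQ: '010', JGE: '011',
  JLT: '100', JNE: '101', JLE: '110', JMP: '111',
};

// One bit each for A, D, and M, in that order.
const destBits = dest =>
  ['A', 'D', 'M'].map(register => (dest.includes(register) ? '1' : '0')).join('');

// "MD=M-1" -> ['MD', 'M-1', ''];  "D;JLE" -> ['', 'D', 'JLE']
function parseCinstruction(line) {
  const [beforeJump, jump = ''] = line.split(';');
  const parts = beforeJump.split('=');
  const comp = parts.pop();
  const dest = parts.pop() || '';
  return [dest, comp, jump];
}

function assembleCinstruction(line) {
  const [dest, comp, jump] = parseCinstruction(line);
  if (!(comp in COMP) || !(jump in JUMP)) {
    throw new Error(`Cannot assemble C instruction: ${line}`);
  }
  return '111' + COMP[comp] + destBits(dest) + JUMP[jump];
}

console.log(assembleCinstruction('MD=M-1')); // 1111110010011000
console.log(assembleCinstruction('D;JLE'));  // 1110001100000110
```

Deriving the `M` variants from the `A` table keeps the lookup short, at the cost of being slightly less explicit than writing out all 28 mnemonics.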
An example program might be the following, which draws a rectangle on the screen:
@0
D=M
@INFINITE_LOOP
D;JLE
@counter
M=D
@SCREEN
D=A
@address
M=D
(LOOP) // main loop
@address
A=M
M=-1
@address
D=M
@32
D=D+A
@address
M=D
@counter
MD=M-1
@LOOP
D;JGT
(INFINITE_LOOP) // keeps program from
@INFINITE_LOOP // accessing other code
0;JMP
The parser produces the following from this input:
@0
D=M
@INFINITE_LOOP
D;JLE
@counter
M=D
@SCREEN
D=A
@address
M=D
(LOOP)
@address
A=M
M=-1
@address
D=M
@32
D=D+A
@address
M=D
@counter
MD=M-1
@LOOP
D;JGT
(INFINITE_LOOP)
@INFINITE_LOOP
0;JMP
After this, `handleSymbols` produces this:
@0
D=M
@23
D;JLE
@16
M=D
@16384
D=A
@17
M=D
@17
A=M
M=-1
@17
D=M
@32
D=D+A
@17
M=D
@16
MD=M-1
@10
D;JGT
@23
0;JMP
From this, the assembler produces the following machine code:
0000000000000000
1111110000010000
0000000000010111
1110001100000110
0000000000010000
1110001100001000
0100000000000000
1110110000010000
0000000000010001
1110001100001000
0000000000010001
1111110000100000
1110111010001000
0000000000010001
1111110000010000
0000000000100000
1110000010010000
0000000000010001
1110001100001000
0000000000010000
1111110010011000
0000000000001010
1110001100000001
0000000000010111
1110101010000111