Assembly_to_byte code
To understand how source code is translated into byte-code, we first have to look at the structure of a .cor file.
Let's translate one of the champions with the provided asm:
.name "Batman"
.comment "This city needs me"
loop:
sti r1, %:live, %1
live:
live %0
ld %0, r2
zjmp %:loop
This is what the .cor file contains:
The first four bytes of the file are the "magic header".
It is defined in the COREWAR_EXEC_MAGIC constant in the file op.h as 0xea83f3.
The magic header is a signature that marks the file as being of a certain type (much like a file extension does).
If you see a file with the .cor extension, you assume it is a corewar champion. The virtual machine also checks the magic header and only executes the file if it contains the expected value.
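The check described above can be sketched in a few lines of Python. This is only an illustration, not the virtual machine's actual code: the function name `has_valid_magic` is hypothetical, and it assumes the magic value is stored big-endian, like the other multi-byte values in the byte-code.

```python
import struct

COREWAR_EXEC_MAGIC = 0xEA83F3  # value quoted above from op.h


def has_valid_magic(data: bytes) -> bool:
    """Return True if the first four bytes hold the magic header.

    Assumption: the value is stored big-endian on four bytes.
    """
    if len(data) < 4:
        return False
    (magic,) = struct.unpack(">I", data[:4])
    return magic == COREWAR_EXEC_MAGIC


print(has_valid_magic(bytes.fromhex("00ea83f3")))  # True
print(has_valid_magic(bytes.fromhex("00000000")))  # False
```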
The next 128 bytes are the champion's name. 128 is the value of the constant PROG_NAME_LENGTH from op.h.
If the actual name is shorter than 128 bytes, the remaining space is filled with trailing zeroes.
Each character is written to the file as its ASCII value:
| Character | B | a | t | m | a | n |
|---|---|---|---|---|---|---|
| ASCII code | 0x42 | 0x61 | 0x74 | 0x6d | 0x61 | 0x6e |
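
As a small illustration of the name field, the sketch below encodes a name as ASCII and pads it with zero bytes up to PROG_NAME_LENGTH. The helper name `encode_name` is made up for this example, and rejecting names that are too long is an assumption about how an assembler might behave.

```python
PROG_NAME_LENGTH = 128  # from op.h, as stated above


def encode_name(name: str) -> bytes:
    """Encode a champion name as ASCII, padded with trailing zero bytes."""
    raw = name.encode("ascii")
    if len(raw) > PROG_NAME_LENGTH:
        # Assumption: a name that does not fit is treated as an error.
        raise ValueError("champion name is too long")
    return raw.ljust(PROG_NAME_LENGTH, b"\x00")


field = encode_name("Batman")
print(field[:6].hex(" "))  # 42 61 74 6d 61 6e
print(len(field))          # 128
```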
The next four bytes are NULL bytes. Their only purpose is to be present at this position and to be zero. If they are not, the file is invalid.
The next four bytes hold the size of the champion's executable code (only the executable part of the champion).
The virtual machine checks that the executable code size does not exceed the CHAMP_MAX_SIZE value from op.h (682 bytes).
The next 2048 bytes hold the champion's comment. It is stored the same way as the champion's name, except that the maximum size is COMMENT_LENGTH.
The next four bytes are NULL bytes.
The last part of the file is the executable code.
There are no trailing zeroes in this part.
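The whole layout can be summarized as one `struct` format string. The sketch below is a hedged illustration: `build_cor` and `parse_cor` are hypothetical helpers, big-endian byte order is assumed, and the check that the declared size equals the actual code length is an assumption rather than something stated above.

```python
import struct

COREWAR_EXEC_MAGIC = 0xEA83F3
PROG_NAME_LENGTH = 128
COMMENT_LENGTH = 2048
CHAMP_MAX_SIZE = 682

# Layout: magic, name, 4 NULL bytes, code size, comment, 4 NULL bytes.
# ">" means big-endian without alignment padding (byte order is assumed).
HEADER = struct.Struct(f">I{PROG_NAME_LENGTH}s4sI{COMMENT_LENGTH}s4s")


def build_cor(name: str, comment: str, code: bytes) -> bytes:
    name_raw, comment_raw = name.encode("ascii"), comment.encode("ascii")
    if len(name_raw) > PROG_NAME_LENGTH or len(comment_raw) > COMMENT_LENGTH:
        raise ValueError("name or comment too long")
    if len(code) > CHAMP_MAX_SIZE:
        raise ValueError("executable code too large")
    # struct pads short "s" fields with zero bytes, giving the trailing zeroes.
    header = HEADER.pack(COREWAR_EXEC_MAGIC, name_raw, b"",
                         len(code), comment_raw, b"")
    return header + code


def parse_cor(data: bytes):
    magic, name, null1, size, comment, null2 = HEADER.unpack_from(data)
    code = data[HEADER.size:]
    if magic != COREWAR_EXEC_MAGIC:
        raise ValueError("bad magic header")
    if null1 != b"\x00" * 4 or null2 != b"\x00" * 4:
        raise ValueError("separator bytes are not NULL")
    # Assumption: the declared size must match the code that follows.
    if size != len(code) or size > CHAMP_MAX_SIZE:
        raise ValueError("bad executable code size")
    return (name.rstrip(b"\x00").decode("ascii"),
            comment.rstrip(b"\x00").decode("ascii"),
            code)


blob = build_cor("Batman", "This city needs me", bytes(22))
print(HEADER.size, len(blob))  # 2192 2214
print(parse_cor(blob)[:2])     # ('Batman', 'This city needs me')
```

The header alone is 4 + 128 + 4 + 4 + 2048 + 4 = 2192 bytes, so the executable code always starts at that offset.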
We will need two tables to understand how encoding works.
| Code | Name | Argument #1 | Argument #2 | Argument #3 | Encoding byte | Size T_DIR |
|---|---|---|---|---|---|---|
| 0x01 | live | T_DIR | - | - | no | 4 |
| 0x02 | ld | T_DIR / T_IND | T_REG | - | present | 4 |
| 0x03 | st | T_REG | T_REG / T_IND | - | present | 4 |
| 0x04 | add | T_REG | T_REG | T_REG | present | 4 |
| 0x05 | sub | T_REG | T_REG | T_REG | present | 4 |
| 0x06 | and | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG | present | 4 |
| 0x07 | or | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG | present | 4 |
| 0x08 | xor | T_REG / T_DIR / T_IND | T_REG / T_DIR / T_IND | T_REG | present | 4 |
| 0x09 | zjmp | T_DIR | - | - | no | 2 |
| 0x0a | ldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG | present | 2 |
| 0x0b | sti | T_REG | T_REG / T_DIR / T_IND | T_REG / T_DIR | present | 2 |
| 0x0c | fork | T_DIR | - | - | no | 2 |
| 0x0d | lld | T_DIR / T_IND | T_REG | - | present | 4 |
| 0x0e | lldi | T_REG / T_DIR / T_IND | T_REG / T_DIR | T_REG | present | 2 |
| 0x0f | lfork | T_DIR | - | - | no | 2 |
| 0x10 | aff | T_REG | - | - | present | 4 |
What does «Size T_DIR» stand for? In short, it tells how many bytes to read from or write to the VM memory for a T_DIR argument of that operation.
It is covered in more detail in the corresponding chapter.
The second table contains codes of types of arguments, and size of arguments.
| Type | Assembly | Code | Size |
|---|---|---|---|
| T_REG | r | 01 | 1 byte |
| T_DIR | % | 10 | T_DIR size |
| T_IND | - | 11 | 2 bytes |
Registries and their sizes
There are two characteristics of a registry that should not be confused.
The name of a registry (r1, r2...) takes 1 byte and is what gets placed in the byte-code. The registry itself is 4 bytes big, as defined in the REG_SIZE constant, and is a variable of the cursor.
T_DIR arguments size
As we can see in the operations table, the size of arguments of type T_DIR is not fixed and depends on the operation. But the file op.h contains a constant DIR_SIZE defined with the value 4:
# define IND_SIZE 2
# define REG_SIZE 4
# define DIR_SIZE REG_SIZE
What is the logic?
Operations with size 2 for the T_DIR argument use this argument as a relative address (like an argument of type T_IND), and the T_IND size is always 2 bytes.
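To make the size rule concrete, here is a minimal lookup sketch. The function `arg_size` and the set `SHORT_DIR_OPS` are names invented for this example. The set simply lists the operations whose «Size T_DIR» column is 2 in the operations table.

```python
IND_SIZE = 2
REG_SIZE = 4
DIR_SIZE = REG_SIZE

# Operations whose T_DIR argument is 2 bytes, per the operations table.
SHORT_DIR_OPS = {"zjmp", "ldi", "sti", "fork", "lldi", "lfork"}


def arg_size(op: str, arg_type: str) -> int:
    """Return how many bytes an argument occupies in the byte-code."""
    if arg_type == "T_REG":
        return 1  # only the registry number is stored
    if arg_type == "T_IND":
        return IND_SIZE
    if arg_type == "T_DIR":
        return IND_SIZE if op in SHORT_DIR_OPS else DIR_SIZE
    raise ValueError(f"unknown argument type: {arg_type}")


print(arg_size("live", "T_DIR"))  # 4
print(arg_size("sti", "T_DIR"))   # 2
print(arg_size("ld", "T_REG"))    # 1
```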
Each operation in byte-code has the following structure:
- Operation code — 1 byte
- Encoding byte for saving arguments types (if needed) — 1 byte
- Arguments (see sizes in the table)
Encoding byte for arguments types
We can check in the operations table whether it is needed for the current operation. If an operation has a single argument and its type is
T_DIR, then the encoding byte is not used. For all other operations the encoding byte must be present.
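Putting the operation structure and the encoding byte rule together, the length of an encoded operation can be computed as in the sketch below. The names `instruction_size`, `NO_ENCODING_BYTE` and `SHORT_DIR_OPS` are illustrative only. Both sets are read off the operations table.

```python
# Operations with a single T_DIR argument carry no encoding byte.
NO_ENCODING_BYTE = {"live", "zjmp", "fork", "lfork"}
# Operations whose T_DIR argument takes 2 bytes instead of 4.
SHORT_DIR_OPS = {"zjmp", "ldi", "sti", "fork", "lldi", "lfork"}


def instruction_size(op: str, arg_types: list) -> int:
    """Return the total size in bytes of one encoded operation."""
    size = 1  # operation code
    if op not in NO_ENCODING_BYTE:
        size += 1  # encoding byte
    for arg_type in arg_types:
        if arg_type == "T_REG":
            size += 1
        elif arg_type == "T_IND":
            size += 2
        elif arg_type == "T_DIR":
            size += 2 if op in SHORT_DIR_OPS else 4
        else:
            raise ValueError(f"unknown argument type: {arg_type}")
    return size


print(instruction_size("sti", ["T_REG", "T_DIR", "T_DIR"]))  # 7
print(instruction_size("live", ["T_DIR"]))                   # 5
print(instruction_size("ld", ["T_DIR", "T_REG"]))            # 7
print(instruction_size("zjmp", ["T_DIR"]))                   # 3
```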
Let's encode some operations:
loop:
sti r1, %:live, %1
live:
live %0
ld %0, r2
zjmp %:loop
loop:
sti r1, %:live, %1
| Operation code | Encoding byte | Argument #1 | Argument #2 | Argument #3 |
|---|---|---|---|---|
| 1 byte | 1 byte | 1 byte | 2 bytes | 2 bytes |
Operation code
The code for each operation can be found in the operations table (1-16). For sti it is 11.
Encoding byte for the arguments types
Let's write the encoding byte in binary form. The leftmost pair of bits stands for the type of argument #1, the next pair for the type of argument #2, and the third pair for the type of argument #3. The last pair is always 00. Codes for the different types can be found in the arguments table.
| Argument #1 | Argument #2 | Argument #3 | Unused pair | The whole byte | Decimal | Hex |
|---|---|---|---|---|---|---|
| T_REG | T_DIR | T_DIR | - | - | - | - |
| 01 | 10 | 10 | 00 | 01101000 | 104 | 0x68 |
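
The bit packing can be sketched as follows. This is an illustrative helper (the name `encoding_byte` is not taken from any real implementation). It shifts each 2-bit type code into its pair, starting from the leftmost one.

```python
# Type codes from the arguments table.
TYPE_CODES = {"T_REG": 0b01, "T_DIR": 0b10, "T_IND": 0b11}


def encoding_byte(arg_types: list) -> int:
    """Pack up to three argument types into one byte, leftmost pair first."""
    if len(arg_types) > 3:
        raise ValueError("an operation has at most three arguments")
    byte = 0
    for index, arg_type in enumerate(arg_types):
        # Argument #1 goes to bits 7-6, #2 to bits 5-4, #3 to bits 3-2.
        byte |= TYPE_CODES[arg_type] << (6 - 2 * index)
    return byte  # bits 1-0 stay 00


sti = encoding_byte(["T_REG", "T_DIR", "T_DIR"])
print(f"{sti:08b} {sti} {sti:#04x}")  # 01101000 104 0x68
```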
Argument-registry T_REG
The number of the registry is written on a single byte, so r1 becomes 0x01.
Argument-label T_DIR
A label is converted into a number that represents the distance in bytes from the start of the current operation to the label.
The label live points to the next operation, and we know that the current operation is 7 bytes long, so the value to write is 7.
This value must be written on 2 bytes: 0x0007.
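A label reference therefore boils down to a subtraction followed by a fixed-width write. The sketch below assumes big-endian byte order and signed values. `relative_offset` is a hypothetical helper name.

```python
def relative_offset(label_position: int, operation_position: int,
                    size: int = 2) -> bytes:
    """Encode the distance from the start of an operation to a label.

    Assumptions: positions are byte offsets inside the executable code,
    and the result is a signed big-endian value of the given size.
    """
    distance = label_position - operation_position
    return distance.to_bytes(size, "big", signed=True)


# sti starts at byte 0 and the label live sits right after it, at byte 7.
print(relative_offset(7, 0).hex())  # 0007
```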
Argument-number T_DIR
In this case we just take the value and write it as is on 2 bytes: 0x0001.
Final output for this operation
| System | Operation code | Encoding byte | Argument #1 | Argument #2 | Argument #3 | Final output |
|---|---|---|---|---|---|---|
| Decimal | 11 | 104 | 1 | 7 | 1 | |
| Hex | 0x0B | 0x68 | 0x01 | 0x0007 | 0x0001 | 0B 68 01 0007 0001 |
Repeat for the next operation:
live:
live %0
The only major difference is that this operation doesn't have the encoding byte for argument types:
| Operation code | Argument #1 |
|---|---|
| 1 byte | 4 bytes |
Final output in hex — 01 0000 0000.
ld %0, r2
| Operation code | Encoding byte | Argument #1 | Argument #2 |
|---|---|---|---|
| 1 byte | 1 byte | 4 bytes | 1 byte |
Final output in hex — 02 90 00 00 00 00 02.
zjmp %:loop
Operation code for zjmp is 0x09.
Encoding byte is not used here
| Operation code | Argument #1 |
|---|---|
| 1 byte | 2 bytes |
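The label loop sits 19 bytes before the start of zjmp (7 + 5 + 7), so the value to write is -19, which on 2 bytes in two's complement is 0xffed.
Final output in hex: 09 ff ed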
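The backward jump can be double-checked with a short calculation. The sketch below accumulates the operation sizes worked out earlier to find where each operation starts, then masks the negative distance to 16 bits. The variable names are illustrative.

```python
# Sizes of the four operations, in the order they appear in the champion.
sizes = [("sti", 7), ("live", 5), ("ld", 7), ("zjmp", 3)]

# Walk through the code and record where each operation starts.
starts = {}
position = 0
for name, size in sizes:
    starts[name] = position
    position += size

loop_position = starts["sti"]             # the label loop precedes sti
offset = loop_position - starts["zjmp"]   # distance from zjmp back to loop
encoded = offset & 0xFFFF                 # two's complement on 2 bytes

print(starts["zjmp"], offset, f"{encoded:04x}")  # 19 -19 ffed
```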
Executable code of the champion will look like:
0b68 0100 0700 0101 0000 0000 0290 0000
0000 0209 ffed
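The full translation can be reproduced with a tiny encoder. This is a sketch under stated assumptions, not the provided asm: it only knows the four operations used by this champion, the label offsets (7 and -19) are already resolved by hand, and all names (`encode`, `OPCODES`, and so on) are made up for the example.

```python
OPCODES = {"live": 0x01, "ld": 0x02, "zjmp": 0x09, "sti": 0x0B}
TYPE_CODES = {"T_REG": 0b01, "T_DIR": 0b10, "T_IND": 0b11}
NO_ENCODING_BYTE = {"live", "zjmp"}  # single T_DIR argument
SHORT_DIR_OPS = {"zjmp", "sti"}      # T_DIR takes 2 bytes


def encode(op: str, args: list) -> bytes:
    """Encode one operation; args is a list of (type, value) pairs."""
    out = bytearray([OPCODES[op]])
    if op not in NO_ENCODING_BYTE:
        byte = 0
        for index, (arg_type, _) in enumerate(args):
            byte |= TYPE_CODES[arg_type] << (6 - 2 * index)
        out.append(byte)
    for arg_type, value in args:
        if arg_type == "T_REG":
            size = 1
        elif arg_type == "T_IND":
            size = 2
        else:
            size = 2 if op in SHORT_DIR_OPS else 4
        # Assumption: values are signed and stored big-endian.
        out += value.to_bytes(size, "big", signed=True)
    return bytes(out)


program = [
    ("sti", [("T_REG", 1), ("T_DIR", 7), ("T_DIR", 1)]),  # %:live resolved to 7
    ("live", [("T_DIR", 0)]),
    ("ld", [("T_DIR", 0), ("T_REG", 2)]),
    ("zjmp", [("T_DIR", -19)]),                            # %:loop resolved to -19
]
code = b"".join(encode(op, args) for op, args in program)
print(len(code))   # 22
print(code.hex())  # 0b68010007000101000000000290000000000209ffed
```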