This repository was archived by the owner on Sep 27, 2021. It is now read-only.

CSO lec10 11 notes #21

Merged
merged 2 commits into from
Jun 23, 2021
139 changes: 139 additions & 0 deletions _notes/computer-systems-organization/cso-lecture-10.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
---
title: CSO Lecture 10 notes
author: Yash Mehan
date: 2021-06-14
code: cs2.201
number: 10
---

## Jump Instruction Encodings

The most commonly used encoding is relative to the program counter (PC-relative). The assembler computes how many bytes separate the destination label of the `jmp` from the address of the instruction that follows the `jmp`, and writes that distance into the `.o` file. This signed distance, or "offset", can be 1 byte long (reaching from 128 bytes backward to 127 bytes forward), 2 bytes long, or 4 bytes long.

A second method is to give an absolute address, using 4 bytes to directly specify the destination.

The assembler and linker select appropriate methods when required.

The following is a fragment of an assembly file.

```
jle .L2
.L5:
movl %edx, %eax
sarl %eax
subl %eax, %edx
leal (%edx,%edx,2), %edx
testl %edx, %edx
jg .L5
.L2:
movl %edx, %eax
```

This is the disassembly of the object file:

```
8: 7e 0d jle 17 <silly+0x17>
a: 89 d0 mov %edx,%eax
c: d1 f8 sar %eax
e: 29 c2 sub %eax,%edx
10: 8d 14 52 lea (%edx,%edx,2),%edx
13: 85 d2 test %edx,%edx
15: 7f f3 jg a <silly+0xa>
17: 89 d0 mov %edx,%eax
```

When the instruction at address `8` is being executed (addresses in this listing are relative to the first instruction of the program), the program counter holds `a`, the address of the next instruction. `7e` is the opcode for `jle` and `0d` is the offset, i.e. the number of bytes to be jumped. `a` + `0d` = `17` in base 16, so this `jle` sets the program counter to address `17`, which holds `mov %edx,%eax`. This is exactly what the assembly program intended.

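Backward jumps work the same way: the offset is a two's-complement value whose MSB is 1, i.e. a negative number. In the `jg` at address `15`, the offset byte `f3` represents -13 (that is, -`0d` in base 16). The program counter holds `17` at that point, and `17` - `0d` = `a`, which is the address of `.L5`.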

The following is the disassembled version of the program after linking:

```
804839c: 7e 0d jle 80483ab <silly+0x17>
804839e: 89 d0 mov %edx,%eax
80483a0: d1 f8 sar %eax
80483a2: 29 c2 sub %eax,%edx
80483a4: 8d 14 52 lea (%edx,%edx,2),%edx
80483a7: 85 d2 test %edx,%edx
80483a9: 7f f3 jg 804839e <silly+0xa>
80483ab: 89 d0 mov %edx,%eax
```

After linking, the instructions have been relocated to absolute addresses, but the encoded offsets (`0d` and `f3`) are unchanged. When the instruction at `804839c` is being executed, the program counter holds `804839e`, and the offset of `0d` updates the program counter to `804839e + 0d`, which is `80483ab`. This is again what the original assembly file intended.
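The target calculation used in both listings can be sketched in C. This is a minimal illustration, assuming a 2-byte short jump (one opcode byte followed by one signed offset byte); the function name `short_jump_target` is made up for this example and does not come from the lecture.

```c
#include <stdint.h>

/* Target of a short PC-relative jump.
 * insn_addr   : address of the jump instruction itself
 * offset_byte : the raw offset byte from the encoding (e.g. 0x0d or 0xf3)
 * Assumption: the jump is 2 bytes long (opcode + 1-byte offset). */
uint32_t short_jump_target(uint32_t insn_addr, uint8_t offset_byte)
{
    /* The PC already points at the next instruction when the jump executes. */
    uint32_t next_pc = insn_addr + 2;

    /* Reinterpret the byte as a signed two's-complement value:
     * 0x0d -> +13, 0xf3 -> -13. */
    int8_t offset = (int8_t)offset_byte;

    /* Sign-extend to 32 bits and add to the PC. */
    return next_pc + (uint32_t)(int32_t)offset;
}
```

With the values from the listings, `short_jump_target(0x8, 0x0d)` gives `0x17` and `short_jump_target(0x15, 0xf3)` gives `0xa`. The linked addresses `0x804839c` and `0x80483a9` give `0x80483ab` and `0x804839e` with the very same offset bytes, which is why the linker does not need to rewrite them.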

## Conditional moves

The idea of conditional moves is to avoid writing one separate branch just for the sake of `mov`-ing.

```c
if( condition ){
//then-block
} else{
//else-block
}
```

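The straightforward translation uses jumps: test the condition, `jmp` over the `then-block` to a label marking the `else-block` when the condition fails, and `jmp` past the `else-block` to a common end label once the `then-block` finishes.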

Consider a function to find the absolute difference between two numbers.

An idea to avoid using `jle`, `jg`, etc. is to pre-calculate both `x-y` and `y-x` and store one of them, say `x-y`, as the return value. Then, if `x<y`, conditionally `mov` the `y-x` value into `%eax`.

```
movl 8(%ebp), %ecx # Get x into %ecx
movl 12(%ebp), %edx # Get y into %edx
movl %edx, %ebx # Copy y
subl %ecx, %ebx # Compute y-x
movl %ecx, %eax # Copy x
subl %edx, %eax # Compute x-y and set as return value
cmpl %edx, %ecx # Compare x and y
cmovl %ebx, %eax # If <, replace return value with y-x
```

In the instruction `cmovl`, the `l` stands for "less than", not `long`. The `cmov` family has exactly the same suffixes as the `jmp` family (`jl`, `jg`, `je`, `jne`, `jge`, etc.) and follows similar logic: `cmov` instructions act upon the values in the very same flags that the conditional jumps do. Observe that there is a redundancy here: we calculate both `x-y` and `y-x`. Since subtraction is cheap, it doesn't make much difference in this case, but had the branches been calculation-intensive, performing both of them only to discard one at the end would be inefficient.
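The two strategies can be written side by side in C. This is only a sketch: the function names `absdiff_goto` and `absdiff_cmov` are made up for illustration, and whether a compiler actually emits `cmovl` for the second form depends on the compiler, its flags and the target processor.

```c
/* Jump-style version: mirrors the compare-and-branch translation,
 * where only one of the two subtractions is ever executed. */
int absdiff_goto(int x, int y)
{
    int result;
    if (x < y)
        goto then_block;
    result = x - y;          /* else-block */
    goto done;
then_block:
    result = y - x;          /* then-block */
done:
    return result;
}

/* Conditional-move style: compute both candidates up front,
 * then select one without changing the control flow. */
int absdiff_cmov(int x, int y)
{
    int result = x - y;      /* default return value */
    int alt = y - x;         /* redundant when x >= y */
    if (x < y)
        result = alt;        /* the step a cmovl could perform */
    return result;
}
```

Both functions return the same value for every pair of inputs whose difference does not overflow; they differ only in how much work is done and in whether the control flow branches.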

## Pipelining and `jmp`

For successful execution of each instruction, we can roughly say the following steps need to be completed:

1. Fetch the instruction from memory
2. Decode
3. Load operands
4. Execute the instruction
5. Save output
6. Update the program counter

A common technique to speed up processing is to perform these steps in parallel across instructions, much like an **assembly line in a factory**. For example, while instruction 42 is in stage 6, instruction 43 is in stage 5, instruction 44 is in stage 4, and so on, rather than letting instruction 42 go through stages 1, 2, ... 6 before instruction 43 starts on stages 1, 2, ... 6.

For the most part, this technique speeds up execution, because without it most of the modules that fetch, decode, etc. would sit idle. Consider the following scenario:

```
Fetch  Decode  Load OP  Exec  Save Output  Update PC
-----  ------  -------  ----  -----------  ---------
47     46      45       44    43           42
```

Suppose instruction 44, which is in the Exec stage, turns out to be a `jmp` to some other instruction, say 100. Then the queue of 47, 46, 45 has to be cleared, and instruction 100 has to be put in the Fetch stage.

```
Fetch  Decode  Load OP  Exec  Save Output  Update PC
-----  ------  -------  ----  -----------  ---------
100    ~       ~        ~     44           43
101    100     ~        ~     ~            44
102    101     100      ~     ~            ~
```

Not only is the clearing wasteful, but the blank spots (`~`) in the pipeline also slow processing down, because the modules that handle those stages sit idle again. This is how `jmp` slows down the execution of a program. `cmov`s avoid this issue. There is a tradeoff to make: either do redundant calculations, some of which will be discarded, or have idle spots in the pipeline. It is the compiler's job to evaluate which one is more efficient.
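The cost of these idle spots can be estimated with a small back-of-the-envelope model. The sketch below assumes the 6-stage pipeline from the notes, one cycle per stage, and a jump that is resolved in the Exec stage so that the 3 younger instructions behind it are discarded. The names `STAGES`, `FLUSH_PENALTY`, `cycles_unpipelined` and `cycles_pipelined` are invented for this example, and the model ignores everything a real processor does to reduce the penalty.

```c
enum {
    STAGES = 6,         /* fetch, decode, load, exec, save, update PC */
    FLUSH_PENALTY = 3   /* instructions in fetch/decode/load are discarded */
};

/* Each instruction runs through all stages before the next one starts. */
unsigned cycles_unpipelined(unsigned n)
{
    return n * STAGES;
}

/* The first instruction needs STAGES cycles to drain through the pipeline;
 * every later one completes one cycle after its predecessor, and each
 * taken jump adds FLUSH_PENALTY idle cycles. */
unsigned cycles_pipelined(unsigned n, unsigned taken_jumps)
{
    if (n == 0)
        return 0;
    return STAGES + (n - 1) + taken_jumps * FLUSH_PENALTY;
}
```

Under these assumptions, 10 instructions take 60 cycles without pipelining, 15 cycles with a full pipeline, and 21 cycles if 2 of them are taken jumps.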

## Error Conditions

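Consider a function that, given a pointer, returns the value stored at the location it points to, or 0 if the pointer is `NULL`. Using `cmov` here isn't safe: the first instruction below dereferences the pointer unconditionally, so when the pointer is indeed `NULL` the code segfaults at line 1, before the test is even made.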

```
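movl (%edx), %eax   # Dereference the pointer unconditionally (faults if it is NULL)
testl %edx, %edx    # Set the flags: ZF = 1 if the pointer is NULL
movl $0, %edx       # Prepare 0 as the alternative value (mov leaves the flags unchanged)
cmove %edx, %eax    # If the pointer was NULL, replace the return value with 0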
```

Had we used a conditional jump instead, we would `testl` first, and if the pointer turned out to be `NULL` we would skip the dereference of `%edx` entirely, saving ourselves from the segfault.
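At the C level, the function in question might look like the sketch below. The name `cread` and its signature are assumptions made for illustration. The point is that the dereference is guarded by the test, so a correct translation has to branch around the load rather than perform it up front.

```c
#include <stddef.h>

/* Returns the value xp points to, or 0 when xp is NULL.
 * The load *xp must only happen after the NULL test has passed. */
int cread(const int *xp)
{
    if (xp == NULL)
        return 0;   /* never touch memory through a NULL pointer */
    return *xp;     /* safe: xp is known to be non-NULL here */
}
```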
93 changes: 93 additions & 0 deletions _notes/computer-systems-organization/cso-lecture-11.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
---
title: CSO Lecture 11 notes
date: 2021-06-18
author: Yash Mehan
code: cs2.201
number: 11
---
## Loops

A plan for a loop: `for`, `while` and `do-while` loops all follow a similar implementation.

```
    init expressions
    t = test expression
    if(!t)
        goto done
loop:
    body
    update expression
    t = test expression
    if(t)
        goto loop
done:
```
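As a concrete instance of this plan, here is a `while` loop and a hand-written `goto` version of the same loop. Factorial is used only as a convenient example, and the names `fact_while` and `fact_goto` are made up; the second function is a sketch of the shape a compiler could produce, not the output of any particular compiler.

```c
/* Ordinary while loop. */
int fact_while(int n)
{
    int result = 1;          /* init expression */
    while (n > 1) {          /* test expression */
        result *= n;         /* body */
        n--;                 /* update expression */
    }
    return result;
}

/* The same loop following the plan: test once before entering,
 * then re-test at the bottom of every iteration. */
int fact_goto(int n)
{
    int result = 1;          /* init expression */
    if (!(n > 1))            /* initial test: skip the loop entirely */
        goto done;
loop:
    result *= n;             /* body */
    n--;                     /* update expression */
    if (n > 1)               /* test again with the updated n */
        goto loop;
done:
    return result;
}
```

Note that the test is evaluated again after the update on each pass; reusing the value computed before the loop would make the loop either never run or never stop.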

## Switch statements

A `switch` statement checks a variable against multiple cases, which are themselves constants. `switch` doesn't have to check against each case sequentially: the compiler can maintain an array of labels (a jump table) and index into it to `jmp` directly to the right location without testing the other conditions. This works efficiently if the case values are close together. If they are far apart, the table would be mostly empty, and compilers typically fall back to a series of comparisons instead.

**Example:**

```c
switch(n){
case 100:
result = 100;
break;
case 102:
result = 102;
case 103:
result = 103;
break;
default:
result = 0;
}
```
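Wrapping the example in a function makes its behaviour easy to check, in particular the fall-through from `case 102` into `case 103`. The function name `switch_eg` is an assumption made for this sketch.

```c
int switch_eg(int n)
{
    int result;
    switch (n) {
    case 100:
        result = 100;
        break;
    case 102:
        result = 102;
        /* no break: execution falls through into case 103 */
    case 103:
        result = 103;
        break;
    default:
        result = 0;
    }
    return result;
}
```

Because of the missing `break`, `switch_eg(102)` returns 103 rather than 102, while `switch_eg(101)`, which matches no case, returns 0.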

To get an idea of how it is implemented in assembly, consider this C snippet:

```c
int switch_eg_goto(int n) //illustrative wrapper so that the snippet compiles; the name is arbitrary
{
    //&& is a GCC-specific unary operator that gives the address of a label
    static void* jumptable[4] = {&&label_A, &&label_default, &&label_B, &&label_C};
    unsigned index = n - 100;
    int result;
    if(index > 3){goto label_default;}
    goto *(jumptable[index]);

    //`*` is necessary here because the array stores the addresses of labels.
    //a plain goto takes a label name, not an address; `goto *` jumps to a
    //computed address, hence the dereferencing

label_A:
    result = 100;
    goto done; //notice the `done` label here
label_B:
    result = 102; //notice the absence of `goto done` here
                  // and the fact that the subsequent lines will be executed
label_C:
    result = 103;
    goto done;
label_default:
    result = 0;
done:
    return result;
}

The assembly snippet below shows how the jump table is used; assume `n` has already been `mov`'d into `%eax`.

```c
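subl $100, %eax           # index = n - 100
cmpl $3, %eax             # compare index with 3, the last valid table index
ja label_default          # unsigned index > 3: out of range (this also catches n < 100)
jmp *jumptable(,%eax,4)   # jump to the address stored in jumptable[index]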
```

where `jumptable` is an array defined in the `.rodata` section:

```c
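.section .rodata
.align 4
jumptable:
    .long label_A        # index 0: n = 100
    .long label_default  # index 1: n = 101 (no matching case)
    .long label_B        # index 2: n = 102
    .long label_C        # index 3: n = 103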
```