
namantam1/x86-assembly



Introduction

This repository contains the assembly programs I implemented while studying assembly programming from many different sources, including the book "Programming from the Ground Up" by Jonathan Bartlett.

This book is highly recommended if you want to understand how a computer runs programs, how memory is allocated, and how data moves back and forth between RAM and the CPU (registers) to do calculations and save results. Additionally, you'll gain a thorough grasp of how high-level programs like C/C++ compile down to machine code which computers can understand and execute.

The programs in this repository are 32-bit and 64-bit x86 programs for Linux, written in AT&T syntax, and can be built with the GNU toolchain (GCC, as, ld).

👉 Important❗ As I am new to assembly programming, the information provided in this repository might not be entirely accurate or error-free, despite my best efforts to prevent mistakes. The knowledge contained in this repository is the result of information I've learned from a variety of sources.

If you discover any incorrect information, your feedback is much appreciated, and you are welcome to open a PR. See the contribution guidelines for details.

👉 The reader is assumed to have a basic understanding of C/C++ in order to follow this guide. The more proficient you are in C/C++, the easier it will be to relate to the ideas discussed here. C is used as the base language to explain the concepts in the sections that follow.

👉 The programs in this repository can be built and run with the GCC toolchain on Linux, or on Windows by installing WSL.

✨Some Basics✨ (Very Very Important)

What does a basic program look like in C?

  1. variable definition (char, int, long, array).
  2. some if/else conditions.
  3. some loops.
  4. calling functions.

That's it. Almost every programming language lets us do at least these four things, but the question is how the computer interprets and runs them.

Every program needs a CPU and RAM to run. These are not the only requirements, but to understand the basics we only need to focus on these two.

Every CPU has some general-purpose registers and some special registers. We can think of these registers as memory locations inside the CPU. For example, a 32-bit x86 processor has general-purpose registers such as %eax, %ebx, %ecx, %edx, %edi and %esi.

In addition to these, there are also some special-purpose registers, including:

  • %ebp: Base pointer register
  • %esp: Stack pointer register
  • %eip: Instruction pointer register
  • %eflags: status register

Depending on the CPU architecture, each register holds either 32 bits of data (on a 32-bit processor) or 64 bits (on a 64-bit processor). Since a program can use more variables than there are registers, we need to store data somewhere else.

RAM enters the picture at this point. Data can be kept in RAM and referred to through a register that holds its RAM address. During program execution, data moves from RAM to the CPU and, after processing, may be written back to RAM.

✨Note: The upcoming sections teach assembly for a 32-bit processor. Once you are comfortable with 32-bit assembly programming, you can move on to 64-bit assembly programming here.

Operating system features are accessed through system calls. A system call is invoked by setting up registers in a special way and issuing the instruction int $0x80, where int stands for interrupt. The kernel knows which system call you want by the value you store in the %eax register. Each system call has its own requirements for what must be stored in the other registers.

Conditional statements, loops and functions are discussed in detail in upcoming topics.

Compiling and Execution

The code can be compiled, linked, and executed as follows:

# assemble
as --32 exit.s -o exit.out

# link
ld -m elf_i386 -s exit.out -o exit

# execute
./exit

# check if the exit code is correct
echo $? # prints exit code which the above program emits

Chapters 📖

These chapters walk you through assembly programming step by step, using concepts you likely already know from other programming languages, such as conditions, loops, functions, printing to the console, and many others.

Each chapter pairs the assembly with an example of how you would write the same thing in C.

First assembly program

In most programming languages you have learned so far, your first program was probably Hello World. Writing Hello World in assembly is a bit more involved, so it is covered in a later chapter.

So what will be the first assembly program we write?

We will write a very simple program that exits with a certain exit code. In C, you could write it as:

int main() {
 return 10;
}

To accomplish the same in assembly:

.section .text                     
.globl _start
           
_start:
 movl $1, %eax # sys call for exit
 movl $10, %ebx # exit status code
 int $0x80 # wake kernel to exit

  • The first line tells the assembler that the program's instructions start there, in the .text section. A program can have other sections, such as the .data and .bss sections, which are discussed later.

  • .globl _start makes the _start symbol visible to the linker, which uses it as the point where execution of the program starts.

  • _start: defines the _start label. The code that follows it is the _start block.

  • movl is an assembly instruction with two operands. It copies data from the first operand (the source) to the second (the destination). See the assembly instructions section for more instructions.

  • The $ sign before the first operand indicates immediate addressing mode, which embeds the data in the instruction itself. See data accessing methods for a detailed discussion.

  • 1 is stored in the %eax register to select the exit system call, and %ebx is set to the exit code, which is 10.

  • int $0x80 raises a software interrupt that hands control to the kernel, which then performs the system call.

The l suffix on an instruction tells the assembler that the operands are 32 bits wide. It can often be inferred from the register names, but it becomes crucial when writing 64-bit assembly.

To verify the exit code, compile and run the program above, then run:

echo $? # will output 10

"Hello World!" Program 👋

The previous program used its exit status code as output, but exit codes are not meant for that. As you have seen, asking the operating system to do something takes a system call, such as the exit call you made by setting the %eax register to 1. Any kind of I/O works the same way: to print something to the console, you make a system call.

To accomplish this, set the %eax register to 4 to initiate a write system call. You need to set the %ebx register to the file descriptor value, which is 1 for stdout, the %ecx register to the location of the buffer, and the %edx register to the size of the message to print.

For a thorough description of opening, closing, reading, and writing files, refer to the files section.

Here is an example of an assembly program that prints a message:

.globl _start


.section .data
 msg:
  .ascii "Hello World\n"

.section .text
_start:
 movl $4, %eax          # sys call for write
 movl $1, %ebx          # set fd which is 1 for stdout
 movl $msg, %ecx        # set buffer address
 movl $12, %edx         # set msg size
 int $0x80              # interrupt kernel to make sys call

 # exit program with successful status code
 movl $1, %eax
 movl $0, %ebx
 int $0x80
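
For comparison, the same write request can be sketched in C through the libc syscall wrapper, which loads the registers and enters the kernel on your behalf. This is only an illustration under assumptions: the helper name write_stdout is made up for this guide, the code is Linux-specific, and SYS_write resolves to 4 only in a 32-bit build (a 64-bit build uses a different number and a different instruction).

```c
#define _GNU_SOURCE          /* exposes the syscall() declaration in <unistd.h> */
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>     /* SYS_write */
#include <unistd.h>          /* syscall() */

/* Hypothetical helper: ask the kernel to write len bytes of msg to stdout.
 * The arguments mirror the registers of the assembly version:
 * SYS_write -> %eax, 1 (stdout) -> %ebx, msg -> %ecx, len -> %edx. */
long write_stdout(const char *msg, size_t len) {
    assert(msg != NULL);
    return syscall(SYS_write, 1, msg, len);   /* bytes written, or -1 on error */
}
```

Calling write_stdout("Hello World\n", 12) should print the message and return the number of bytes written.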

Conditional Statement (if/else)

Programming relies heavily on conditional statements such as if, else if, and else. To implement them in assembly, use jump instructions to move to another part of the program. With jumps, you can create branches that control the program's flow.

Before we can jump conditionally, we must compare two values with the cmpl instruction, which records the outcome in a special register called %eflags. The instructions below jump based on the result of the comparison.

  • je: Jump if values were equal.
  • jg: Jump if the second value is greater than the first.
  • jge: Jump if the second value is greater than or equal to the first.
  • jl: Jump if the second value is less than the first.
  • jle: Jump if the second value is less than or equal to the first.
  • jmp: Jump no matter what. It does not need to be preceded by a comparison.
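
The operand order above is easy to get backwards, so here is a small C model of it. It is a sketch under assumptions, not how the CPU is implemented: the Flags struct and the function names are invented for illustration, and only the three flags used by the signed jumps are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The three status flags that the signed conditional jumps look at. */
typedef struct {
    bool zf;  /* zero flag: the result was zero */
    bool sf;  /* sign flag: the result had its top bit set */
    bool of;  /* overflow flag: the signed result did not fit in 32 bits */
} Flags;

/* Toy model of `cmpl first, second`: computes second - first,
 * throws the result away and keeps only the flags. */
Flags cmpl(int32_t first, int32_t second) {
    int64_t exact = (int64_t)second - (int64_t)first;       /* cannot overflow in 64 bits */
    uint32_t wrapped = (uint32_t)second - (uint32_t)first;  /* what a 32-bit register would hold */
    Flags f;
    f.zf = (wrapped == 0);
    f.sf = ((wrapped >> 31) != 0);
    f.of = (exact > INT32_MAX || exact < INT32_MIN);
    return f;
}

/* Each function answers: would this jump be taken? */
bool je(Flags f)  { return f.zf; }
bool jg(Flags f)  { return !f.zf && f.sf == f.of; }
bool jge(Flags f) { return f.sf == f.of; }
bool jl(Flags f)  { return f.sf != f.of; }
bool jle(Flags f) { return f.zf || f.sf != f.of; }

/* Usage: `cmpl $9, %ebx` with 10 in %ebx, so first = 9 and second = 10. */
void cmpl_demo(void) {
    Flags f = cmpl(9, 10);
    assert(jge(f) && jg(f));    /* 10 >= 9 and 10 > 9: both jumps are taken */
    assert(!jl(f) && !je(f));   /* 10 is neither less than nor equal to 9 */
}
```

In every case the jump asks about the second operand relative to the first, which matches the wording of the list.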

For example:

int main() {
 int x = 10;
 if (x >= 9)
  return x;
 return 0;
}

We can write the corresponding assembly code as:

.globl _start
.section .text

_start:
 movl $1, %eax   # exit sys call
 movl $10, %ebx  # put x in %ebx, which is also the exit status register
 movl $9, %ecx   # put the value to compare against in %ecx

 cmpl %ecx, %ebx # compare %ebx (10) with %ecx (9)
 jge end_block   # if %ebx >= %ecx, jump to the end
 movl $0, %ebx   # otherwise set 0 in exit status register
end_block:
 int $0x80       # interrupt kernel

Loops (for/while/do while)

As with if/else conditions, you can use jump instructions to create a loop in assembly.

For example:

int main() {
  int sum = 0;
  int i = 10;
  while (i > 0) {
    sum += i;
    i--;
  }

  return sum;
}

The corresponding assembly program can be written as:

.globl _start  
.section .text

_start:
 movl $1, %eax   # exit sys call
 movl $0, %ebx   # initialize %ebx (exit status register) to 0
 movl $10, %ecx  # store 10 in %ecx

loop:
 addl %ecx, %ebx # add current value to %ebx
 decl %ecx       # decrease %ecx by 1

end:
 cmpl $0, %ecx   # compare %ecx with 0
 jg loop         # if %ecx > 0, jump back to loop

 int $0x80       # interrupt kernel
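
To make the mapping between the two versions explicit, the C loop can be rewritten with a label and goto so that each line corresponds to one assembly instruction. This is only a teaching sketch: the function name sum_down_from is invented here, and the variables are named after the registers they stand for.

```c
#include <assert.h>

/* Hypothetical helper laid out like the assembly above.
 * The body runs once before the first check, so start must be positive. */
int sum_down_from(int start) {
    assert(start > 0);
    int ebx = 0;           /* movl $0, %ebx   */
    int ecx = start;       /* movl $10, %ecx  */
loop:
    ebx += ecx;            /* addl %ecx, %ebx */
    ecx--;                 /* decl %ecx       */
    if (ecx > 0)           /* cmpl $0, %ecx   */
        goto loop;         /* jg loop         */
    return ebx;            /* value left in the exit status register */
}
```

sum_down_from(10) returns 55, the same value the assembly program leaves in %ebx as its exit status.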

Functions (Procedures)

Functions are a crucial part of programming that help you write reusable, modular, and maintainable code. When studying functions in other programming languages, you were probably taught that a stack is used internally to keep track of where a function was called from and of the data each call uses. Here you will see how this is done in assembly.

We only have a limited number of registers, so before calling a function you must save any values held in registers somewhere they won't be lost, because the function may use and overwrite the same registers. You also need a structured way to store data so that you can easily restore it after the function returns. The %esp register, known as the stack pointer, is used for this.

Before a function is called, the program pushes all of the function's parameters onto the stack with the pushl instruction, in the reverse of the order in which they are documented. It then issues a call instruction naming the function to call. The call instruction first pushes the return address, which is the address of the instruction following the call, onto the stack. It then sets the instruction pointer %eip to the start of the function.

Note: The stack grows downward, toward lower memory addresses. If it grows beyond the memory available to it, the program crashes with a stack overflow error.

Inside a function, you can access all of this data through the base pointer %ebp, using a different offset from %ebp for each item. %ebp was made specifically for this purpose, which is why it is called the base pointer.

✨Important✨: The following steps set up function parameters, define a function, call it, and return control to the point where it was called.

  1. Push the function parameters in reverse order using the pushl instruction.
  2. Call the function by issuing a call instruction with the function name.
  3. Declare the function anywhere in the program file with .type <function_name>, @function.
  4. Start the function block with the function name as a label.
  5. Make the first two instructions save the old base pointer on the stack and make the current stack position your base pointer.
    pushl %ebp            # save old base pointer
    movl  %esp, %ebp      # make stack pointer the base pointer
  6. Read the function parameters through the %ebp register in base pointer addressing mode.
  7. Do the calculations.
  8. Store the return value in %eax.
  9. Reset the stack to what it was when the function was called by using the instructions movl %ebp, %esp and popl %ebp.
  10. Return control to wherever the function was called from by using the ret instruction.
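
To see why the parameters end up at 8(%ebp) and 12(%ebp), the steps above can be traced in a toy C model of the stack. Everything here is an assumption made for illustration: the Machine struct, the 64-word stack, and the 0xdeadbeef stand-in for the return address are invented, and only the instructions from the steps are modelled.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };             /* toy stack size, chosen arbitrarily */

typedef struct {
    uint32_t mem[STACK_WORDS];         /* stack memory, one slot per 4-byte word */
    uint32_t esp;                      /* byte address of the top of the stack */
    uint32_t ebp;                      /* base pointer */
    uint32_t eax;                      /* return value register */
} Machine;

/* pushl: move the stack pointer down by 4 bytes, then store the value. */
static void push(Machine *m, uint32_t value) {
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = value;
}

/* popl: read the top value, then move the stack pointer up by 4 bytes. */
static uint32_t pop(Machine *m) {
    assert(m->esp < STACK_WORDS * 4);
    uint32_t value = m->mem[m->esp / 4];
    m->esp += 4;
    return value;
}

/* Read the 4-byte word at a byte address inside the toy stack. */
static uint32_t load(const Machine *m, uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_WORDS * 4);
    return m->mem[addr / 4];
}

/* Body of a two-parameter sum function (steps 5 to 9). */
static void sum_body(Machine *m) {
    push(m, m->ebp);                   /* pushl %ebp            */
    m->ebp = m->esp;                   /* movl  %esp, %ebp      */
    m->eax = load(m, m->ebp + 8);      /* movl  8(%ebp), %eax   (first parameter)  */
    m->eax += load(m, m->ebp + 12);    /* addl  12(%ebp), %eax  (second parameter) */
    m->esp = m->ebp;                   /* movl  %ebp, %esp      */
    m->ebp = pop(m);                   /* popl  %ebp            */
}

/* Caller side: steps 1 and 2, then the cleanup after the call returns. */
uint32_t call_sum(uint32_t a, uint32_t b) {
    Machine m = { .esp = STACK_WORDS * 4 };   /* empty stack; other fields start at 0 */
    push(&m, b);                       /* last parameter is pushed first */
    push(&m, a);
    push(&m, 0xdeadbeef);              /* call: pushes the (stand-in) return address */
    sum_body(&m);
    uint32_t return_address = pop(&m); /* ret: pops the return address */
    assert(return_address == 0xdeadbeef);
    m.esp += 8;                        /* addl $8, %esp: drop the two parameters */
    assert(m.esp == STACK_WORDS * 4);  /* the stack is back where it started */
    return m.eax;
}
```

The first parameter sits 8 bytes above %ebp because two 4-byte values lie in between: the saved %ebp at 0(%ebp) and the return address at 4(%ebp).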

For example:

int sum(int a, int b) {
  return a + b;
}

int main() {
  int s = sum(10, 20);
  return s;
}

The corresponding assembly program will look like this:

.globl _start
.section .text

_start:
pushl $20        # push second arg (b)
pushl $10        # push first arg (a)
call  fun_name   # call function, which stores the result in %eax register
addl  $8, %esp   # reset stack (remove the two 4-byte args)

# use the result as exit code
movl  %eax, %ebx
movl  $1, %eax
int   $0x80


# tell the assembler that fun_name is a function
.type fun_name, @function

# function definition
fun_name:
pushl %ebp            # save old base pointer
movl  %esp, %ebp      # make stack pointer the base pointer

movl  8(%ebp), %ebx   # get first arg
movl  12(%ebp), %ecx  # get second arg
movl  %ebx, %eax      # copy first val to res
addl  %ecx, %eax      # add second val to res

# restore stack before returning
movl  %ebp, %esp      # restore stack pointer
popl  %ebp            # restore base pointer
ret

Using C library functions

The "Hello World" program above uses the write system call to print a message to the console. That approach becomes awkward for dynamic messages, because you would have to convert every value to ASCII characters and fill the buffer yourself.

Instead of making a direct system call, we can use the standard printf function provided by libc, whose signature is int printf(const char *restrict format, ...);.

For example:

#include <stdio.h>

int main() {
  printf("Value of x is %d\n", 10);
  return 0;
}

We can write the corresponding assembly code as:

.globl _start

.section .data

msg:
.ascii "Value of x is %d\n\0"

.section .text
_start:
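pushl $10          # set second param as int value
pushl $msg         # set first param as format string
call printf        # call printf function
addl $8, %esp      # remove both params from the stack

# exit program with success exit code
# (the raw exit sys call does not flush stdio buffers, so the
# output may be lost when stdout is not a terminal)
movl $1, %eax
movl $0, %ebx
int $0x80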

To compile the above code you need to tell the linker to link libc so that you can utilize the printf function provided by it. We can compile, link and execute the above code as:

as --32 main.asm -o main.out
ld -dynamic-linker /lib/ld-linux.so.2 main.out -m elf_i386 -s -o run -lc

./run
# output: Value of x is 10

Note that the format string ends with a NUL byte (\0). printf expects a C string and finds the end of the buffer by looking for that terminating NUL.

👉 Similarly, you can use other functions provided by libc. Check their man pages to find their signatures.

Writing Inline assembly in C

There are several ways to write inline assembly in C, but the one shown here is sufficient for our current needs.

You can implement a function entirely in assembly. To do so, you must tell the compiler that the function body is raw assembly code, so we add __attribute__((naked)) before the function definition. Get more info about it here. We then use __asm__ to wrap our assembly code.

For example:

// add.c
#include <stdio.h>

__attribute__((naked)) 
int sum(int a, int b) {
    __asm__(
        "pushl %ebp;"
        "movl %esp, %ebp;"
        "movl 8(%ebp), %eax;"  // s = a
        "addl 12(%ebp), %eax;"  // s += b
        "movl %ebp, %esp;"
        "popl %ebp;"
        "ret;"
    );
}

int main() {
    printf("%d\n", sum(10, 20));
    return 0;
}

To compile and run this code:

# using -m32 to compile in 32-bit mode
gcc -m32 add.c -o add.out
./add.out
# output: 30

Important topics

Some frequently used assembly Instructions

  • movl : Copies the source operand to the destination, i.e. movl source, destination (for example movl %eax, %ebx).

  • addl : Adds the source operand to the destination.

  • subl : Subtracts the source operand from the destination.

  • imull : Multiplies the destination by the source operand and stores the result in the destination.

  • incl : Increases the value by 1, like i++

  • decl : Decreases the value by 1, like i--

  • idivl : Requires the dividend in %eax, with %edx holding its sign extension (zero for a non-negative dividend; the cltd instruction sets %edx up from %eax). The quotient is stored in %eax and the remainder in %edx. The divisor can be any register or memory location.

The l suffix on each instruction tells the assembler that the operands are 32 bits wide.
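
As a rough illustration of what idivl leaves in the two registers, here is a C model. It is a sketch with invented names (DivResult, idivl_model), and it assumes the dividend fits in 32 bits and has already been sign-extended into %edx. C's / and % truncate toward zero, which is the same rounding that idivl uses.

```c
#include <assert.h>
#include <stdint.h>

/* Register contents after the division. */
typedef struct {
    int32_t eax;  /* quotient */
    int32_t edx;  /* remainder */
} DivResult;

/* Toy model of `idivl divisor` with a 32-bit dividend in %eax. */
DivResult idivl_model(int32_t dividend, int32_t divisor) {
    assert(divisor != 0);                               /* the CPU raises a divide error here */
    assert(!(dividend == INT32_MIN && divisor == -1));  /* the quotient would not fit in %eax */
    DivResult r;
    r.eax = dividend / divisor;
    r.edx = dividend % divisor;
    return r;
}
```

For example, dividing -7 by 2 gives a quotient of -3 and a remainder of -1, so the remainder takes the sign of the dividend.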

Data Accessing Methods

The general form of memory address reference is:

address_Or_offset(base_Or_offset,index,multiplier)

All of the fields above are optional. To calculate the address, use the formula:

final_address = address_Or_offset + base_Or_offset + multiplier * index

multiplier and address_Or_offset must both be constants (the multiplier can only be 1, 2, 4, or 8), while the other two must be registers. Any piece that is left out simply drops out of the sum: an omitted offset, base, or index counts as zero, and an omitted multiplier counts as 1.
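
The formula is small enough to write down as a C function. This is only a sketch: the name effective_address and its parameter names are made up here, and a field that is left out of the operand is passed as 0 (or 1 for the multiplier).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper computing the address referenced by
 * offset(base, index, multiplier) on a 32-bit machine. */
uint32_t effective_address(uint32_t offset, uint32_t base,
                           uint32_t index, uint32_t multiplier) {
    /* x86 only allows these four scale factors */
    assert(multiplier == 1 || multiplier == 2 || multiplier == 4 || multiplier == 8);
    return offset + base + multiplier * index;
}
```

For instance, with a table starting at address 0x1000, 4-byte elements, and 3 in the index register, effective_address(0x1000, 0, 3, 4) yields 0x100c, the address of the fourth element.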

You can access data in different ways.

  1. Immediate mode: This is the simplest mode in which data to access is embedded in the instruction itself. Example:

    movl $0, %eax

    This loads register %eax with the value 0. The $ indicates that you want to use immediate mode addressing.

  2. Register addressing mode: The instruction contains a register to access, rather than a memory location. Example:

    movl %eax, %ebx

    Copy value stored in register %eax to register %ebx.

  3. Direct addressing mode: The instruction contains the memory address to access. For example, you can say: please load this register with the data at address 200. Example:

    movl ADDR, %eax

    The instruction above loads register %eax with the value stored at memory address ADDR.

  4. Index addressing mode: The instruction contains a memory address along with an index register that specifies the offset from that address.

    .section .data
    data_start:
      .int 1,2,3,4
    
    .
    .
    movl data_start(,%ecx, 4), %eax

    The multiplier is 4 here because each .int value is 4 bytes wide. %ecx contains the index of the item to access.

  5. Indirect addressing mode: The instruction contains a register that holds a pointer to the data to access. If %eax holds an address, you can move the value at that address to %ebx as

    movl (%eax), %ebx

  6. Base pointer addressing mode: Similar to indirect addressing mode, but the instruction also includes a number, called the offset, that is added to the register's value before the lookup. For example, if you have a record where the value is 4 bytes into the record, and you have the address of the record in %eax, you can retrieve the value into %ebx as

    movl 4(%eax), %ebx
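
If you think in C, the last three modes correspond to array indexing, pointer dereferencing, and struct field access. The sketch below shows that correspondence. The struct record layout and the function names are invented for illustration, and the exact instructions a compiler emits may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout: the value field sits 4 bytes into the record. */
struct record {
    int32_t id;
    int32_t value;
};

/* Index mode, like `movl table(,%ecx,4), %eax`:
 * the index is scaled by the element size, sizeof(int32_t) = 4. */
int32_t load_indexed(const int32_t *table, size_t index) {
    assert(table != NULL);
    return table[index];
}

/* Indirect mode, like `movl (%eax), %ebx`: follow the pointer. */
int32_t load_indirect(const int32_t *ptr) {
    assert(ptr != NULL);
    return *ptr;
}

/* Base pointer mode, like `movl 4(%eax), %ebx`:
 * read the 4 bytes found at a fixed offset from the record's address. */
int32_t load_base_offset(const struct record *rec) {
    assert(rec != NULL);
    assert(offsetof(struct record, value) == 4);
    int32_t out;
    memcpy(&out, (const unsigned char *)rec + 4, sizeof out);
    return out;
}
```

In everyday C you would simply write rec->value. The explicit offset is spelled out here only to mirror the 4(%eax) operand.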

File handling

Opening a file with mode and permission

  • %eax will hold 5 for the open sys call.
  • The address of the first character of the filename should be stored in %ebx.
  • The read/write intentions, represented as a number, should be stored in %ecx. You can use 0 for files you want to read from and 03101 for files you want to write to.
  • The file permissions should be stored as a number in %edx. In general, you can use 0666 for permissions.

movl $5,        %eax   # open sys call
movl $filename, %ebx   # address of the filename (filename is a hypothetical label)
movl $0,        %ecx   # flags: 0 opens the file for reading
movl $0666,     %edx   # permissions
int $0x80

The instructions above return a file descriptor in %eax. You can use this number to refer to the file throughout your program.

Read/Write from/to a file using the file descriptor

  • read and write are system calls numbered 3 and 4 respectively.
  • The fd should be stored in %ebx.
  • The address of the buffer should be stored in %ecx. For read, this is the buffer that receives the data. For write, it is the buffer that holds the data to write.
  • The size of the buffer should be stored in %edx. A buffer can be reserved in the .bss section, described below.

read returns either the number of bytes read or an error code in %eax. In the case of write, %eax will contain the number of bytes written or an error code.

Closing files

The close system call is 6. The only parameter to close is the fd placed in %ebx.
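
The same open, write, read, and close sequence can be sketched in C with the POSIX wrappers, which issue these system calls for you. The helper name write_then_read and the choice of flags are assumptions made for this example, and the code needs a POSIX system such as Linux.

```c
#include <assert.h>
#include <fcntl.h>      /* open and the O_* flags */
#include <string.h>     /* strlen */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read, write, close */

/* Hypothetical helper: writes text to the file at path, then reads it
 * back into out (at most cap bytes). Returns the number of bytes read,
 * or -1 if any step fails. */
ssize_t write_then_read(const char *path, const char *text, char *out, size_t cap) {
    assert(path != NULL && text != NULL && out != NULL);
    size_t len = strlen(text);

    /* open for writing: path -> %ebx, flags -> %ecx, permissions -> %edx */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;
    /* write: fd -> %ebx, buffer address -> %ecx, size -> %edx */
    ssize_t written = write(fd, text, len);
    close(fd);                                  /* close: fd -> %ebx */
    if (written != (ssize_t)len)
        return -1;

    /* open again for reading, then read into the caller's buffer */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t got = read(fd, out, cap);
    close(fd);
    return got;
}
```

Each wrapper returns -1 on failure, where the raw system call would leave a negative error code in %eax.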

The .bss section of the program is like the .data section, except that it doesn't take up space in the executable. In the .data section, you can reserve storage and give it an initial value. In the .bss section, you can reserve storage but you can't set an initial value.

For example:

.section .bss
.lcomm my_buffer, 500 # It will reserve 500 bytes of storage which you
                      # can use as a buffer

FD for standard and specific files

  • STDIN: 0, it is a read-only file.
  • STDOUT: 1, it is a write-only file.
  • STDERR: 2, it is a write-only file.

Assembly program for x86_64 processor (64-bit)

Because assembly is simpler to write in 32-bit mode, our focus so far has been on learning assembly and building fundamental programming skills there.

The core ideas carry over from 32-bit to 64-bit mode, although the register names, the calling convention, and the system call mechanism differ. Check out the 'x86_64' directory and the accompanying instructions to learn assembly in '64-bit' mode.

Examples

x86 (32-bit)

sno | program | topic
1 | exit.asm | exit sys call
2 | hello_world.asm | write sys call
3 | hello_world_lib.asm | libc function call
4 | greatest.asm | array(buffer), loops, condition
5 | add.asm | function(procedure), call stack
6 | factorial.asm | function, recursion, condition, loops
7 | power_iter.asm | function, loops, condition
8 | power_rec.asm | function, recursion, loops, condition

x86_64 (64-bit)

sno | program | topic
1 | sum.asm | buffer, loops, condition, libc function
2 | add.asm | inline asm
3 | add_arr.asm | buffer, loops, condition, inline asm
4 | 2d_arr.asm | nested buffer, loops, condition, inline asm
5 | malloc.asm | malloc, loops, condition, inline asm
6 | malloc_2d.asm | malloc for 2d buffer, loops, condition, inline asm

Leetcode

Once you have mastered x86_64 assembly, you can write inline assembly in C to solve problems on leetcode. The leetcode directory contains some solutions along with their C annotation.

Below is the list of programs:

sno | question | level | topic | assembly topics | Open
1. | Candy | hard | array, greedy | malloc, free, loop, conditions | open
2. | Climbing Stairs | easy | Math, Dynamic Programming, Memoization | loop, conditions | open
3. | find the duplicate number | medium | array, two pointer, binary search | loop, conditions | open
4. | Fibonacci numbers | easy | array, two pointer, binary search | loop, conditions | open
5. | Find the difference | easy | hash, string | byte(char), loops | open
6. | sqrt(x) | easy | math, binary search | long int <-> int, loops | open
7. | Min cost climbing stairs | easy | array, dp | loops, conditions, malloc, free | open
8. | minimum operations to reduce an integer to 0 | medium | dp, bit manipulation | recursion, conditions | open
9. | missing number | easy | hash, math, array | loop, conditions | open
10. | plus one | easy | math, array | malloc, free, loop, conditions | open
11. | single number | easy | array, bit manipulation | bit manipulation, loop | open
12. | two sum | easy | array, hash table | condition, nested loop | open
13. | asteroid collision | medium | array, stack | condition, nested loop, malloc | open
14. | search in a binary search tree | medium | binary tree | condition, recursion, struct | open

Tools

This tool compiles C/C++ code to assembly for various compilers and compiler versions. With it, you can see what your C/C++ program looks like in assembly. It can also generate different flavours of assembly, such as the AT&T syntax we studied or Intel syntax.

A good combination of options for our code is:

  • 32-bit:

    • set compiler to x86-64 gcc 4.1.2
    • set compiler options to -m32 -O2 where O2 is optimization level.
    • Disable intel asm syntax from output dropdown.
  • 64-bit:

    • set compiler to x86-64 gcc 4.1.2
    • set compiler options to -m64 -O2 where O2 is optimization level.
    • Disable intel asm syntax from output dropdown.

You can even compare the assembly generated for different flavours side by side.

This tool simulates how a C++ program might look if it were written in the C programming language.

Why learn assembly language?

👉 By learning assembly language, you will gain a better understanding of the operating system and of how your code is compiled and run behind the scenes.

👉 You can make informed decisions that help you avoid both unoptimized code and premature optimization.

👉 By analyzing assembly code, you can identify vulnerabilities and develop exploits in software.

👉 You can reverse engineer software. Once you understand assembly, reverse engineering becomes much more approachable.

References