Intro 1: What is a binary, really?

In short, a binary is what you get when you take high-level code such as C or C++ and compile it into something the computer can actually run. I believe in hands-on learning, so let's look inside one to really find out.

Consider the file hello_world.c:

#include <stdio.h>
int main() {
    printf("Hello World!\n");
}

This is your average C file, more or less. It's got a main, some includes, and a little bit of code to run. However, your computer can't run it as is, because the CPU doesn't understand C source. To turn it into something runnable, we compile it:

$ gcc -m32 hello_world.c -o hello_world.bin

You can ignore the -m32 argument for now (you'll learn about it later); -o hello_world.bin simply names the output file.

From here, we can execute it:

$ ./hello_world.bin
Hello World!

Unsurprisingly, we get "Hello World!" as the output. But let's go a bit deeper. We can open gdb (GNU Debugger) and see what's happening under the hood:

$ gdb -q ./hello_world.bin
Reading symbols from ./hello_world.bin...(no debugging symbols found)...done.
gdb-peda$ disas main
Dump of assembler code for function main:
   0x0804841d <+0>:     push   %ebp
   0x0804841e <+1>:     mov    %esp,%ebp
   0x08048420 <+3>:     and    $0xfffffff0,%esp
   0x08048423 <+6>:     sub    $0x10,%esp
   0x08048426 <+9>:     movl   $0x80484d0,(%esp)
   0x0804842d <+16>:    call   0x80482f0 <puts@plt>
   0x08048432 <+21>:    leave
   0x08048433 <+22>:    ret
End of assembler dump.
gdb-peda$ quit

First, your prompt probably looks like (gdb), whereas mine is gdb-peda$. Don't worry about this; my gdb is modified with the PEDA extension, and the commands used here work the same way in plain gdb.

The weird stuff that gdb showed us is called assembly language. It's essentially the lowest-level human-readable code out there. Each line of it maps one-to-one to a machine instruction. Let me break this down for you.

0x0804841d <+0>:     push   %ebp
0x0804841e <+1>:     mov    %esp,%ebp
0x08048420 <+3>:     and    $0xfffffff0,%esp
0x08048423 <+6>:     sub    $0x10,%esp

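First, the numbers you see on the left are addresses. Just like your house address, 0x0804841d is where the instruction push %ebp lives. The <+0>, <+1>, and so on are offsets from the start of main. These first four instructions are the standard setup at the start of a function, in this case main(): they save the caller's %ebp, copy %esp into %ebp to mark the new stack frame, round %esp down to a multiple of 16, and reserve 16 bytes of stack space.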

0x08048426 <+9>:     movl   $0x80484d0,(%esp)
0x0804842d <+16>:    call   0x80482f0 <puts@plt>

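These two instructions are what actually print our "Hello World!". The first one moves the address of the string "Hello World!" (0x80484d0) into the memory that %esp points to, which is the top of the stack. %esp is a register: a four-byte storage slot inside the CPU for quick access, and this particular one holds the address of the top of the stack. Our program then calls puts(), which prints out whatever string is at the address we supplied. We wrote printf() in the source, but because the string has no format specifiers and ends in a newline, gcc swaps in the simpler puts(), which adds the newline itself.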

0x08048432 <+21>:    leave
0x08048433 <+22>:    ret

Finally, these last two instructions undo the setup from the start of main() and pass control back to the C library, which does some cleaning up and then exits. We'll be learning more about how these binaries function in later tutorials.