In short, a binary is what you get when you take high-level source code, such as C or C++, and compile it into machine code the computer can actually run. I believe in hands-on learning, so let's take a look inside one to really find out.
Consider the file hello_world.c:
#include <stdio.h>

int main() {
    printf("Hello World!\n");
}
This is your average C file, more or less. It's got a main function, an include, and a little bit of code to run. However, your computer can't run it as it is. To turn it into something runnable, we compile it:
$ gcc -m32 hello_world.c -o hello_world.bin
You can ignore the -m32 argument for now (you'll learn about it later). The -o hello_world.bin argument simply specifies the name of the output file.
From here, we can execute it:
$ ./hello_world.bin
Hello World!
Unsurprisingly, we get "Hello World!" as the output. But let's go a bit deeper. We can open gdb (GNU Debugger) and see what's happening under the hood:
$ gdb -q ./hello_world.bin
Reading symbols from ./hello_world.bin...(no debugging symbols found)...done.
gdb-peda$ disas main
Dump of assembler code for function main:
0x0804841d <+0>: push %ebp
0x0804841e <+1>: mov %esp,%ebp
0x08048420 <+3>: and $0xfffffff0,%esp
0x08048423 <+6>: sub $0x10,%esp
0x08048426 <+9>: movl $0x80484d0,(%esp)
0x0804842d <+16>: call 0x80482f0 <puts@plt>
0x08048432 <+21>: leave
0x08048433 <+22>: ret
End of assembler dump.
gdb-peda$ quit
First, your prompt probably looks like (gdb), whereas mine is gdb-peda$. Don't worry about this; my gdb is modified.
The weird stuff that gdb showed us is called assembly language. It's essentially the lowest-level human-readable code out there. Each line of that code maps one-to-one to a machine instruction. Let me break this down for you.
0x0804841d <+0>: push %ebp
0x0804841e <+1>: mov %esp,%ebp
0x08048420 <+3>: and $0xfffffff0,%esp
0x08048423 <+6>: sub $0x10,%esp
First, the numbers you see on the left are addresses. Just like your house address, 0x0804841d is where the instruction push %ebp lives. These first four instructions are the conventional setup for a function, in this case main(): they save the caller's %ebp, start a new stack frame, and make some room on the stack.
0x08048426 <+9>: movl $0x80484d0,(%esp)
0x0804842d <+16>: call 0x80482f0 <puts@plt>
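These instructions are what actually print our "Hello World!". The program moves the address of the string "Hello World!" into the memory that %esp points to. %esp is a register, in this case the stack pointer. It holds four bytes of information for quick access, usually some address. Our program then calls puts(), which prints out whatever is at the address we supplied. You might wonder why it calls puts() when we wrote printf(): our string has no format specifiers and ends in a newline, so gcc swaps in the simpler puts(), which adds the newline itself.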
0x08048432 <+21>: leave
0x08048433 <+22>: ret
Finally, these last two instructions undo the setup from the start of the function and pass control from our main() back to the C library, which does some cleaning up and then exits. We'll be learning more about how these binaries work in later tutorials.