I’ve been on a quest over the last year or so to understand fully how a program ends up going from your brain into code, from code into an executable and from an executable into an executing program on your processor. I like the point I’ve got to in this pursuit, so I’m going to brain dump here :)
Prerequisite Knowledge: Some knowledge of assembler will help. Some knowledge of processors will also help. I wouldn’t call either of these necessary, though; I’ll try my best to explain what needs explaining. What you will need, though, is a toolchain. If you’re on Ubuntu, hopefully this article will help. If you’re on another system, Google for “[your os] build essentials”, e.g. “arch linux build essentials”.
The Birth of a Program
You have an idea for a program. It’s the best program idea you’ve ever had so you quickly prototype something in C:
```c
#include <stdio.h>

int main() {
    printf("Hello, world!\n");
    return 0;
}
```
A work of genius. You quickly compile and run it to make sure all is good:
```
$ gcc hello.c -o hello
$ ./hello
Hello, world!
```
But wait… What has happened? How has it gone from being quite an understandable high level program into being something that your processor can understand and run? Let’s go through what’s happening step by step.
GCC is doing a tonne of things behind the scenes in the gcc hello.c -o hello command. It is compiling your C code into assembly, optimising lots in the process, then it is creating “object files” out of your assembly (usually in a format called ELF on Linux platforms), then it is linking those object files together into an executable file (again, executable ELF format). At this point we have the hello executable and it is in a well-known format with lots of cross-machine considerations baked in.
After we run the executable, the “loader” comes into play. The loader figures out where in memory to put your code, it figures out whether it needs to mess about with any of the pointers in the file, it figures out if the file needs any dynamic libraries linked to it at runtime and all sorts of mental shit like that. Don’t worry if none of this makes sense, we’re going to go into it in good time.
Compiling from C to assembly
This is a difficult bit of the process and it’s why compilers used to cost you an arm and a leg before Stallman came along with the Gnu Compiler Collection (GCC). Commercial compilers do still exist but the free world has standardised on GCC or LLVM, it seems. I won’t go into a discussion as to which is better because I honestly don’t know enough to comment :)
If you want to see the assembly output of the hello.c program, you can run the following command:

```
$ gcc -S hello.c
```

This command will create a file called hello.s, which contains assembly code. If you’ve never worked with assembly code before, this step is going to be a bit of an eye opener. The file generated will be long, difficult to read and probably different to mine depending on your platform.
Now is not the time or place to teach assembly. If you want to learn, this book is a brilliant place to start. I will, however, point out a little bit of weirdness in the file. Do you see stuff like this?
```
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	...
	.cfi_endproc
```
I was initially curious as to what this was as well, so I checked out stack overflow and came across a really great explanation of what this bit means, which you can read here.
Also, notice the following:
The assembly program is calling puts instead of printf. This is an example of the kind of optimisation GCC will do for you, even on the default level of “no optimisation” (the -O0 flag on the command line). printf is a heavy function, due to having to deal with a large range of format codes. puts itself is very small and it delegates to __sfvwrite, the code of which is far less heavy. I could only find the NetBSD version of it.

If you want more information on how GCC will optimise your code, this is a great read.
Also, if assembler is a bit new to you, one thing to note is that this post is using GAS (Gnu Assembler) syntax. There are different assemblers out there; a lot of people like the Netwide Assembler (NASM), which has a more human friendly syntax.
GAS suffixes its commands with a letter that describes what “word size” we’re dealing with. Above, you’ll see we used instructions like movq. The q stands for “quad”, which is a 64bit value. Here are other suffixes you may run in to:
- b = byte (8 bit)
- s = short (16 bit integer) or single (32-bit floating point)
- w = word (16 bit)
- l = long (32 bit integer or 64-bit floating point)
- q = quad (64 bit)
- t = ten bytes (80-bit floating point)
Assembling into machine code
By comparison, turning assembly instructions into machine code is pretty simple; compiling is a much more difficult step than assembling. Assembly instructions often map 1-to-1 onto machine code.
At the end of the assembling stage, you would expect to have a file that just contains binary instructions, right? Sadly that’s not quite the case. The processor needs to know a lot more about your code than just the instructions. To facilitate passing this required meta-information, there are a variety of binary file formats. A very common one on *nix systems is ELF: the Executable and Linkable Format.
Your program will be broken up into lots of sections. For example, a section called .text contains your program code. A section called .bss contains statically initialised variables (globals, essentially) that are not given a starting value, thus get zeroed. A section called .strtab contains a list of all of the strings you plan on using in your program. If you statically initialise a string anywhere, it’ll go into the .strtab section. In our hello.c example, the string "Hello, world!\n" will go into the .strtab section.
This article, from issue 13 of Linux Journal in 1995, gives a really good overview of the ELF format from one of the people who created it. It’s quite in depth and I didn’t understand everything he said (still not sure on relocations), but it’s very interesting to see the motivations behind the format.
Linking into an executable
Coming back from the previous tangent, let’s think about linking. When you compile multiple files, the .c files get compiled into .o files. When I first started doing C code, one thing that continuously baffled me was how a .c file referenced a function in another .c file. You only reference header files in a .c file, so how did it know what code to run?
The way it works is by creating a symbol table. There are a multitude of types of symbols in an executable file, but the general gist is that a symbol is a named reference to something. The nm utility allows you to inspect an executable file’s symbol table. Here’s some example output:
```
$ nm hello
0000000100001048 B _NXArgc
0000000100001050 B _NXArgv
0000000100001060 B ___progname
0000000100000000 A __mh_execute_header
0000000100001058 B _environ
                 U _exit
0000000100000ef0 T _main
                 U _puts
0000000100001000 d _pvars
                 U dyld_stub_binder
0000000100000eb0 T start
```
Look at the symbols labelled with the letter U. We have three of them. The _exit symbol is operating system specific and will be the routine that knows how to return control back to the OS once your program has finished, the _puts symbol is very important for our program and exists in whatever libc we have, and dyld_stub_binder is an entry point for resolving dynamic loads. All of these symbols are “unresolved”, which means if you try and run the program and no suitable match is found for them, your program will fail.
So when you create an object file, the reason you include the header is that everything you use from that header file will become an unresolved symbol. The process of linking multiple object files together does the job of finding the appropriate definition that matches each symbol and linking them together into the final executable.
To demonstrate this, consider the following C file:
```c
#include <stdio.h>

void test();

int main() {
    printf("Hello, world!\n");
    test();
    return 0;
}
```
Compiling this file into an object file and then inspecting the contents will show you the following:
```
$ gcc -c hello.c
$ nm hello.o
0000000000000050 r EH_frame0
000000000000003b r L_.str
0000000000000000 T _main
0000000000000068 R _main.eh
                 U _puts
                 U _test
```
We now have an unresolved symbol called _test! The linker will expect to find that somewhere else and, if it does not, will throw a bit of a hissy fit. Trying to link this file on its own complains about 2 unresolved symbols, _test and _puts. Linking it against libc complains about one unresolved symbol, _test.

Unfortunately, because we don’t actually have a definition for test() we can’t use it. This may sound confusing, seeing as we defer the linking of puts() until runtime. Why can’t we just do the same with test()? Build an executable file and let the loader/linker try and figure it out at runtime?
In the linking process you need to specify where the linker will be able to find things on the target system. Let’s step through the original hello.c example, doing each of the compilation steps ourself:

```
$ gcc -c hello.c
```

This gives us hello.o with an unresolved _puts symbol. Let’s try to link it:

```
$ ld hello.o
```

This craps out. We need to give it more information. At this point I’m going to mention that I’m on a Mac system and am about to reference libraries that have different names on a Linux system. As a general rule here, you can replace the .dylib extension with .so:

```
$ ld hello.o /usr/lib/libc.dylib
```

This still craps out. Check out this error message:

```
ld: entry point (start) undefined. Usually in crt1.o for inferred architecture x86_64
```
What the hell? This is a really good error to come across and learn about, though. It leads us nicely into the next section.
Running the program
Wait, didn’t we finish the last section with an object file that wouldn’t link for some arcane reason? Yes, we did. But getting to a point where we can successfully link it requires us to know a little bit more about how our program starts running when it’s loaded into memory.
Before every program starts, the operating system needs to set things up for it. Things such as a stack, a heap, a set of page tables for accessing virtual memory and so on. We need to “bootstrap” our process and set up a good environment for it to run in. This setup is usually done in a file called crt0.o.
When you started learning programming and you used a language that got compiled, one of the first things you learned was that your program’s entry point is main(), right? The true story is that your program doesn’t start in main, it starts in a routine called start. This detail is abstracted away from you by the OS and the toolchain, though, in the form of the crt0.o file. The osdev wiki shows a great example of a simple crt0.o file that I’ll copy here:
```
.text

.globl _start
_start:
	xorq %rbp, %rbp		# mark the end of the stack frame chain
	movq (%rsp), %rdi	# argc was left at the top of the stack by the loader
	leaq 8(%rsp), %rsi	# argv starts just above it
	andq $-16, %rsp		# the ABI requires 16-byte stack alignment at calls

	call main		# off we go!

	movl %eax, %edi		# main's return value becomes the exit status
	call exit		# hand control back to the OS
```
07/08/2013 UPDATE: In a previous version of this post I got this bit totally wrong, confusing the 32bit x86 calling convention with the x86-64 calling convention. Thanks to Craig in the comments for pointing it out :) The below should now be correct.
The line that’s probably most interesting there is where main is called. This is the entry point into your code. Before it happens, there is a lot of setup. Also notice that argc and argv handling is done in this file, but it assumes that the loader has pushed the values onto the stack beforehand.

Why, you might ask, do argc and argv live in %rdi and %rsi before being passed to your main function? Why are those registers so special?
The reason is something called a “calling convention”. This convention details how arguments should be passed to a function call before it happens. The calling convention in x86-64 C is a little bit tricky but the explanation (taken from here) is as follows:
Once arguments are classified, the registers get assigned (in left-to-right order) for passing as follows:
1. If the class is MEMORY, pass the argument on the stack.
2. If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used
For example, take this C code:
```c
#include <stdio.h>

int add(int a, int b) {
    return a + b;
}

int main() {
    return add(1, 2);
}
```
The assembly that would call that function goes something like this:

```
	movl	$2, %esi
	movl	$1, %edi
	callq	_add
```

The $1 and $2 there are the literal, decimal values being passed to the function. Easy peasy :) The convention isn’t something that needs to be followed in your own assembly code. You’re free to put arguments wherever you want, but if you want to interact with existing library functions then you need to do as the Romans do.
With all of this said and done, how do we correctly link and run our hello.o file? Like so:

```
$ ld hello.o /usr/lib/libc.dylib /usr/lib/crt1.o -o hello
$ ./hello
Hello, world!
```
Hey! I thought you said it was crt0.o? It can be… crt1.o is a file with exactly the same purpose, but it has more in it. crt0.o didn’t exist on my system, but crt1.o did. I guess it’s an OS decision. Here’s a short mailing list post that talks about it.
Interestingly, inspecting the symbol table of the executable we just linked together shows this:
```
$ nm hello
0000000000002058 B _NXArgc
0000000000002060 B _NXArgv
                 U ___keymgr_dwarf2_register_sections
0000000000002070 B ___progname
                 U __cthread_init_routine
0000000000001eb0 T __dyld_func_lookup
0000000000001000 A __mh_execute_header
0000000000001d9a T __start
                 U _atexit
0000000000002068 B _environ
                 U _errno
                 U _exit
                 U _mach_init_routine
0000000000001d40 T _main
                 U _puts
                 U dyld_stub_binder
0000000000001e9c T dyld_stub_binding_helper
0000000000001d78 T start
```
Notice that _puts is still unresolved. The reason is that .dylib files (they have the same job as .so files on Linux, but on Mac they have the .dylib extension and probably a different internal format) are dynamic or “shared” libraries. They will tell the linker that they are to be linked dynamically, at runtime, rather than statically, at compile time. The crt*.o files are normal objects, and link statically, which is why the start symbol has an address in the above symbol table.
The Death of a Running Program
You return a number from main() and then your program is done, right? Not quite. There is still a lot of work to be done. For starters, your exit code needs to be propagated up to any parent processes that may be anticipating your death. The exit code tells them something about how your program finished. Exactly what it tells them is entirely up to you, but the standard is that 0 means everything was okay, anything non-zero (up to a max of 255) signifies that an error occurred.
There is also a lot of OS cleanup that happens when your program dies. Things like tidying up file descriptors and deallocating any heap memory you may have forgotten to free() before you returned. You should totally get into the habit of cleaning up yourself, though!
So that’s about the extent of my knowledge on how your code gets turned into a running program. I know I missed some bits out, oversimplified some things and I was probably wrong in places. If you can correct me on any point, or have anything illuminating about how non-x86 or non-ELF systems do the above tasks, I would love to have a discussion about it in the comments :)