A short introduction to the compilation workflow

The runtime environment

From an abstract point of view, a program is nothing but a bunch of program code and program data stored in memory. On decent OSes, the memory as seen by a program is a flat address space. On a typical 32-bit processor, this address space starts at address zero and ends at the 32-bit 4GB boundary. Theoretically, the program should be able to access any memory location within this address space. In practice, this is not possible except in very special configurations.

On a normal x86 Linux system, the program only has access to a fragmented subset of that flat address space, which is shown on the right (image courtesy of lwn.net). The text fragment contains the user code, the heap fragment contains the user-allocated memory, and the stack fragment contains the user's stack area. The text fragment is usually itself split into multiple fragments, each of which represents one of the shared libraries used by the program. Shared libraries are typically used in modern OSes to avoid keeping multiple copies in memory of library code used simultaneously by multiple programs. The library code is loaded only once in physical RAM and is mapped, sometimes at different addresses, into the virtual address spaces of the different user processes.
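To make the fragments described above a little more concrete, here is a minimal sketch which prints one sample address from each of them. The names answer, initialized_global and print_layout are made up for this illustration, and the exact addresses printed will differ from one system and one run to the next.

```c
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* stored with the program data */

/* A function: its machine code lives in the text fragment. */
int answer(void)
{
    return initialized_global;
}

/* Print one sample address from each fragment of the address space.
 * Returns 0 on success, -1 if the heap allocation failed. */
int print_layout(void)
{
    int on_stack = 0;
    void *on_heap = malloc(16);

    if (on_heap == NULL)
        return -1;

    /* Converting a function pointer to void * is not strict ISO C,
     * but it is accepted on POSIX systems such as Linux. */
    printf("text  (function answer) : %p\n", (void *)answer);
    printf("data  (global variable) : %p\n", (void *)&initialized_global);
    printf("heap  (malloc'ed block) : %p\n", on_heap);
    printf("stack (local variable)  : %p\n", (void *)&on_stack);

    free(on_heap);
    return 0;
}
```

Calling print_layout() from a main function and comparing its output with the content of /proc/self/maps should show which fragment each address falls into.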

The fact that it is possible for these libraries to be loaded at different locations in different programs means that:

  • The code of the library itself must be position-independent: it must not reference absolute addresses and must almost always use relative addressing. This position-independent code is usually called PIC.
  • The code of the program which accesses the library must be written such that it knows how to access the target library whose address location is not known when the code is generated.

How the library or the program is generated such that it correctly handles the PIC case is beyond the scope of this document. For more information on this topic, I recommend you read Linkers and Loaders by John R. Levine. An early version is available online but nothing replaces the real paper thing.

When the user requests the OS to run a program, the OS is responsible for establishing the memory map described above. On most Linux systems, it then leaves the problem of handling the program-library communication to a user program named the loader. On most Linux systems which use the ELF binary format, this loader is provided by the glibc package. On my system, this loader is located in /lib/ld-2.3.3.so. The job of this loader is to load into memory the code of the libraries the program depends upon (only if these libraries have not already been loaded) and to resolve all the references the program makes to these libraries. Of course, this process is recursive since a library can depend on another library and its references to this other library might need to be resolved too.

This whole resolution process is a bit complex on UNIX systems but there are numerous tools you can use to debug it:

  • ldd filename: this program calculates the list of libraries a given binary depends upon. It performs the same process performed by the loader and returns both the list of requested libraries and the list of libraries found. If they were found, it specifies exactly where they were found on the filesystem. For example, on my system, this gives the following output for a random test program named mathieu-test which depends on a library named /tmp/libtestmathieu.so.10.31.
    [mlacage@chronos home]$ ldd mathieu-test
            linux-gate.so.1 =>  (0x00a9c000)
            /tmp/libtestmathieu.so.10.31 => not found
            libc.so.6 => /lib/tls/libc.so.6 (0x008f5000)
            /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x008d8000)
    
  • LD_DEBUG: this environment variable can be set to a number of values to trace the behavior of a program.
  • LD_LIBRARY_PATH: this environment variable is a colon-separated list of path elements where potential libraries can be found. The loader adds these path elements to its list of directories to search for libraries.
  • nm filename: displays the list of symbols a binary file exports and imports. The items whose type is T are defined within the binary file (which means they do not need to be resolved). The items whose type is U are not defined within the binary analysed which means they will need to be resolved.
A lot of other variables are described in the ld.so manpage (since the ld.so program varies slightly from one system to the other, I suggest you run the command man ld.so to get the manual for your local flavor of the command). A lot of things can also be learned from the objdump, readelf and nm manpages.
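As a small illustration of the T and U symbol types reported by nm, consider the following sketch. The file name symbols.c and the function names are hypothetical. After compiling it with gcc -c symbols.c -o symbols.o, running nm symbols.o should list add and print_sum with type T and printf with type U.

```c
#include <stdio.h>

/* Defined in this file: nm is expected to list it with type T. */
int add(int a, int b)
{
    return a + b;
}

/* Also defined here (type T), but it calls printf, which lives in the
 * C library: nm is expected to list printf with type U. */
void print_sum(int a, int b)
{
    printf("%d + %d = %d\n", a, b, add(a, b));
}
```

The reference to printf stays unresolved in the object file; it is the linker and, for a shared C library, the loader which eventually resolve it.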

In most cases, this complicated resolving process can be debugged with little to no knowledge of the exact way things work. The loader first builds a list of directories it needs to look into to find the libraries on which the binary depends. Then, it uses this ordered list of directories to locate the requested libraries. If one of the libraries is not found in this list of paths, loading is aborted and an obscure error is printed on screen.
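The lookup described above can be modeled with a few lines of C. This is only a simplified sketch, not the code of the real loader: the function name find_library is hypothetical and the existence test is done with access() for the sake of brevity.

```c
#include <stdio.h>
#include <unistd.h>

/* Try each directory in order and return the index of the first one
 * that contains `name`, or -1 if none does. On success, the full path
 * is left in `path`. */
int find_library(const char *const dirs[], size_t ndirs, const char *name,
                 char *path, size_t pathlen)
{
    size_t i;

    for (i = 0; i < ndirs; i++) {
        int len = snprintf(path, pathlen, "%s/%s", dirs[i], name);

        if (len < 0 || (size_t)len >= pathlen)
            continue;               /* path too long: skip this entry */
        if (access(path, R_OK) == 0)
            return (int)i;          /* the first match wins */
    }
    return -1;                      /* the real loader would abort here */
}
```

Because the first match wins, the order in which the directories are added to the list matters: a library present in two directories is always taken from the one which comes first.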

How the search list is built

  • First, if during the binary link step the -rpath dir option was specified, these directories are added to the search list.
    The list of paths added with the -rpath option can be obtained with the readelf -d filename command which lists under the RPATH row a colon-separated list of paths. For example, you can see what it looks like on my system with the program named mathieu-test:
    [mlacage@chronos home]$ readelf -d mathieu-test
     
    Dynamic segment at offset 0x45c contains 21 entries:
      Tag        Type                         Name/Value
     0x00000001 (NEEDED)                     Shared library: [libc.so.6]
     0x0000000f (RPATH)                      Library rpath: [sources:/usr/lib/X11/twm/]
     0x0000000c (INIT)                       0x8048270
     0x0000000d (FINI)                       0x8048420
     0x00000004 (HASH)                       0x8048148
     0x00000005 (STRTAB)                     0x80481c0
     0x00000006 (SYMTAB)                     0x8048170
     0x0000000a (STRSZ)                      115 (bytes)
     0x0000000b (SYMENT)                     16 (bytes)
     0x00000015 (DEBUG)                      0x0
     0x00000003 (PLTGOT)                     0x8049530
     0x00000002 (PLTRELSZ)                   8 (bytes)
     0x00000014 (PLTREL)                     REL
     0x00000017 (JMPREL)                     0x8048268
     0x00000011 (REL)                        0x8048260
     0x00000012 (RELSZ)                      8 (bytes)
     0x00000013 (RELENT)                     8 (bytes)
     0x6ffffffe (VERNEED)                    0x8048240
     0x6fffffff (VERNEEDNUM)                 1
     0x6ffffff0 (VERSYM)                     0x8048234
     0x00000000 (NULL)                       0x0
    [mlacage@chronos home]$
    
  • Then, if the LD_LIBRARY_PATH environment variable is set, its list of directories is appended to the search list. If the binary is setuid or setgid, this step is skipped.
    Use the echo $LD_LIBRARY_PATH command to see how this environment variable is set. The ls -l filename command can be used to tell whether filename is setuid or setgid. The following output shows that /usr/bin/consolehelper is a setuid program: note the ..s pattern in the user permission bitfield. You would see ..s in the group permission bitfield if it were setgid.
    -rwsr-xr-x  1 root root 5644 Apr  1  2004 /usr/bin/consolehelper
    
  • The list of files stored in /etc/ld.so.cache is appended to the search list. If the binary file was linked with the -z nodefaultlib switch, this step is skipped.
    Since this cache file is generated from the config file /etc/ld.so.conf by the /sbin/ldconfig command, you can view a deciphered version of its content with the /sbin/ldconfig -p command.
  • The list of default directories is appended to the search list. These default directories are usually /lib and then /usr/lib. If the binary file was linked with the -z nodefaultlib switch, this step is skipped.
    It is possible to check for the nodefaultlib flag with the readelf -d filename command, since this flag is reported in the FLAGS_1 row if it was set. The following output shows an example with the mathieu-test binary:
    [mlacage@chronos home]$ readelf -d mathieu-test
     
    Dynamic segment at offset 0x45c contains 22 entries:
      Tag        Type                         Name/Value
     0x00000001 (NEEDED)                     Shared library: [libc.so.6]
     0x0000000f (RPATH)                      Library rpath: [sources:/usr/lib/X11/twm/]
     0x0000000c (INIT)                       0x8048270
     0x0000000d (FINI)                       0x8048420
     0x00000004 (HASH)                       0x8048148
     0x00000005 (STRTAB)                     0x80481c0
     0x00000006 (SYMTAB)                     0x8048170
     0x0000000a (STRSZ)                      115 (bytes)
     0x0000000b (SYMENT)                     16 (bytes)
     0x00000015 (DEBUG)                      0x0
     0x00000003 (PLTGOT)                     0x8049538
     0x00000002 (PLTRELSZ)                   8 (bytes)
     0x00000014 (PLTREL)                     REL
     0x00000017 (JMPREL)                     0x8048268
     0x00000011 (REL)                        0x8048260
     0x00000012 (RELSZ)                      8 (bytes)
     0x00000013 (RELENT)                     8 (bytes)
     0x6ffffffb (FLAGS_1)                    Flags: NODEFLIB
     0x6ffffffe (VERNEED)                    0x8048240
     0x6fffffff (VERNEEDNUM)                 1
     0x6ffffff0 (VERSYM)                     0x8048234
     0x00000000 (NULL)                       0x0
    [mlacage@chronos home]$
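The four steps above can be summarized by the following sketch. It is a simplified model written for this document, not the code of the glibc loader: all the names are hypothetical, the content of /etc/ld.so.cache is represented by a placeholder entry, and empty path elements are simply skipped.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define MAX_DIRS 32
#define MAX_LEN  256

struct search_list {
    char dirs[MAX_DIRS][MAX_LEN];
    size_t count;
};

/* Return 1 if the file is setuid or setgid, 0 if not, -1 on error. */
int is_setuid_or_setgid(const char *filename)
{
    struct stat st;

    if (stat(filename, &st) != 0)
        return -1;
    return (st.st_mode & (S_ISUID | S_ISGID)) != 0;
}

/* Append every element of a colon-separated list of directories.
 * Simplification: empty and overlong elements are skipped. */
void append_paths(struct search_list *sl, const char *paths)
{
    if (paths == NULL)
        return;
    while (*paths != '\0' && sl->count < MAX_DIRS) {
        size_t len = strcspn(paths, ":");

        if (len > 0 && len < MAX_LEN) {
            memcpy(sl->dirs[sl->count], paths, len);
            sl->dirs[sl->count][len] = '\0';
            sl->count++;
        }
        paths += len;
        if (*paths == ':')
            paths++;
    }
}

/* Build the list in the order described in the text. */
void build_search_list(struct search_list *sl, const char *rpath,
                       const char *ld_library_path,
                       int privileged, int nodeflib)
{
    sl->count = 0;
    append_paths(sl, rpath);                    /* 1. RPATH entries      */
    if (!privileged)
        append_paths(sl, ld_library_path);      /* 2. LD_LIBRARY_PATH    */
    if (!nodeflib) {
        append_paths(sl, "[ld.so.cache]");      /* 3. cache placeholder  */
        append_paths(sl, "/lib:/usr/lib");      /* 4. default directories */
    }
}

/* Print the list, one directory per line, in search order. */
void print_search_list(const struct search_list *sl)
{
    size_t i;

    for (i = 0; i < sl->count; i++)
        printf("%lu: %s\n", (unsigned long)i, sl->dirs[i]);
}
```

With the RPATH of the mathieu-test binary shown above and getenv("LD_LIBRARY_PATH") as the second list, build_search_list gives a rough idea of the order in which the directories would be visited.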
    

The C compilation process

The C compilation process is most often implemented by a set of different cooperating programs:

  • pre-processor,
  • compiler,
  • assembler,
  • linker
gcc, the GNU Compiler Collection, contains a C compiler which follows these four stages. The compiler used by the Visual C++ development environment does the same. The following text describes these steps and gives specific examples based on gcc.

The gcc program which you can invoke from the command-line can be instructed to stop the compilation process after any of the stages described above:

  • The -E switch will instruct gcc to stop after the preprocessing stage. Its output will be a text file which corresponds to the preprocessed source file.
  • The -S switch will request gcc to stop after the compilation stage. Its output will be a text file which contains human-readable assembly code for the input source code.
  • The -c switch instructs gcc to stop the compilation process after the assembler stage which means its output will be a binary object file which contains the code generated from the input source as well as a lot of other information.
  • If no such switch is specified, gcc runs all four stages and, if the link step succeeds, generates an executable binary (named a.out unless the -o switch says otherwise).
At each stage, the -o filename switch can be specified to force gcc to store its output into filename.

The pre-processor

The C pre-processor is mainly responsible for expanding macro definitions and include statements and removing comments. This means that the following program will be transformed into the program shown next:
/* include statement */
#include <foo.h>
/* macro definition */
#define A_Macro(p1,p2) p1 + p2
/* macro use */
int add (int p1, int p2)
{
	return A_Macro (p1,p2);
}

XXX Content of file foo.h



int add (int p1, int p2)
{
	return p1 + p2;
}

The macro replacement process and the include replacement process are, of course, performed recursively until no macro definitions, macro uses, or include statements remain in the source file. To understand the exact behavior of the preprocessor and debug preprocessing bugs (i.e., macro bugs), the developer can request from the C compiler an annotated output of the source file after preprocessing.
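A typical macro bug is easy to reproduce with the A_Macro definition used above. In the following sketch, A_Safe_Macro and the two helper functions are names invented for the illustration. Since the preprocessor performs a purely textual replacement, the unparenthesized macro does not behave like a function call.

```c
/* Same definition as in the example above: no parentheses. */
#define A_Macro(p1, p2) p1 + p2

/* Hypothetical safer variant: parenthesize arguments and result. */
#define A_Safe_Macro(p1, p2) ((p1) + (p2))

/* Expands to 2 * a + b, which is probably not what was meant. */
int unsafe_twice(int a, int b)
{
    return 2 * A_Macro(a, b);
}

/* Expands to 2 * ((a) + (b)). */
int safe_twice(int a, int b)
{
    return 2 * A_Safe_Macro(a, b);
}
```

With a = 1 and b = 2, unsafe_twice returns 4 while safe_twice returns 6. Looking at the preprocessed source makes the difference between the two expansions obvious.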

With gcc, the -E compilation option can be used to instruct the compiler to stop compilation at this point and output the preprocessed file. The format of the preprocessor output is described in the cpp manual. For example, gcc 3.3.2 uses the format described in this webpage.

The macro definition step can usually be controlled by a few options:

  • It is possible to define a macro on the compiler command line as if it was defined at the start of the source file: gcc provides the -D command-line argument for this.
  • It is possible to undefine a macro on the compiler command line as if it was undefined at the start of the source file with the -U switch.
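Here is a minimal sketch of code whose behavior is chosen from the command line. The macro names BUFFER_SIZE and VERBOSE and the file name config.c are assumptions made for the example. Compiled with a command such as gcc -DBUFFER_SIZE=128 -DVERBOSE -c config.c, the code uses a buffer size of 128 and enables the verbose message; compiled without any -D switch, it falls back to 64.

```c
#include <stdio.h>

/* Fallback used when the macro is not defined on the command line. */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 64
#endif

int buffer_size(void)
{
    return BUFFER_SIZE;
}

/* Report the configuration which was selected at preprocessing time. */
void print_config(void)
{
#ifdef VERBOSE
    printf("verbose build\n");
#endif
    printf("buffer size: %d\n", buffer_size());
}
```

Adding -UVERBOSE after -DVERBOSE on the same command line cancels the definition, since gcc processes the -D and -U switches in the order they are given.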

The preprocessing step can be controlled and influenced by many more parameters. By default, if no parameters are specified, the compiler searches a number of default directories for the headers requested by the user. The exact rules used by gcc are described in the cpp manual. gcc allows the user to add directories to the include search path with the -I switch.

The compiler

If you do not specify the -E option to request gcc to stop compilation after the preprocessing stage, the compiler will proceed with the compilation itself, which generates assembly output. To get an idea of what this output looks like, you can use the -S option to look at the assembly output before the assembler runs. The following C code run through the command gcc -S test.c -o test.S generates the following test.S file.

int test (int v)
{
        return 100 * v;
}
        .file   "test.c"
        .text
.globl test
        .type   test, @function
test:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %edx
        movl    %edx, %eax
        sall    $2, %eax
        addl    %edx, %eax
        leal    0(,%eax,4), %edx
        addl    %edx, %eax
        sall    $2, %eax
        leave
        ret
        .size   test, .-test
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)"
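The assembly above does not contain any multiplication instruction: the compiler replaced the multiplication by 100 with a sequence of shifts and additions. The following sketch replays that sequence in C, one statement per instruction. The function name mul100_by_shifts is made up, and unsigned arithmetic is used so that the shifts are well defined for every input.

```c
/* Replay of the instruction sequence generated for 100 * v. */
unsigned int mul100_by_shifts(unsigned int v)
{
    unsigned int edx = v;     /* movl 8(%ebp), %edx                  */
    unsigned int eax = edx;   /* movl %edx, %eax        eax = v      */

    eax <<= 2;                /* sall $2, %eax          eax = 4v     */
    eax += edx;               /* addl %edx, %eax        eax = 5v     */
    edx = eax * 4;            /* leal 0(,%eax,4), %edx  edx = 20v    */
    eax += edx;               /* addl %edx, %eax        eax = 25v    */
    eax <<= 2;                /* sall $2, %eax          eax = 100v   */
    return eax;
}
```

In other words, 100 * v is computed as 4 * (5v + 4 * 5v).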

warnings

gcc offers a great many options to enable and disable large classes of warnings. Certain warnings depend on the code optimization level, because only optimization passes enabled at certain optimization levels perform the code analysis required to generate the warning correctly. The full list of warning options is, of course, described in the gcc manual. Generally, each warning has two flags: for example, the -Winline and the -Wno-inline flavors. The -Wno- flavor disables the warning while the -W flavor enables it.

  • -Wall is the most widely-used option. It enables numerous rather safe warnings (i.e., warnings which generate as few false positives as possible).
  • -Werror is the second most widely-used warning switch: it turns each warning into a compiler error which halts compilation. This makes it impossible to compile a lot of files without noticing a single warning lost somewhere in the output. This option is great to make sure that a given piece of code stays warning-free.
I personally use the following list of warning options: -Wall -Wfloat-equal -Wundef -Wendif-labels -Wshadow -Wpointer-arith -Wconversion -Wsign-compare -Wredundant-decls. For C-only code, I also use -Wdeclaration-after-statement -Wstrict-prototypes -Wold-style-definition. Of course, all my code is built with the -Werror option to make sure the code stays warning-clean.
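To give an idea of what some of these options catch, the following sketch contains three functions which compile without error but should each trigger one of the warnings listed above when the corresponding option is enabled. The function names are invented for the example, and the exact wording of the diagnostics depends on the gcc version.

```c
/* -Wsign-compare: `i` is signed while `n` is unsigned. */
int count_matches(const int *values, unsigned int n, int wanted)
{
    int count = 0;
    int i;

    for (i = 0; i < n; i++)
        if (values[i] == wanted)
            count++;
    return count;
}

/* -Wfloat-equal: exact comparison of floating-point values. */
int is_zero(double x)
{
    return x == 0.0;
}

/* -Wshadow: the inner `total` hides the outer one, so the outer
 * variable is never updated and the function always returns 0. */
int sum_first(const int *values, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++) {
        int total = values[i];   /* probably meant: total += values[i] */
        (void)total;
    }
    return total;
}
```

The last function shows why such warnings are worth reading: the shadowed variable is a real bug which the compiler would otherwise accept silently.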

debugging options

To debug the generated code, you need to request gcc to generate debugging information and store it in the resulting binary. Typically, this is done with the -g option. However, gcc supports a number of different debugging information formats and experience has shown that the default format used by the -g switch is not so great. As such, I strongly advise you to use the dwarf2 debugging information format, which is supported by all decent debuggers: specify the -gdwarf-2 switch rather than the -g switch.

gcc code optimization

The optimization level is the easiest code-generation option available: gcc offers -O0, -O1, -O2 and -O3, which apply an increasing number of speed-related optimization passes to the code (-O0 disables most of them). The higher optimization levels often make compilation slower but are supposed to generate faster code. It has often been reported that -O3 generates buggy code, so it should be used with care. If you want to debug your program, I strongly advise against using anything but the -O0 option, which ensures that the generated code will behave nicely in a debugger.

The -Os option can be used to target code size optimization. Surprisingly, it also often improves code speed (through better cache locality).

All these optimization options can be used with the -march=, -mtune= and -mcpu= options, which control what type of assembly instructions the compiler will use and what exact CPU the compiler will optimize the instruction scheduling for. The exact semantics of these options depend on the target architecture, which means that you should read the gcc manual prior to using them.

The assembler

The job of the assembler is to convert the assembly output shown in the previous section into a binary object file. Of course, it does not make much sense to look at the binary output with your own eyes but you can use deciphering tools such as readelf or objdump. readelf is usable on any recent Linux system I know of and provides great information on the generated object files. It is especially useful to debug linker-related problems. objdump, on the other hand, provides less detailed information but works on a wider variety of systems and can be used to disassemble a binary object file. The following listings show the result of running the commands below: the C source, the generated assembly and the disassembly of the object file.

gcc -S test.c -o test.S
gcc -c test.S -o test.o
objdump -d test.o

int test (int v)
{
        return 100 * v;
}
        .file   "test.c"
        .text
.globl test
        .type   test, @function
test:
        pushl   %ebp
        movl    %esp, %ebp
        movl    8(%ebp), %edx
        movl    %edx, %eax
        sall    $2, %eax
        addl    %edx, %eax
        leal    0(,%eax,4), %edx
        addl    %edx, %eax
        sall    $2, %eax
        leave
        ret
        .size   test, .-test
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)"
test.o:     file format elf32-i386
 
Disassembly of section .text:
 
00000000 <test>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 55 08                mov    0x8(%ebp),%edx
   6:   89 d0                   mov    %edx,%eax
   8:   c1 e0 02                shl    $0x2,%eax
   b:   01 d0                   add    %edx,%eax
   d:   8d 14 85 00 00 00 00    lea    0x0(,%eax,4),%edx
  14:   01 d0                   add    %edx,%eax
  16:   c1 e0 02                shl    $0x2,%eax
  19:   c9                      leave
  1a:   c3                      ret
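To give a feel for what tools such as readelf look at first, here is a minimal sketch which checks the first bytes of a file for the ELF magic number and reports whether the file is a 32-bit or a 64-bit ELF file. The function names are hypothetical; the byte values come from the ELF format, where the file starts with 0x7f 'E' 'L' 'F' and the fifth byte gives the class.

```c
#include <stdio.h>
#include <string.h>

/* Inspect the first bytes of a file image.
 * Returns 32 or 64 for an ELF file of that class, 0 if the buffer
 * does not look like an ELF file. */
int elf_class(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < 5 || memcmp(buf, magic, sizeof magic) != 0)
        return 0;
    switch (buf[4]) {           /* EI_CLASS byte */
    case 1:  return 32;         /* ELFCLASS32 */
    case 2:  return 64;         /* ELFCLASS64 */
    default: return 0;
    }
}

/* Read the start of `filename` and report its ELF class (0 on error). */
int elf_class_of_file(const char *filename)
{
    unsigned char buf[16];
    size_t n;
    FILE *f = fopen(filename, "rb");

    if (f == NULL)
        return 0;
    n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return elf_class(buf, n);
}
```

For the test.o file shown above, which objdump reports as elf32-i386, elf_class_of_file would be expected to return 32.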

The linker

If you do not disable the linking step with the -c switch, gcc runs its assembled output through the link process. gcc itself provides a number of switches to control the behavior of the linker, most of which are simple wrappers for switches provided by the linker itself. Although it is possible to pass switches directly to the linker through gcc with the -Wl,option syntax, it is almost never necessary (i.e., it has not happened to me in a long, long time), which is why I present here the gcc wrappers rather than the linker switches themselves.

On most normal Linux systems, users are interested in generating only three kinds of binaries from a set of object files:

  • an executable binary,
  • a static library (by convention, the suffix for static library files is .a), and
  • a shared library (by convention, the suffix for shared library files is .so).

Linking an executable

If you have a set of object files, one of which defines a main function compatible with the canonical definition int main (int, char *[]), you can safely run gcc file1.o file2.o file3.o -o binary-name which will generate an executable binary from a set of object files. If your executable depends on either a static library or a shared library, you need to specify the dependency as well as the location of the library:

  • -lNAME is used to add a dependency upon a library: this adds a dependency upon the file libNAME.so if it is found or libNAME.a as a backup.
  • -LDIRECTORY is used to add a directory to the list of directories which are searched to find the libraries you depend upon.
For example, gcc file1.o file2.o -L/foo/infinite -lfoo -o the-foo-tester generates a binary named the-foo-tester which is built from file1.o and file2.o and which links against the library foo located in the directory /foo/infinite. The filename of the library foo is expected to be libfoo.so or, if it is not found, libfoo.a. If neither of these files is found, the link step will fail. Note that -lfoo is placed after the object files: the linker processes its arguments from left to right, so a library listed before the object files which use it might not be used to resolve their symbols.

Linking a static library

Linking a static library is pretty trivial and does not involve the compiler. In fact, it is not really a link step: it just involves the creation of an archive which contains the .o files you want to put into the library. If you have a list of .o files, the command to create a static library (in fact, an object file archive) is: ar rcs libtest.a foo1.o foo2.o foo3.o foo4.o. The generated .a file can then be used during the final link of an executable as shown above or during the link of a shared library to resolve any missing symbols. It is not possible to:

  • transform a static library into a shared library (unless you extract all the object files present in the static library and link them all into a shared library),
  • integrate a static library into another static library (it works but it does not do what you would expect so don't even try to do it)

Linking a shared library

Linking a shared library is also relatively simple:

  • make sure that all the code you want to put into the library has been compiled with the -fPIC compiler switch,
  • run gcc with the -shared switch: gcc -shared test1.o test2.o -o libfoobar.so.
Of course, such a shared library could depend on other libraries which means that you could use the -l and -L switches to include code from other static or shared libraries.
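As an illustration of the two steps above, here is what the source of a very small shared library could look like. The file name foobar.c and the function names are hypothetical; the commands in the leading comment only use the switches described in this section.

```c
/* foobar.c: hypothetical source file for libfoobar.so.
 *
 * Build steps:
 *   gcc -fPIC -c foobar.c -o foobar.o
 *   gcc -shared foobar.o -o libfoobar.so
 */

/* Private to this file: static symbols are not exported. */
static int call_count = 0;

/* Exported function: nm libfoobar.so is expected to list it with type T. */
int foobar_add(int a, int b)
{
    call_count++;
    return a + b;
}

/* Exported function which reports how many times foobar_add was called. */
int foobar_call_count(void)
{
    return call_count;
}
```

A program which calls foobar_add would then be linked with something like gcc main.o -L. -lfoobar -o main, and the loader would need to find libfoobar.so at run time through one of the search paths described in the first part of this document.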

A few notes about Visual C++

Visual C++ provides a Graphical User Interface to the underlying compilation system but it is possible to access this underlying compilation environment from the DOS command-line. Sometimes, it is necessary.

The Visual C++ C/C++ preprocessor is very close feature-wise to the gcc preprocessor. Its compilation options are very similar:

  • /E instructs the compiler to stop the compilation process just after the preprocessing stage.
  • macros can be defined and undefined on the command-line with the /D and the /U options.
  • the header search rules are described in the manual and can be mostly controlled with the /I switch.