Different kinds of files

Compiling C programs involves four kinds of files:

  1. Regular source code files
    These files contain function definitions, and have names which end in “.c” by convention.
  2. Header files
    These files contain function declarations and various preprocessor statements. They are used to allow source code files to access externally-defined functions. Header files end in “.h” by convention.
  3. Object files
    These files are produced as the output of the compiler. They consist of function definitions in binary form, but they are not executable by themselves. Object files end in “.o” by convention (“.obj” on Windows).
  4. Binary executables
    These are produced as the output of a program called a “linker”. The linker links together a number of object files to produce a binary file which can be directly executed. Binary executables have no special suffix on Unix operating systems, although they generally end in “.exe” on Windows.
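
As a rough sketch of how the first two kinds of file divide the work, the listing below shows a declaration that would normally live in a header file next to the definition that would live in a source code file. The names mathutil.h, mathutil.c, and square are hypothetical, and the two parts are shown in a single listing only to keep the example short.

```c
/* --- would normally live in a header file, e.g. mathutil.h (hypothetical) --- */

/* Declaration: gives the compiler the name, parameter types, and return type,
   so that other source code files can call the function. */
int square(int x);

/* --- would normally live in a source code file, e.g. mathutil.c (hypothetical) --- */

/* Definition: the actual body, which the compiler turns into object code. */
int square(int x)
{
    return x * x;
}
```

Any source code file that includes the header can then call square(3) without containing the definition itself.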

Inside the compilation process

There are four phases for a C program to become an executable:

  1. Pre-processing
  2. Compilation
  3. Assembly
  4. Linking

Pre-processing

Before the C compiler starts compiling a source code file, the file is processed by a preprocessor. This is in reality a separate program, but it is invoked automatically by the compiler before compilation proper begins. What the preprocessor does is convert the source code file you write into another source code file. Preprocessor commands start with the pound sign (“#”). There are several preprocessor commands; two of the most important are:

    • #define: This is mainly used to define constants. Defining a constant once avoids having to write out the same value explicitly in many different places in a source code file.
    • #include: This is used to access declarations of functions that are defined outside of a source code file. For instance:
      #include <stdio.h>

      causes the preprocessor to paste the contents of the header file stdio.h into the source code file at the location of the #include statement before it gets compiled. #include is used to include header files, which mainly contain function declarations and #define statements. In this case, stdio.h is included in order to use functions such as printf and scanf.

The preprocessor also removes comments.
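
The two commands can be seen working together in a minimal sketch like the one below. The names MAX_ITEMS, sum_items, and print_total are made up for illustration and do not come from any library.

```c
#include <stdio.h>   /* pastes in the declaration of printf */

#define MAX_ITEMS 4  /* every later MAX_ITEMS is replaced by 4 */

/* Adds up the MAX_ITEMS values stored in items. */
int sum_items(const int items[MAX_ITEMS])
{
    int total = 0;
    /* After pre-processing, the loop condition below reads: i < 4 */
    for (int i = 0; i < MAX_ITEMS; i++) {
        total += items[i];
    }
    return total;
}

/* Prints the item count and the total; printf is usable here only because
   stdio.h was included above. */
void print_total(const int items[MAX_ITEMS])
{
    printf("total of %d items: %d\n", MAX_ITEMS, sum_items(items));
}
```

With GCC, running gcc -E on a file containing this code prints the preprocessed source, which is one way to check that the constant was substituted and the comments were stripped.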

 

Compilation

The second stage of compilation is, confusingly enough, called compilation. In this stage, the preprocessed code is translated to assembly instructions specific to the target processor architecture. These form an intermediate, human-readable language. The existence of this step allows for C code to contain inline assembly instructions and for different assemblers to be used. To save the result of the compilation stage, pass the -S option to gcc:

% gcc -S hello.c

This will create a file named hello.s, containing the generated assembly instructions.

 

Assembly

During this stage, an assembler is used to translate the assembly instructions to object code. The output consists of actual instructions to be run by the target processor.

% gcc -c hello.c

This tells the compiler to run the preprocessor on the file hello.c and then compile and assemble it into the object code file hello.o. The -c option means to compile the source code file into an object file but not to invoke the linker. The contents of this file are in a binary format.

 

Linking

The job of the linker is to link together a bunch of object files (.o files) into a binary executable. This includes both the object files that the compiler created from your source code files as well as object files that have been pre-compiled for you and collected into library files. These files have names which end in .a or .so, and you normally don’t need to know about them, as the linker knows where most of them are located and will link them in automatically as needed.

Like the preprocessor, the linker is a separate program; it is called ld. Also like the preprocessor, the linker is invoked automatically for you when you use the compiler. The normal way of using the linker is as follows:

% gcc hello1.o hello2.o hello3.o -o hello

This line tells the compiler to link together three object files (hello1.o, hello2.o, and hello3.o) into a binary executable file named hello. Now you have a file called hello that you can run.
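
To make the linker's role concrete, here is a small sketch of what two of those source code files might contain. The file contents and the function names add and add_three are hypothetical, since the text above names only the object files, and both parts are shown in one listing for brevity.

```c
/* --- hello1.c (hypothetical contents): calls a function it does not define --- */

/* Declaration only. When hello1.c is compiled on its own, hello1.o records
   that it needs a function called add, and the linker later supplies it. */
int add(int a, int b);

int add_three(int a, int b, int c)
{
    return add(add(a, b), c);
}

/* --- hello2.c (hypothetical contents): provides the definition --- */

/* The linker matches the calls in hello1.o to this definition in hello2.o. */
int add(int a, int b)
{
    return a + b;
}
```

If the two parts really were separate files, each could be compiled with the -c option by itself, and the call to add would be resolved only when the object files are linked together.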