It’s been almost four years since I worked with C. To revisit the language and brush up on my basics, I started with C compilation. This concept has always fascinated me, because it converts human-readable instructions into instructions a machine can understand.

In C we do this via compilation. Four steps are involved in converting human-written instructions into binary (the language used by machines).

To understand the steps involved in converting source code to binary format, we can use the clang compiler’s options for each stage and look at their output.

I wrote a simple program (hello.c) for the purpose of demonstration.

#include <stdio.h>

int main(void) {
    printf("Hi there\n");
}

Every C program goes through the following four phases during compilation:

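1. Preprocessing: This is the phase where all the comments are removed, macros are expanded, and #include directives are replaced by the contents of the named header. The output is still in human-readable format. The command below prints the preprocessed source to the terminal.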

% clang -E hello.c

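2. Compilation: This is the phase where the preprocessed code is converted into assembly instructions. These instructions depend on the target processor architecture, so they may vary from machine to machine. The command below writes the assembly to hello.s.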

% clang -S hello.c

3. Assembly: In this phase the assembly instructions are converted to binary format (machine code). However, the output is not yet ready for execution, because the binary code for the library functions the program uses has not been added yet. The command below writes this output, called an object file, to hello.o.

% clang -c hello.c

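4. Linking: This is the phase where we get a complete binary output that includes the code from the libraries our program uses, for example the C standard library, which provides the printf function declared in <stdio.h>. The command below runs all four phases and writes the executable to a file named hello.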

% clang -o hello hello.c

If we run clang hello.c without any options, the final step produces a file called a.out by default. We can change the output filename with the -o option (clang -o hello hello.c). This creates a binary called hello rather than the cryptic a.out (which is short for assembler output).
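To make the phases above a little more concrete, here is a small sketch of a file that gives each phase something visible to do. The file name stages.c and the names GREETING, TWICE, answer and say_hi are made up for illustration and are not part of hello.c. The hand-written declaration of puts stands in for what <stdio.h> would normally supply.

```c
/* stages.c: a hypothetical file used only for illustration. */

/* Written by hand instead of #include <stdio.h>. This is the kind of
   declaration the header would normally supply. */
int puts(const char *s);

#define GREETING "Hi there"     /* macro: replaced during preprocessing */
#define TWICE(x) ((x) + (x))    /* function-like macro, expanded the same way */

/* After preprocessing, the body reads: return ((21) + (21)); */
int answer(void) {
    return TWICE(21);
}

/* This call compiles and assembles with only the declaration above.
   The code for puts itself is added from the C library at link time. */
int say_hi(void) {
    return puts(GREETING);
}
```

Running clang -E on a file like this should show the comments gone and both macros expanded, and clang -c should still succeed even though the file contains no code for puts. Linking it into an executable would additionally need a main function that calls say_hi.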

Now comes the fun part. If you have ever opened a binary file or an executable and seen garbage or scrambled characters rather than 0s and 1s, you are not alone.

The reason behind those cryptic characters is that the programs we use to view files (cat, notepad, more, etc.) read the bytes made up of 0s and 1s and interpret them as text in whatever character encoding they support by default. Many bytes in an executable do not correspond to readable characters, so we see garbage instead of the actual 0s and 1s.
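A rough sketch of this idea: the function below counts how many bytes in a buffer are printable ASCII characters. The names sample_bytes and count_printable are hypothetical, and the sample data is an assumption chosen for illustration. Its first four bytes are the ELF magic number that starts Linux executables; a macOS binary begins differently.

```c
#include <ctype.h>
#include <stddef.h>

/* Sample bytes, assumed for illustration. The first four are the ELF
   magic number; the rest are typical small header values. */
static const unsigned char sample_bytes[8] = {
    0x7f, 'E', 'L', 'F', 0x02, 0x01, 0x01, 0x00
};

/* Count how many of the n bytes are printable ASCII characters. */
size_t count_printable(const unsigned char *bytes, size_t n) {
    size_t printable = 0;
    for (size_t i = 0; i < n; i++) {
        if (isprint(bytes[i])) {
            printable++;
        }
    }
    return printable;
}
```

For these eight bytes only the letters E, L and F count as printable. The other five have no readable character, which is why a text viewer shows odd symbols in their place.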

To see the actual binary code, we can use a tool like xxd, which is available on most Linux distributions and on macOS.

Usage: % xxd -b <binary file name> (for example, % xxd -b a.out)
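If you are curious how such a bit dump could be produced, here is a minimal sketch in C. It is not how xxd is implemented. The names byte_to_bits and dump_bits are hypothetical, and the output has none of the offsets or text column that xxd prints.

```c
#include <stdio.h>

/* Write the 8 bits of byte into out, most significant bit first,
   followed by a terminating '\0'. out must have room for 9 chars. */
void byte_to_bits(unsigned char byte, char out[9]) {
    for (int i = 0; i < 8; i++) {
        out[i] = (byte & (0x80u >> i)) ? '1' : '0';
    }
    out[8] = '\0';
}

/* Print every byte of the file at path as bits, six bytes per line.
   Returns 0 on success, or -1 if the file cannot be opened. */
int dump_bits(const char *path) {
    FILE *f = fopen(path, "rb");   /* "rb": read raw bytes, not text */
    if (f == NULL) {
        return -1;
    }
    char bits[9];
    int c;
    int col = 0;
    while ((c = fgetc(f)) != EOF) {
        byte_to_bits((unsigned char)c, bits);
        fputs(bits, stdout);
        putchar(++col % 6 == 0 ? '\n' : ' ');
    }
    if (col % 6 != 0) {
        putchar('\n');             /* finish a partial last line */
    }
    fclose(f);
    return 0;
}
```

Calling dump_bits("a.out") from a main function would print the file as groups of eight 0s and 1s. For example, the byte for the letter H (0x48) comes out as 01001000.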

Hope this post helps!!!

If you have any questions/additions, please feel free to leave a comment below.