For some time now, I have been very interested in how programming languages are compiled and converted to assembly. Naturally, I became interested in LLVM's portable assembly, the LLVM intermediate representation (IR), and in how its syntax compares to x86 assembly. It turns out to be very readable and understandable, unlike the bare-bones assembly you get when disassembling a program. I found that x86 assembly is actually a lot easier to analyze when it has been generated with LLVM IR as an intermediate step, since the LLVM block labels are emitted as comments in the assembly and the overall structure looks very similar. Let us look at an example of a simple hello world program.

    @0 = internal constant [20 x i8] c"Hello LLVM-C world!\00"

    declare i32 @puts(i8*)

    define void @sayHelloWorld() {
    aName:
      %0 = call i32 @puts(i8* getelementptr inbounds ([20 x i8]* @0, i32 0, i32 0))
      ret void
    }
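For readers more at home in C, the IR above corresponds roughly to the following. This is only a sketch for comparison, not the code that generated the IR: the array name `hello` is my own placeholder (the IR global is simply `@0`), and I am assuming that `static` is a fair stand-in for `internal` linkage.

```c
#include <stdio.h>

/* Stands in for `@0 = internal constant [20 x i8]`:
 * 19 visible characters plus the terminating NUL byte.
 * The name `hello` is hypothetical; the IR global is unnamed. */
static const char hello[20] = "Hello LLVM-C world!";

/* Stands in for `define void @sayHelloWorld()`. */
void sayHelloWorld(void) {
    /* &hello[0] plays the role of
     * `getelementptr inbounds (..., i32 0, i32 0)`:
     * a pointer to the first byte of the array.
     * The int returned by puts is ignored, like %0 in the IR. */
    puts(&hello[0]);
}
```

Calling `sayHelloWorld()` from a `main` function prints the string followed by a newline, which is what `puts` does with the pointer it receives.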

Looking through the example above, the global @0 holds a string constant. An external function, whose body is defined somewhere else, needs to be declared with the declare keyword. A new function, however, needs to be defined along with its return type, parameters, and body. Each function body consists of at least one basic block, which in this case is labeled aName. After the block label, the call instruction calls the externally defined puts function. The getelementptr instruction computes the address of an element inside an aggregate; here the two zero indices select the first byte of the array, which gives the i8* that puts expects. It only does address arithmetic and never reads memory. The inbounds keyword does not block out-of-range access; it tells LLVM that the computed address stays within the bounds of the object, and the result is a poison value if it does not. The register %0 receives the result of the external puts function, and the function returns void. Now I will present the equivalent x86 assembly.

        .section    __TEXT,__text,regular,pure_instructions
        .globl  _sayHelloWorld
        .align  4, 0x90
    _sayHelloWorld:                         ## @sayHelloWorld
    Leh_func_begin0:
    ## BB#0:                                ## %aName
        subq    $8, %rsp
    Ltmp0:
        leaq    ___unnamed_1(%rip), %rdi
        callq   _puts
        addq    $8, %rsp
        ret
    Leh_func_end0:

        .section    __TEXT,__cstring,cstring_literals
        .align  4                           ## @0
    ___unnamed_1:
        .asciz  "Hello LLVM-C world!"

Let us look at exactly what is going on. The stack pointer is moved down by 8 bytes. Nothing is stored in that space here; the adjustment keeps the stack 16-byte aligned at the call, as the x86-64 calling convention requires. The address of the string we would like to output is loaded into the %rdi register, which carries the first argument, and _puts is called. The stack pointer is then restored to its original value and we return from the function.

I think my favorite part about this code is that LLVM has left us breadcrumbs that show how the x86 assembly maps back to the IR. In particular, the names @sayHelloWorld, %aName, and @0 appear as comments next to the function entry point, the entry block, and the string constant. This snippet is not complex, but for larger pieces of code this information can be very important.

For reference, here is the C code necessary to generate and run the LLVM Hello World example.
