Hello peeps! Been a while. I hope you’re all doing great. This write-up was supposed to be up way sooner to be honest. Recently, @oaktree coded an IRC bot with dynamic linking features which motivated me to finally take the initiative and finish this god damn paper. So without further ado, @oaktree and the rest, let’s get right into it.

Today we are continuing our journey towards an in-depth understanding of our binaries. If this is the first write-up of my Linux Internals series that you are reading, I suggest you go through my Dynamic Linking Wizardry post before you keep on reading. The aforementioned article wasn’t too “practical” for my standards, so let’s dive deeper this time with some PoC.

Enjoy!

Introduction

Right, firstly, let’s simplify and visualize some terms because those damn Computer Scientists love making our life hard by creating all kinds of confusing naming conventions. A symbol is a fancy term that mainly describes functions, objects and variables. In reality, a symbol is just an address / offset. " But what do you mean by that @_py? " Glad you asked.

Below is a disassembly snippet of one of the PoC binaries I’ll be using for today.

Even if you have never attempted to disassemble a function, it’s crystal clear that this is the disassembly of our main function (shown at the top-left corner of the image). But what else do you see? You see some weird number ( 0x80484a4 ) next to its name. This is its starting address ( in hex ). Meaning, main is practically a bunch of instructions ( 0s and 1s ) at a certain offset in our address space. That simple.
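To make the “a symbol is just an address” idea tangible, here is a minimal sketch in C. The names (answer, address_of_answer, call_through_address) are made up for this example, and the casts between a function pointer and an integer assume a typical Linux platform where a code address fits in a uintptr_t.

```c
#include <stdint.h>

/* A plain function: to the machine it is just some bytes at an address. */
int answer(void) {
    return 42;
}

/* Hand back the symbol's address as a raw number. */
uintptr_t address_of_answer(void) {
    return (uintptr_t)&answer;
}

/* Treat a raw number as code: call whatever lives at that address. */
int call_through_address(uintptr_t addr) {
    int (*fn)(void) = (int (*)(void))addr;
    return fn();
}
```

Feeding the result of address_of_answer() to call_through_address() runs answer() without ever mentioning its name, which is all a symbol really boils down to.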

The above explanation wasn’t really necessary for the completeness of this article, but I’ve always believed that the best way to solidify a concept is by simplifying it as much as possible. In my opinion, if there is one thing to take away from this article, it’s this:

Everything is 0s and 1s ( or for the hardware guys, high voltage / low voltage ). What makes your computer do what it does is the context in which it sees those 0s and 1s.

Phew, after this small break, let’s get back to business.

Note: I’ll be referring to variables and functions as objects from time to time for generality purposes.

Symbol Resolution

You can think of symbol resolution as the DNS of binaries. Simply put, it’s a process of mapping and finding objects in the address space. A curious person would ask " why the hell do we need any of it in the first place? " Well, let me show you.

Note: I’ll start by covering the 32-bit version and gradually move to the 64-bit one.

This is our source code (32-bit):

    int var = 12;

    int func(int b) {
        return var + b;
    }

We will compile the above code into a shared library / object (.so) by using the -fpic and -shared flags (how-to). If you don’t know what shared libraries are, google it, it’s quite simple. It’s practically a bunch of function, variable and object declarations / definitions which you can embed in your main binary’s address space and refer to “as if” they were defined in your executable’s source file. Anyway, the point I want to prove with the above source code is this:

Our func() function returns a sum. This sum refers to an object outside of the function scope (var in our case), right? It somehow needs to find its location ( address / offset ) in order to read its value before the addition can take place. Which proves my point at the beginning of this article. “var” is practically an address / offset which contains a value. Let’s investigate how this is being accomplished. At first, it might seem weird but while we uncover it gradually you’ll realize that it’s actually quite trivial.

The most interesting part of the above snippet is at offset 0x42f. Let’s have a closer look at the function being called.

    mov ecx, DWORD PTR [esp]
    ret

The low-level veterans probably already see where this is going but let’s walk through it together.

func() calls <__i686.get_pc_thunk.cx>, which in turn places the return address from the stack into the ecx register. Let’s have a stack-calling-convention crash course for the newbies so you can make sense of the rest of this write-up.

Stack Calling Convention Crash Course

    +--------------+
    |     ...      |
    |     args     |
    | func() vars  |
    +--------------+
    | return addr  | <-- func() pushes the address
    +--------------+     of the next instruction on the stack.
    |     ...      |
    |     args     |
    |thunk.cx vars |
    +--------------+

So, what’s really going on? The above masterpiece is an overly-simplified image of the stack RIGHT when the call instruction happened. I’ve left out some info such as local variable allocation, but it’s not interesting to us at the moment.

Let’s think of it logically. func() is just an address in memory and you want to call a function ( jump to a different address in memory ). Wouldn’t you want to know the way back to func() once you are done with the call to thunk.cx()? Well, that’s how computers do it: They push the address of the next instruction of the caller ( func ) on the stack so that once the callee ( thunk.cx ) is done, the execution can resume at the address of the caller ( func ). Simple and genius.
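The return-address trick can be sketched in C as well. This is only a rough imitation of what __i686.get_pc_thunk.cx does, assuming GCC or Clang: __builtin_return_address and the noinline attribute are compiler extensions, and the function names below are made up for the sketch.

```c
#include <stddef.h>

/* Mimics the thunk: hand back the address that the call instruction
   pushed on the stack, i.e. the caller's next instruction.
   noinline keeps the compiler from folding the call away. */
__attribute__((noinline)) void *get_pc_thunk(void) {
    return __builtin_return_address(0);
}

/* Returns 1 if the thunk handed back a usable code address. */
int thunk_returns_code_address(void) {
    return get_pc_thunk() != NULL;
}
```

The value returned by get_pc_thunk() is an address inside whichever function called it, which is exactly the piece of information func() is after.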

Let’s go back to our scenario.

ECX contains the address of the next instruction once thunk.cx() has returned. Then, an interesting addition is going on. Hm, an offset is being added to ecx. But why? Well, this is where the one and only Global Offset Table, aka GOT, joins the party. GOT is one of the most fundamental pieces in ELF binaries and I’ve written extensively about it here and there is more on google. Basically, it’s an array of symbol addresses.

The secret behind the offset addition is:

The offset between the text segment ( machine instructions ) and data segment ( global / static variables ) is known at link-time.

What does that practically mean? Well, the linker fixes the distance between every section / segment when it builds the library, and that distance stays the same no matter where the library ends up being loaded at run-time. Meaning, while we are executing func()’s instructions ( stored in the text segment ) and we try to refer to an object further away ( recall from our example the “var” global variable which is stored in the data segment ), the code adds a known offset to the address being executed and lands on the spot where the symbol’s address can be found. Right, all this might sound fancy and crazy but let’s draw it out, shall we?

    +--------------------+   var
    |    Data Segment    | <-----
    |.data, .got, .symtab|      |
    |        ....        |      |
    +--------------------+      |
    |        ....        |  +0x1bc0 = offset
    |        ....        |      |
    +--------------------+      |
    |    Text Segment    |      |
    |   .text, .rodata   | ---------> return var + b;
    | func() refs "var"  |
    +--------------------+

So the offset addition makes ecx point to the area where “var” can be found, aka GOT. Let’s construct a PoC and prove it to ourselves. Remember, the assembly never lies. Here is our tiny main binary which will be linked against the shared library we created before.

    int func(int b);  /* Defined in the shared library. */

    int main() {
        func(2);
        return 6;
    }

Now let’s fire up GDB.

Let’s inspect the assembly.

Let’s make some notes for the above image:

The address we are at during the breakpoint is 0xb7fd843a, which is a classic address for a shared library’s text segment.

thunk.cx is being called and does its magic, as we said before.

The known offset is being added to ecx and now ecx points to the GOT, which is where the “var” reference can be found.

As you can see, ecx holds the address 0xb7fd9ff4, which should be a GOT address. Let’s find out.

The column to the right of “PROGBITS” is the GOT’s offset from the base address of our shared library, which puts the GOT at 0xb7fd9fe4. In case you didn’t notice it, ecx points a few bytes past the GOT’s base address. In particular, it points to 0xb7fd9ff4, which matches .got.plt’s offset. Damn! Did @_py lose his mind? Well, stay with me. Let’s dissect the disassembly.

    mov eax,DWORD PTR [ecx-0xc]   <-$pc
    mov eax,DWORD PTR [eax]

The program counter ( PC ) points to the next instruction that is about to be executed right after the breakpoint. The first instruction will do the following:

0xc will be subtracted from the value of ecx. Thus, 0xb7fd9ff4 - 0xc = 0xb7fd9fe8.

Read the 4-byte value stored at address 0xb7fd9fe8, which is var’s address.

Store that address in eax, ready to be dereferenced by the second instruction.
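If you want to double-check the arithmetic, here is a tiny sketch of it in C. The function names are made up, and the return address 0xb7fd8434 used in the note below is not shown anywhere in the text: it is what you get by working backwards from ecx ( 0xb7fd9ff4 ) and the 0x1bc0 offset from the drawing, so treat it as an assumption.

```c
#include <stdint.h>

/* ecx after the add: return address + link-time constant. */
uint32_t ecx_after_add(uint32_t return_addr, uint32_t link_time_offset) {
    return return_addr + link_time_offset;
}

/* Address read by "mov eax, DWORD PTR [ecx-0xc]". */
uint32_t var_got_slot(uint32_t ecx) {
    return ecx - 0xc;
}
```

With those numbers, ecx_after_add(0xb7fd8434, 0x1bc0) gives 0xb7fd9ff4 and var_got_slot(0xb7fd9ff4) gives 0xb7fd9fe8, the GOT slot that holds var’s address.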

Another drawing incoming:

    GOT: 0xb7fd9fe4
    +-----------------+
    |   dynamic ptr   |
    +-----------------+
    | link_map* struct|
    +-----------------+
    |  dl_resolve()   |
    +-----------------+
    |      ....       |
    +-----------------+
    |      ....       |
    +-----------------+
    |   var's addr    | <-------- new ecx
    +-----------------+     |
    |      ....       |  (-) 0xc
    +-----------------+     |
    |      ....       | <---- ecx after addition
    +-----------------+

I want to believe it’s clearer now. The GOT, as I said before, is an array of pointers. One of its entries contains the address of var ( ignore the dl_resolve() and link_map info ). The 2nd instruction will dereference var’s address and place its content ( 12 in our case ) into eax. Nice and easy. For clarity purposes, let’s see what GDB has to say about that.

Looks like var’s address is 0xb7fda00c and it’s stored at address 0xb7fd9fe8! Hmm, but is it?

Of course it is! What the above relocation ( R_386_GLOB_DAT ) is telling us loud and clear is " find the address of var and place it at the offset 0x1fe8, which is the GOT address 0xb7fd9fe8. "
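The whole variable lookup can be modelled in a few lines of C. This is a toy sketch, not what the compiler literally emits: toy_got and VAR_SLOT are made-up names, and the table is filled in by hand as if the R_386_GLOB_DAT relocation had already been applied.

```c
#include <stddef.h>

int var = 12;

/* Toy GOT: an array of addresses. Slot 1 holds var's address,
   as if the dynamic linker had already processed the relocation. */
enum { VAR_SLOT = 1 };
static void *toy_got[] = { NULL, &var, NULL, NULL };

/* Same shape as the two mov instructions: load the address from the
   table first, then dereference it to get the value. */
int func(int b) {
    int *var_addr = toy_got[VAR_SLOT];  /* mov eax, [ecx-0xc] */
    return *var_addr + b;               /* mov eax, [eax] + the add */
}
```

func() never touches var directly. It only knows which slot of the table to read, and that is the reason the same code works wherever the library gets loaded.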

Function Resolution

Now that we know how to reference variables through the help of GOT, it’s time to move on to function resolution, which is pure orgasm. Let’s create our new source files.

Shared Library:

    int var = 12;

    int func_PLT() {
        return var;
    }

    int func() {
        int a = func_PLT();
        return 0;
    }

Main binary:

    #include <stdio.h>

    int func(void);  /* Defined in the shared library. */

    int main(void) {
        printf("Shared library mode on.\n");
        func();
        return 0;
    }

Bla bla linking process etc. Off to the meat of our scenario. Let’s have a look at the disassembly.

The first instructions are boring and meaningless to us right now so let’s zoom in to:

The above instruction is all the money. It’s where all the magic takes place and let me note that it’s being heavily abused for exploitation purposes. Apparently, we are jumping to address 0x8048410 in order to execute our shared library’s func() code. Let’s see what’s hiding behind that address.

Hm, another jump to a different address? What the hell is going on @_py? Alright, alright, let’s take a step back and rewind.

PLT & GOT Bromance

The Procedure Linkage Table, aka PLT, is a section within the text segment which contains executable code. To be exact, it’s an array and each entry contains surgically picked instructions in order to make dynamic linking possible. For instance, pretty much every dynamically linked Linux binary refers to functions that belong to the libc shared library. Each function that you call from libc has a PLT entry with instructions that will help the dynamic linker find its address. But enough with words, let’s get a pen and draw it out.

This is what memory looks like when func() is about to get called for the 1st time:

       --< +--------------+
    main | |     ...      |
         | | jmp func@plt | _ _
         | |     ...      |     \                       PLT
       --< +--------------+      \              +-----------------+
                                  \             |    PLT stub:    |
                                   \            | push link_map*  | <-
                                    \           | jmp dl_resolve()|   \
                                     \          +-----------------+    |
                                      \         |       ...       |    |
                                       \        |       ...       |    |
                                        \       |       ...       |    |
                                         \      +-----------------+    |
                                           ---> |    PLT[func]    |    |
                                             _  |0:jmp *GOT[func] |    |
                   GOT                      /   |1:push rel_index |    |
             +-------------+               /    |    jmp stub     |  _ /
             | .dynamicptr |              /     +-----------------+
             +-------------+             /
             | link_map ptr|            /
             +-------------+           /
             | dl_resolve()|          /
             +-------------+         /
             |     ...     |        /
             +-------------+       /
             |  GOT[func]  |      /
             |   PLT :1    | <---
             +-------------+

Let’s analyze the above snippet with the help of assembly. As we saw before, the PLT entry of func() contains the following machine instructions:

    /* Jump to the address contained in 0x804a00c, which can be
       found in the data segment (ds). 0x804a00c is a GOT address. */
    jmp    DWORD PTR ds:0x804a00c

    /* Push .rel.plt relocation offset (will be explained shortly). */
    push   0x18

    /* Jump to the PLT stub code. */
    jmp    80483d0

When our main function calls func(), it executes the following instructions:

1. Jump to the PLT entry of func().

2. The PLT entry performs an indirect jump through func()’s GOT entry.

3. The GOT entry points back to func()’s PLT code, which in turn pushes a relocation offset.

4. PLT transfers control to the stub, a special PLT entry which pushes a link_map pointer on the stack before calling the dynamic linker for symbol resolution.

5. The dynamic linker resolves func()’s address and patches the GOT entry for future references.

Before I explain a little bit more about step #3 and #4, let me prove to you step #2.

As promised, step #2 will jump to the address contained in 0x804a00c which is, by no surprise, 0x08048416, aka func()’s PLT code ( push 0x18 ).

link_map Structure

This is a really interesting structure, especially from an exploit dev perspective. Let’s have a look at its members.

    struct link_map {
        /* Shared library's load address. */
        ElfW(Addr) l_addr;

        /* Pointer to library's name in the string table. */
        char *l_name;

        /* Dynamic section of the shared object. Includes dynamic
           linking info etc. Not interesting to us. */
        ElfW(Dyn) *l_ld;

        /* Pointer to previous and next link_map node. */
        struct link_map *l_next, *l_prev;
    };

Pretty fun stuff eh? Let’s rewind. As I mentioned before at step #4, the PLT stub will push a link_map struct pointer on the stack and then call the dynamic linker. Since we are working on 32-bit binaries, that can only mean one thing. The link_map pointer is one of the arguments the dynamic linker needs in order to resolve func()’s address ( the second one is the relocation offset, which I’ll describe right after; keep in mind that function arguments on 32-bit binaries are passed through the stack ). In practice, its members will be populated at run-time with the appropriate info ( the shared library func() belongs to, the shared library’s address etc ).

Extra PoC

Even though a PoC isn’t a must for our case, I’ll add it for the low-level guys.

Note: I’ve used the objdump utility to find the GOT’s address so I will not include that in the PoC.

Aye! We did it! All you need to do in order to understand what just happened is a look at the link_map’s struct members and GOT’s address space which I drew in a quite detailed manner for you. If any of you have questions on how I did it, feel free to comment it down below and I’ll gladly explain it to you. It’s just that it’s not really important in order to grasp the concept of symbol resolution.

Relocation Entries

Even though I’ve briefly explained relocations in the past, it’s about time we get reminded of them. Let’s study the format of those relocations:

Note: This is a “pseudo” version of the official 64-bit relocation structure specification. It carries the same information, but the one below will make much more sense, you’ll see why I did that shortly. And yes, I know, we are talking about 32-bit binaries while the structure is the 64-bit one. The idea is the same on both: the 32-bit structure simply uses 4-byte fields, drops the addend and splits r_info into a 24-bit symbol index and an 8-bit type.

    typedef struct {
        /* Absolute address in memory where the address of the symbol
           should be written to. The r_offset value is mostly a GOT
           address. What else could it be. */
        long r_offset;

        /* Relocation type & symbol table index.
           The relocation type is a pseudo mathematical formula used
           to compute the value that gets written. The symbol table
           index is basically an offset in an array of Elf_Sym structs.
           In reality, r_info is:

               long type:32, long symbol:32;

                _ _ _ _ _ r_info _ _ _ _ _ _
               |                            |
                    type          symbol
               |--------------||--------------|
                      32              32

           The upper 32 bits ( the symbol half ) are used as an index
           in the .dynsym and it's calculated through the below macro:

               #define ELF64_R_SYM(info) ((info)>>32)
        */
        long r_info;

        /* Boring. */
        long addend;
    } Elf_Rel;

This is the format of a relocation entry. Makes no sense for now but stay with me. Let’s have a flashback.

/* Push .rel.plt relocation offset. */ push 0x18

In case you forgot, that is one of the instructions in func()’s PLT entry. What does it really do though? 0x18 is an offset inside the .rel.plt section. You can think of .rel.plt as an array of Elf_Rel structures, each one of them describing a different function. 0x18 is practically saying " add 0x18 to the address of the .rel.plt section -> read the relocation entry fields that describe func() -> pass them to the dynamic linker’s function so it can patch the desired address. "
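To see where 0x18 lands, here is a small sketch that uses the real 32-bit layout ( two 4-byte fields, so 8 bytes per entry ). ToyRel32 and the helper names are made up for the example; the real definitions are Elf32_Rel, ELF32_R_SYM and ELF32_R_TYPE in <elf.h>.

```c
#include <stdint.h>

/* Hypothetical mirror of the 32-bit relocation entry:
   two 4-byte fields, no addend. */
typedef struct {
    uint32_t r_offset;
    uint32_t r_info;
} ToyRel32;

/* Turn the byte offset pushed by the PLT entry into an array index. */
uint32_t rel_index(uint32_t pushed_offset) {
    return pushed_offset / (uint32_t)sizeof(ToyRel32);
}

/* 32-bit split of r_info: upper 24 bits are the symbol index,
   lower 8 bits are the relocation type. */
uint32_t rel_symbol(uint32_t r_info) { return r_info >> 8; }
uint32_t rel_type(uint32_t r_info)   { return r_info & 0xff; }
```

So rel_index(0x18) is 3, meaning func() is described by the fourth entry of .rel.plt. As for r_info, a made-up value such as 0x407 would decode to symbol index 4 and type 7, and 7 is R_386_JMP_SLOT, the type used for PLT entries on i386.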

As I mentioned above, the result of the #define ELF64_R_SYM(info) ((info)>>32) macro is an index into the dynamic symbol table. An ELF binary has a symbol table and a dynamic symbol table. The latter refers to the symbols needed for dynamic linking, such as imported functions, and the former to symbols defined by us, the programmers. Both tables are populated with the same Elf_Sym structure. Let’s have a look at a pseudo version of it as well.

    typedef struct {
        /* Offset into the string table that points to the
           null-terminated string of the symbol. */
        int name;

        /* The info field is split up into 2 parts as well.

            _ _ _ _ _ _ _ info _ _ _ _ _ _ _
           |                                |
                 type           binding
           |---------------||---------------|
                   4                4

           Type:    Function or data ( 4 bits )
           Binding: Local or global  ( 4 bits )

           There are defined macros in order to calculate the above
           values as well but we already saw too much for today. */
        unsigned char info;

        /* Unused. */
        char reserved;

        /* Section header index. */
        short section;

        /* Section offset / absolute address */
        long value;

        /* Symbol's size in bytes. */
        long size;
    } Elf_Sym;
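For the curious, the split of the info byte is a one-liner each way. The helper names below are made up; they do the same thing as the ELF32_ST_BIND and ELF32_ST_TYPE macros from <elf.h>, where the binding lives in the upper 4 bits and the type in the lower 4.

```c
/* Upper 4 bits: binding (0 = local, 1 = global, ...). */
unsigned sym_binding(unsigned char info) {
    return info >> 4;
}

/* Lower 4 bits: type (1 = data object, 2 = function, ...). */
unsigned sym_type(unsigned char info) {
    return info & 0xf;
}
```

An info byte of 0x12 therefore reads as binding 1 ( STB_GLOBAL ) and type 2 ( STT_FUNC ), which is what you would expect for an imported function.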

Let’s recap:

When we import a function from a shared library, our binary’s address space is being populated with a bunch of arrays of structures in order to make the dynamic linker’s life easier. In particular, there will be a null-terminated string in the dynamic string table section, a Symbol structure describing some of the symbol’s attributes and finally a few relocation instances in the .rel.plt section pointing to those symbol structures.

I don’t know if you noticed it, but if you actually write down the process, it makes so much sense. What do I mean by that? The linker will need a few vital pieces of info in order to resolve the symbol’s address and it gets those through the aforementioned structures:

Symbol’s relocation offset.

The section where the symbol is defined.

Symbol’s type.

Symbol’s name.

Sweet! Ezpz m8! Let’s revise the steps of the symbol resolution process:

1. Jump to the PLT entry of our symbol.

2. Jump to the address stored in the GOT entry of our symbol.

3. Jump back to the PLT entry and push an offset on the stack. That offset locates the Elf_Rel structure describing how to patch the symbol.

4. Jump to the PLT stub entry.

5. Push a pointer to a link_map structure so the linker can find which library the symbol belongs to.

6. Call the dynamic linker.

7. Patch the GOT entry.

Memory Image After Patching

One more drawing to go! Oh boy!

       --< +--------------+
    main | |     ...      |
         | | jmp func@plt | _ _
         | |     ...      |     \                       PLT
       --< +--------------+      \              +-----------------+
                                  \             |    PLT stub:    |
                                   \            | push link_map*  | <-
                                    \           | jmp dl_resolve()|   \
                                     \          +-----------------+    |
                                      \         |       ...       |    |
                                       \        |       ...       |    |
                                        \       |       ...       |    |
                                         \      +-----------------+    |
                                           ---> |    PLT[func]    |    |
                                             _  |0:jmp *GOT[func] |    |
                   GOT                      /   |1:push rel_index |    |
             +-------------+               /    |    jmp stub     |  _ /
             | .dynamicptr |              /     +-----------------+
             +-------------+             /
             | link_map ptr|            /
             +-------------+           /
             | dl_resolve()|          /
             +-------------+         /
             |     ...     |        /
             +-------------+       /
             |  GOT[func]  |      /
             |   func()    | <-------
             +-------------+ \
                              \
                               \
                                -----> Shared Library
                                       +----------------------+
                                       |     func() code:     |
                                       |     mov dis, dat     |
                                       |         ...          |
                                       +----------------------+

The only difference now is that the func()'s GOT entry doesn’t contain an address of the PLT entry anymore, but the address in the shared library where the func() instructions begin. Let’s prove it to ourselves with GDB.

I set a breakpoint right before the func() call and inspected its GOT entry, which includes by no surprise, as we noticed earlier, the address of its PLT entry. Let’s move to the next instruction and have a look at the GOT entry again.

Voila! The GOT entry is fully patched and it contains our function’s address! Meaning, the next time func() is called, there won’t be any back and forth jumping between PLT and GOT.
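The before / after behaviour can be imitated with a plain function pointer. This is a toy model of lazy binding, not how the dynamic linker is actually implemented: every name below is made up, and the “resolver” simply hard-codes the answer instead of walking any ELF structures.

```c
/* The "real" function living in the shared library. */
static int func_in_library(void) {
    return 12;
}

static int lazy_stub(void);

/* Toy GOT entry: it starts out pointing back at the lazy stub. */
static int (*got_func)(void) = lazy_stub;
static int resolver_calls = 0;

/* Stands in for "push rel_offset; jmp stub; dl_resolve()":
   look the symbol up, patch the GOT entry, then run the function. */
static int lazy_stub(void) {
    resolver_calls++;
    got_func = func_in_library;
    return got_func();
}

/* Stands in for the PLT entry: an indirect jump through the GOT. */
int call_func_via_plt(void) {
    return got_func();
}

/* How many times the slow path was taken. */
int resolver_call_count(void) {
    return resolver_calls;
}
```

Call call_func_via_plt() twice and resolver_call_count() stays at 1: the first call takes the detour through the stub and patches the pointer, the second one goes straight to the function.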

Conclusion

If you are reading this sentence, you are a true champ. A couple of notes:

I’m not a native English speaker and this write-up was quite lengthy. Meaning, the more words I write, the bigger the chance for grammatical and vocabulary mistakes. I’ll try to correct them asap if there are any.

I decided not to include the 64-bit version since this post would end up being a book. It’s also the same process with one minor difference. If you really can’t figure out the differences, feel free to request a version for it in the comments. The most important part was to understand the relationship between PLT, GOT and the ELF structures.

If you have any questions, please don’t hesitate to ask me. I’d like to thank you for taking the time to read my paper and have an awesome day.

Peace,

@_py