An interesting issue that comes up when writing code for the x64 architecture is which code model to use. This probably isn't a very well-known topic, but if one wants to understand the x64 machine code generated by compilers, it's educational to be familiar with code models. There are also implications for optimization, for those who really care about performance down to the smallest instruction.

There's very little information on this topic online or anywhere. By far the most important resource is the official x64 ABI, which you can obtain from the x86-64.org page (from now on I'm going to refer to it simply as "the ABI"). There's also a bit of information in the gcc man-pages. The aim of this article is to provide an approachable reference, with some discussion of the topic and concrete examples to demonstrate the concepts in real-life code.

An important disclaimer: this is not a tutorial for beginners. The prerequisites are a solid understanding of C and assembly language, plus a basic familiarity with the x64 architecture.

Code models - motivation

References to both code and data on x64 are done with instruction-relative (RIP-relative in x64 parlance) addressing modes. The offset from RIP in these instructions is limited to 32 bits. So what do we do when 32 bits are not enough? What if the program is larger than 2 GB? Then a case can arise where an instruction attempting to address some piece of code (or data) just can't do it with its 32-bit offset from RIP.

One solution to this problem is to give up the RIP-relative addressing modes and use absolute 64-bit addresses for all code and data references. But this has a high cost: more instructions are required to perform the simplest operations. It's a high cost to pay in all code just for the sake of the (very rare) case of extremely huge programs or libraries.

So, the compromise is code models. A code model is a formal agreement between the programmer and the compiler, in which the programmer states their intentions for the size of the eventual program(s) that the object file currently being compiled will end up in.

Code models exist for the programmer to be able to tell the compiler: don't worry, this object will only get into non-huge programs, so you can use the fast RIP-relative addressing modes. Conversely, the programmer can tell the compiler: this object is expected to be linked into huge programs, so please use the slow but safe absolute addressing modes with full 64-bit addresses.
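To make the 32-bit limit concrete, here is a minimal C sketch of the check that decides whether a RIP-relative reference can reach its target. The function name `fits_in_rel32` and the addresses used with it are made up for illustration; they are not part of any toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `target` is reachable from an instruction whose
 * *following* instruction starts at `next_rip`, using a signed 32-bit
 * displacement (the limit of RIP-relative addressing on x64).
 * Hypothetical helper, for illustration only. */
static bool fits_in_rel32(uint64_t next_rip, uint64_t target)
{
    /* The subtraction wraps modulo 2^64; the cast reinterprets the result
     * as a signed distance (two's complement, as on gcc/x64). */
    int64_t disp = (int64_t)(target - next_rip);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

For example, a target exactly 0x7fffffff bytes ahead of `next_rip` is reachable, while one 0x80000000 bytes ahead is not. The second case is where the small code model's promise would be broken.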

What will be covered here

The two scenarios described above have names: the small code model promises to the compiler that 32-bit relative offsets should be enough for all code and data references in the compiled object. The large code model, on the other hand, tells it not to make any assumptions and use absolute 64-bit addressing modes for code and data references. To make things more interesting, there's also a middle road, called the medium code model.

These code models exist separately for non-PIC and PIC code. The article is going to discuss all 6 variations.

Example C source

I'll be using the following C program compiled with different code models to demonstrate the concepts discussed in the article. In this code, the main function accesses 4 different global arrays and one global function. The arrays differ by two parameters: size and visibility. The size is important to explain the medium code model and won't be used for the small and large models. Visibility is either static (visible only in this source file) or completely global (visible by all other objects linked into the program). This distinction is important for the PIC code models.

    int global_arr[100] = {2, 3};
    static int static_arr[100] = {9, 7};
    int global_arr_big[50000] = {5, 6};
    static int static_arr_big[50000] = {10, 20};

    int global_func(int param)
    {
        return param * 10;
    }

    int main(int argc, const char* argv[])
    {
        int t = global_func(argc);
        t += global_arr[7];
        t += static_arr[7];
        t += global_arr_big[7];
        t += static_arr_big[7];
        return t;
    }

gcc takes the code model as the value of the -mcmodel option. Additionally, PIC compilation can be specified with the -fpic flag. For example, compiling it into an object file with the large code model and PIC enabled:

    > gcc -g -O0 -c codemodel1.c -fpic -mcmodel=large -o codemodel1_large_pic.o

Small code model

Here's what man gcc has to say about the small code model:

    -mcmodel=small
        Generate code for the small code model: the program and its symbols
        must be linked in the lower 2 GB of the address space. Pointers are
        64 bits. Programs can be statically or dynamically linked. This is
        the default code model.

In other words, the compiler is free to assume that all code and data can be accessed with 32-bit RIP-relative offsets from any instruction in the code. Let's see the disassembly of the example C program compiled in non-PIC small code model:

    > objdump -dS codemodel1_small.o
    [...]
    int main(int argc, const char* argv[])
    {
      15:   55                      push   %rbp
      16:   48 89 e5                mov    %rsp,%rbp
      19:   48 83 ec 20             sub    $0x20,%rsp
      1d:   89 7d ec                mov    %edi,-0x14(%rbp)
      20:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
        int t = global_func(argc);
      24:   8b 45 ec                mov    -0x14(%rbp),%eax
      27:   89 c7                   mov    %eax,%edi
      29:   b8 00 00 00 00          mov    $0x0,%eax
      2e:   e8 00 00 00 00          callq  33 <main+0x1e>
      33:   89 45 fc                mov    %eax,-0x4(%rbp)
        t += global_arr[7];
      36:   8b 05 00 00 00 00       mov    0x0(%rip),%eax
      3c:   01 45 fc                add    %eax,-0x4(%rbp)
        t += static_arr[7];
      3f:   8b 05 00 00 00 00       mov    0x0(%rip),%eax
      45:   01 45 fc                add    %eax,-0x4(%rbp)
        t += global_arr_big[7];
      48:   8b 05 00 00 00 00       mov    0x0(%rip),%eax
      4e:   01 45 fc                add    %eax,-0x4(%rbp)
        t += static_arr_big[7];
      51:   8b 05 00 00 00 00       mov    0x0(%rip),%eax
      57:   01 45 fc                add    %eax,-0x4(%rbp)
        return t;
      5a:   8b 45 fc                mov    -0x4(%rbp),%eax
    }
      5d:   c9                      leaveq
      5e:   c3                      retq

As we can see, all arrays are accessed in exactly the same manner - by using a simple RIP-relative offset. However, the offset in the code is 0, because the compiler doesn't know where the data section will be placed. So it also creates a relocation for each such access:

    > readelf -r codemodel1_small.o

    Relocation section '.rela.text' at offset 0x62bd8 contains 5 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    00000000002f  001500000002 R_X86_64_PC32     0000000000000000 global_func - 4
    000000000038  001100000002 R_X86_64_PC32     0000000000000000 global_arr + 18
    000000000041  000300000002 R_X86_64_PC32     0000000000000000 .data + 1b8
    00000000004a  001200000002 R_X86_64_PC32     0000000000000340 global_arr_big + 18
    000000000053  000300000002 R_X86_64_PC32     0000000000000000 .data + 31098

Let's fully decode the access to global_arr as an example. Here's the relevant part of the disassembly again:

        t += global_arr[7];
      36:   8b 05 00 00 00 00       mov    0x0(%rip),%eax
      3c:   01 45 fc                add    %eax,-0x4(%rbp)

RIP-relative addressing is relative to the next instruction. So the offset that should be patched into the mov instruction should be relative to 0x3c. The relevant relocation is the second one, pointing to the operand of mov at 0x38. It's R_X86_64_PC32, which means: take the symbol value, add the addend and subtract the offset this relocation points to. With an addend of 0x18 and a relocation offset of 0x38, the patched value is global_arr + 0x18 - 0x38, which is the same as (global_arr + 0x1c) - 0x3c. In other words, this ends up placing the relative offset between the next instruction and global_arr, plus 0x1c. This relative offset is just what we need, since 0x1c simply means "the int at index 7 in the array" (each int is 4 bytes long on x64). So the instruction correctly references global_arr[7] using RIP-relative addressing.

Another interesting thing to note here is that although the instructions for accessing static_arr are similar, its relocation has a different symbol, pointing to the .data section instead of the specific symbol. This is because the static array is placed by the linker in the .data section in a known location - it can't be shared with other shared libraries. This relocation will eventually get fully resolved by the linker. On the other hand, the reference to global_arr will be left to the dynamic loader to resolve, since global_arr can actually be used by (or overridden by) a different shared library.
Finally, let's look at the reference to global_func:

        int t = global_func(argc);
      24:   8b 45 ec                mov    -0x14(%rbp),%eax
      27:   89 c7                   mov    %eax,%edi
      29:   b8 00 00 00 00          mov    $0x0,%eax
      2e:   e8 00 00 00 00          callq  33 <main+0x1e>
      33:   89 45 fc                mov    %eax,-0x4(%rbp)

The operand of a callq is also RIP-relative, so the R_X86_64_PC32 relocation here works similarly to place the actual relative offset to global_func into the operand.

To conclude, since the small code model promises the compiler that all code and data in the eventual program can be accessed with 32-bit RIP-relative offsets, the compiler can generate simple and efficient code for accessing all kinds of objects.
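The R_X86_64_PC32 arithmetic described in this section can be checked with a few lines of C. This is only a sketch of the formula (symbol + addend - patched offset). The function names are hypothetical, and the load addresses in the usage note are invented, since the real ones are only known after linking.

```c
#include <stdint.h>

/* R_X86_64_PC32: value = S + A - P, stored as a signed 32-bit field.
 *   S = symbol address, A = addend,
 *   P = address of the 4-byte field being patched.
 * Hypothetical helper, for illustration only. */
static int32_t reloc_pc32(uint64_t S, int64_t A, uint64_t P)
{
    return (int32_t)(S + (uint64_t)A - P);
}

/* What the CPU computes at run time for a RIP-relative operand:
 * address of the next instruction plus the sign-extended disp32. */
static uint64_t rip_relative_target(uint64_t next_rip, int32_t disp)
{
    return next_rip + (uint64_t)(int64_t)disp;
}
```

Assuming, purely for illustration, that .text is loaded at 0x400000 and global_arr ends up at 0x601040, the mov at 0x36 has its operand at 0x400038 and its next instruction at 0x40003c. `reloc_pc32(0x601040, 0x18, 0x400038)` then yields a displacement that `rip_relative_target(0x40003c, ...)` resolves to 0x601040 + 0x1c, the address of global_arr[7]. The same two functions reproduce the callq case with an addend of -4.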

Large code model

From man gcc:

    -mcmodel=large
        Generate code for the large model: This model makes no assumptions
        about addresses and sizes of sections.

Here's the disassembled code of main when compiled with the non-PIC large code model:

    int main(int argc, const char* argv[])
    {
      15:   55                      push   %rbp
      16:   48 89 e5                mov    %rsp,%rbp
      19:   48 83 ec 20             sub    $0x20,%rsp
      1d:   89 7d ec                mov    %edi,-0x14(%rbp)
      20:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
        int t = global_func(argc);
      24:   8b 45 ec                mov    -0x14(%rbp),%eax
      27:   89 c7                   mov    %eax,%edi
      29:   b8 00 00 00 00          mov    $0x0,%eax
      2e:   48 ba 00 00 00 00 00    movabs $0x0,%rdx
      35:   00 00 00
      38:   ff d2                   callq  *%rdx
      3a:   89 45 fc                mov    %eax,-0x4(%rbp)
        t += global_arr[7];
      3d:   48 b8 00 00 00 00 00    movabs $0x0,%rax
      44:   00 00 00
      47:   8b 40 1c                mov    0x1c(%rax),%eax
      4a:   01 45 fc                add    %eax,-0x4(%rbp)
        t += static_arr[7];
      4d:   48 b8 00 00 00 00 00    movabs $0x0,%rax
      54:   00 00 00
      57:   8b 40 1c                mov    0x1c(%rax),%eax
      5a:   01 45 fc                add    %eax,-0x4(%rbp)
        t += global_arr_big[7];
      5d:   48 b8 00 00 00 00 00    movabs $0x0,%rax
      64:   00 00 00
      67:   8b 40 1c                mov    0x1c(%rax),%eax
      6a:   01 45 fc                add    %eax,-0x4(%rbp)
        t += static_arr_big[7];
      6d:   48 b8 00 00 00 00 00    movabs $0x0,%rax
      74:   00 00 00
      77:   8b 40 1c                mov    0x1c(%rax),%eax
      7a:   01 45 fc                add    %eax,-0x4(%rbp)
        return t;
      7d:   8b 45 fc                mov    -0x4(%rbp),%eax
    }
      80:   c9                      leaveq
      81:   c3                      retq

Again, looking at the relocations will be useful:

    Relocation section '.rela.text' at offset 0x62c18 contains 5 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    000000000030  001500000001 R_X86_64_64       0000000000000000 global_func + 0
    00000000003f  001100000001 R_X86_64_64       0000000000000000 global_arr + 0
    00000000004f  000300000001 R_X86_64_64       0000000000000000 .data + 1a0
    00000000005f  001200000001 R_X86_64_64       0000000000000340 global_arr_big + 0
    00000000006f  000300000001 R_X86_64_64       0000000000000000 .data + 31080

The large code model is also quite uniform - no assumptions can be made about the size of the code and data sections, so all data is accessed similarly.
Let's pick global_arr once again:

        t += global_arr[7];
      3d:   48 b8 00 00 00 00 00    movabs $0x0,%rax
      44:   00 00 00
      47:   8b 40 1c                mov    0x1c(%rax),%eax
      4a:   01 45 fc                add    %eax,-0x4(%rbp)

Here two instructions are needed to pull the desired value from the array. The first places an absolute 64-bit address into rax. This is the address of global_arr, as we shall soon see. The second loads the word at (rax) + 0x1c into eax.

So, let's focus on the instruction at 0x3d. It's a movabs - the absolute 64-bit version of mov on x64. It can load a full 64-bit immediate into a register. The value of this immediate in the disassembled code is 0, so we have to turn to the relocation table for the answer. It has a R_X86_64_64 relocation for the operand at 0x3f. This is an absolute relocation, which simply means: place the symbol value + addend at that offset. In other words, rax will hold the absolute address of global_arr.

What about the function call?

        int t = global_func(argc);
      24:   8b 45 ec                mov    -0x14(%rbp),%eax
      27:   89 c7                   mov    %eax,%edi
      29:   b8 00 00 00 00          mov    $0x0,%eax
      2e:   48 ba 00 00 00 00 00    movabs $0x0,%rdx
      35:   00 00 00
      38:   ff d2                   callq  *%rdx
      3a:   89 45 fc                mov    %eax,-0x4(%rbp)

After a familiar movabs, we have a call instruction that calls a function whose address is in rdx. From a glance at the relevant relocation it's obvious that this is very similar to the data access.

Evidently, the large code model makes absolutely no assumptions about the sizes of code and data sections, or where symbols might end up. It just takes the "safe road" everywhere, using absolute 64-bit moves to refer to symbols. This has a cost, of course. Notice that it now takes one extra instruction to access any symbol, when compared to the small model.

So, we've just witnessed two extremes. The small model happily assumes everything fits into the lower 2GB of memory, and the large model assumes everything is possible and any symbol can reside anywhere in the full 64-bit address space.
The medium code model is a compromise.
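As a rough illustration of what the linker does with an R_X86_64_64 relocation in the large model, the following C sketch writes symbol + addend as a 64-bit little-endian immediate into a copy of the movabs bytes. The helper name `apply_abs64` and the symbol address are assumptions made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* R_X86_64_64: store S + A as a 64-bit little-endian value at
 * code[offset]. Hypothetical helper, for illustration only. */
static void apply_abs64(uint8_t *code, size_t offset, uint64_t S, int64_t A)
{
    uint64_t value = S + (uint64_t)A;
    for (size_t i = 0; i < 8; i++)
        code[offset + i] = (uint8_t)(value >> (8 * i));
}
```

The movabs at 0x3d starts with the bytes 48 b8, followed by eight zero bytes of immediate. The relocation offset 0x3f is two bytes into the instruction, so calling `apply_abs64(bytes, 2, address_of_global_arr, 0)` on a buffer holding that instruction fills in the immediate while leaving the opcode bytes untouched.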

Medium code model

As before, let's start with a quote from man gcc:

    -mcmodel=medium
        Generate code for the medium model: The program is linked in the
        lower 2 GB of the address space. Small symbols are also placed
        there. Symbols with sizes larger than -mlarge-data-threshold are put
        into large data or bss sections and can be located above 2GB.
        Programs can be statically or dynamically linked.

Similarly to the small code model, the medium code model assumes all code is linked into the low 2GB. Data, on the other hand, is divided into "large data" and "small data". Small data is also assumed to be linked into the low 2GB. Large data, on the other hand, is not restricted in its memory placement. Data is considered large when it's larger than a given threshold option, which is 64KB by default.

It is also interesting to note that in the medium code model, special sections will be created for the large data - .ldata and .lbss (parallel to .data and .bss). It's not really important for the sake of this article, however, so I'm going to sidestep the topic. Read the ABI for more details.

Now it should be clear why the sample C code has those _big arrays. These are meant for the medium code model to be considered as "large data" (which they certainly are, at 200KB each).
Here's the disassembly:

    int main(int argc, const char* argv[])
    {
      15:   55                      push   %rbp
      16:   48 89 e5                mov    %rsp,%rbp
      19:   48 83 ec 20             sub    $0x20,%rsp
      1d:   89 7d ec                mov    %edi,-0x14(%rbp)
      20:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
        int t = global_func(argc);
      24:   8b 45 ec                mov    -0x14(%rbp),%eax
      27:   89 c7                   mov    %eax,%edi
      29:   b8 00 00 00 00          mov    $0x0,%eax
      2e:   e8 00 00 00 00          callq  33 <main+0x1e>
      33:   89 45 fc                mov    %eax,-0x4(%rbp)
        t += global_arr[7];
      36:   8b 05 00 00 00 00       mov    0x0(%rip),%eax
      3c:   01 45 fc                add    %eax,-0x4(%rbp)
        t += static_arr[7];
      3f:   8b 05 00 00 00 00       mov    0x0(%rip),%eax
      45:   01 45 fc                add    %eax,-0x4(%rbp)
        t += global_arr_big[7];
      48:   48 b8 00 00 00 00 00    movabs $0x0,%rax
      4f:   00 00 00
      52:   8b 40 1c                mov    0x1c(%rax),%eax
      55:   01 45 fc                add    %eax,-0x4(%rbp)
        t += static_arr_big[7];
      58:   48 b8 00 00 00 00 00    movabs $0x0,%rax
      5f:   00 00 00
      62:   8b 40 1c                mov    0x1c(%rax),%eax
      65:   01 45 fc                add    %eax,-0x4(%rbp)
        return t;
      68:   8b 45 fc                mov    -0x4(%rbp),%eax
    }
      6b:   c9                      leaveq
      6c:   c3                      retq

Note that the _big arrays are accessed as in the large model, and the other arrays are accessed as in the small model. The function is also accessed as in the small model. I won't even show the relocations since there's nothing new in them either.

The medium model is a clever compromise between the small and large models. The program's code is unlikely to be terribly big, so what might push it over the 2GB threshold is large pieces of data statically linked into it (perhaps for some sort of big lookup tables). The medium code model separates these large chunks of data from the rest and handles them specially. All code just calling functions and accessing the other, smaller symbols will be as efficient as in the small code model. Only the code actually accessing the large symbols will have to go the whole 64-bit way, similarly to the large code model.
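The decision the three non-PIC models make for a data reference can be summarized in a small C sketch. This is only a model of the behavior observed in the disassembly listings above, not how gcc is implemented. The names are hypothetical, and the threshold constant is taken from the "64KB by default" statement in this article rather than verified against a particular gcc version.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum code_model { MODEL_SMALL, MODEL_MEDIUM, MODEL_LARGE };

/* Assumption: the default threshold is 64KB, as stated in the article.
 * In gcc it can be changed with -mlarge-data-threshold. */
#define ASSUMED_LARGE_DATA_THRESHOLD ((size_t)64 * 1024)

/* Returns true when a (non-PIC) data access needs a 64-bit absolute
 * address (movabs), false when a RIP-relative access is enough. */
static bool data_needs_abs64(enum code_model model, size_t object_size,
                             size_t threshold)
{
    switch (model) {
    case MODEL_SMALL:  return false;                   /* always RIP-relative */
    case MODEL_LARGE:  return true;                    /* always movabs */
    case MODEL_MEDIUM: return object_size > threshold; /* only large data */
    }
    return true; /* not reached for valid enum values; be conservative */
}
```

With the sample program, the 100-element arrays occupy 400 bytes and stay RIP-relative in the medium model, while the 50000-element arrays occupy 200000 bytes and get the movabs treatment, which matches the listing.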

Small PIC code model

Let us now turn to the code models for PIC, starting once again with the small model. Here's the sample code, compiled with PIC and the small code model:

    int main(int argc, const char* argv[])
    {
      15:   55                      push   %rbp
      16:   48 89 e5                mov    %rsp,%rbp
      19:   48 83 ec 20             sub    $0x20,%rsp
      1d:   89 7d ec                mov    %edi,-0x14(%rbp)
      20:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
        int t = global_func(argc);
      24:   8b 45 ec                mov    -0x14(%rbp),%eax
      27:   89 c7                   mov    %eax,%edi
      29:   b8 00 00 00 00          mov    $0x0,%eax
      2e:   e8 00 00 00 00          callq  33 <main+0x1e>
      33:   89 45 fc                mov    %eax,-0x4(%rbp)
        t += global_arr[7];
      36:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
      3d:   8b 40 1c                mov    0x1c(%rax),%eax
      40:   01 45 fc                add    %eax,-0x4(%rbp)
        t += static_arr[7];
      43:   8b 05 00 00 00 00       mov    0x0(%rip),%eax
      49:   01 45 fc                add    %eax,-0x4(%rbp)
        t += global_arr_big[7];
      4c:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
      53:   8b 40 1c                mov    0x1c(%rax),%eax
      56:   01 45 fc                add    %eax,-0x4(%rbp)
        t += static_arr_big[7];
      59:   8b 05 00 00 00 00       mov    0x0(%rip),%eax
      5f:   01 45 fc                add    %eax,-0x4(%rbp)
        return t;
      62:   8b 45 fc                mov    -0x4(%rbp),%eax
    }
      65:   c9                      leaveq
      66:   c3                      retq

And the relocations:

    Relocation section '.rela.text' at offset 0x62ce8 contains 5 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    00000000002f  001600000004 R_X86_64_PLT32    0000000000000000 global_func - 4
    000000000039  001100000009 R_X86_64_GOTPCREL 0000000000000000 global_arr - 4
    000000000045  000300000002 R_X86_64_PC32     0000000000000000 .data + 1b8
    00000000004f  001200000009 R_X86_64_GOTPCREL 0000000000000340 global_arr_big - 4
    00000000005b  000300000002 R_X86_64_PC32     0000000000000000 .data + 31098

Since the small vs. big data distinction plays no role in the small model, we're going to focus on the difference between local (static) and global symbols, which does play a role when PIC is generated.

As you can see, the code generated for the static arrays is exactly equivalent to the code generated in the non-PIC case.
This is one of the boons of the x64 architecture - unless symbols have to be accessed externally, you get PIC for free because of the RIP-relative addressing for data. The instructions and relocations used are the same, so we won't go over them again.

The interesting case here is the global arrays. Recall that in PIC, global data has to go through the GOT, because it may eventually be found or used in other shared libraries. Here's the code generated for accessing global_arr:

        t += global_arr[7];
      36:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax
      3d:   8b 40 1c                mov    0x1c(%rax),%eax
      40:   01 45 fc                add    %eax,-0x4(%rbp)

And the relevant relocation is a R_X86_64_GOTPCREL, which means: the location of the entry for the symbol in the GOT + addend, minus the offset for applying the relocation. In other words, the relative offset between RIP (of the next instruction) and the slot reserved for global_arr in the GOT is patched into the instruction. So what's put into rax by the instruction at 0x36 is the actual address of global_arr. This is followed by dereferencing the address of global_arr plus the offset of its element at index 7 into eax.

Now let's examine the function call:

        int t = global_func(argc);
      24:   8b 45 ec                mov    -0x14(%rbp),%eax
      27:   89 c7                   mov    %eax,%edi
      29:   b8 00 00 00 00          mov    $0x0,%eax
      2e:   e8 00 00 00 00          callq  33 <main+0x1e>
      33:   89 45 fc                mov    %eax,-0x4(%rbp)

There's a R_X86_64_PLT32 relocation for the operand of the callq at 0x2e. This relocation means: the address of the PLT entry for the symbol + addend, minus the offset for applying the relocation. In other words, the callq should correctly call the PLT trampoline for global_func.

Note the implicit assumptions made by the compiler - that the GOT and PLT can be accessed with RIP-relative addressing. This will be important when comparing this model to the other PIC code models.
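The double load through the GOT can be mimicked in plain C, which may help in reading the two mov instructions for global_arr. The sketch below uses hypothetical helper names and an ordinary pointer variable standing in for the GOT slot. A real GOT is filled in by the dynamic loader, not by user code.

```c
#include <stddef.h>
#include <stdint.h>

/* R_X86_64_GOTPCREL: disp = (address of the symbol's GOT slot) + A - P.
 * Hypothetical helper, for illustration only. */
static int32_t reloc_gotpcrel(uint64_t got_slot_addr, int64_t A, uint64_t P)
{
    return (int32_t)(got_slot_addr + (uint64_t)A - P);
}

/* C-level equivalent of the pair
 *   mov 0x0(%rip),%rax     ; rax = *got_slot (the address of the array)
 *   mov 0x1c(%rax),%eax    ; eax = array[7]
 * where `got_slot` stands in for the GOT entry of the array. */
static int load_via_got(int *const *got_slot, size_t index)
{
    int *array = *got_slot;  /* first load: fetch the address from the GOT */
    return array[index];     /* second load: fetch the element itself */
}
```

Compared to the static arrays, which need a single RIP-relative load, the extra memory access in `load_via_got` is the price of letting the loader decide where global_arr actually lives.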

Large PIC code model

Here's the disassembly:

    int main(int argc, const char* argv[])
    {
      15:   55                      push   %rbp
      16:   48 89 e5                mov    %rsp,%rbp
      19:   53                      push   %rbx
      1a:   48 83 ec 28             sub    $0x28,%rsp
      1e:   48 8d 1d f9 ff ff ff    lea    -0x7(%rip),%rbx
      25:   49 bb 00 00 00 00 00    movabs $0x0,%r11
      2c:   00 00 00
      2f:   4c 01 db                add    %r11,%rbx
      32:   89 7d dc                mov    %edi,-0x24(%rbp)
      35:   48 89 75 d0             mov    %rsi,-0x30(%rbp)
        int t = global_func(argc);
      39:   8b 45 dc                mov    -0x24(%rbp),%eax
      3c:   89 c7                   mov    %eax,%edi
      3e:   b8 00 00 00 00          mov    $0x0,%eax
      43:   48 ba 00 00 00 00 00    movabs $0x0,%rdx
      4a:   00 00 00
      4d:   48 01 da                add    %rbx,%rdx
      50:   ff d2                   callq  *%rdx
      52:   89 45 ec                mov    %eax,-0x14(%rbp)
        t += global_arr[7];
      55:   48 b8 00 00 00 00 00    movabs $0x0,%rax
      5c:   00 00 00
      5f:   48 8b 04 03             mov    (%rbx,%rax,1),%rax
      63:   8b 40 1c                mov    0x1c(%rax),%eax
      66:   01 45 ec                add    %eax,-0x14(%rbp)
        t += static_arr[7];
      69:   48 b8 00 00 00 00 00    movabs $0x0,%rax
      70:   00 00 00
      73:   8b 44 03 1c             mov    0x1c(%rbx,%rax,1),%eax
      77:   01 45 ec                add    %eax,-0x14(%rbp)
        t += global_arr_big[7];
      7a:   48 b8 00 00 00 00 00    movabs $0x0,%rax
      81:   00 00 00
      84:   48 8b 04 03             mov    (%rbx,%rax,1),%rax
      88:   8b 40 1c                mov    0x1c(%rax),%eax
      8b:   01 45 ec                add    %eax,-0x14(%rbp)
        t += static_arr_big[7];
      8e:   48 b8 00 00 00 00 00    movabs $0x0,%rax
      95:   00 00 00
      98:   8b 44 03 1c             mov    0x1c(%rbx,%rax,1),%eax
      9c:   01 45 ec                add    %eax,-0x14(%rbp)
        return t;
      9f:   8b 45 ec                mov    -0x14(%rbp),%eax
    }
      a2:   48 83 c4 28             add    $0x28,%rsp
      a6:   5b                      pop    %rbx
      a7:   c9                      leaveq
      a8:   c3                      retq

And the relocations:

    Relocation section '.rela.text' at offset 0x62c70 contains 6 entries:
      Offset          Info           Type           Sym. Value    Sym. Name + Addend
    000000000027  00150000001d R_X86_64_GOTPC64  0000000000000000 _GLOBAL_OFFSET_TABLE_ + 9
    000000000045  00160000001f R_X86_64_PLTOFF64 0000000000000000 global_func + 0
    000000000057  00110000001b R_X86_64_GOT64    0000000000000000 global_arr + 0
    00000000006b  000800000019 R_X86_64_GOTOFF64 00000000000001a0 static_arr + 0
    00000000007c  00120000001b R_X86_64_GOT64    0000000000000340 global_arr_big + 0
    000000000090  000900000019 R_X86_64_GOTOFF64 0000000000031080 static_arr_big + 0

Again, the small vs. big data distinction isn't important here, so we'll focus on static_arr and global_arr. But first, there's a new prologue in this code which we didn't encounter earlier:

      1e:   48 8d 1d f9 ff ff ff    lea    -0x7(%rip),%rbx
      25:   49 bb 00 00 00 00 00    movabs $0x0,%r11
      2c:   00 00 00
      2f:   4c 01 db                add    %r11,%rbx

Here's a relevant quote from the ABI:

    In the small code model all addresses (including GOT entries) are
    accessible via the IP-relative addressing provided by the AMD64
    architecture. Hence there is no need for an explicit GOT pointer and
    therefore no function prologue for setting it up is necessary. In the
    medium and large code models a register has to be allocated to hold the
    address of the GOT in position-independent objects, because the AMD64
    ISA does not support an immediate displacement larger than 32 bits.

Let's see how the prologue displayed above computes the address of the GOT. First, the instruction at 0x1e loads its own address into rbx. Then, an absolute 64-bit move is done into r11, with a R_X86_64_GOTPC64 relocation. This relocation means: take the GOT address, subtract the relocated offset and add the addend. Finally, the instruction at 0x2f adds the two together. The result is the absolute address of the GOT in rbx.

Why go through all this trouble to compute the address of the GOT? Well, for one thing, as the quote says, in the large model we can't assume that the 32-bit RIP-relative offset will suffice to access the GOT, so we need a full 64-bit address.
On the other hand, we still want PIC, so we can't just place an absolute address into the register. Rather, the address has to be computed relative to RIP. This is what the prologue does. It's just a 64-bit RIP-relative computation.

Anyway, now that we have the address of the GOT firmly in our rbx, let's see how static_arr is accessed:

        t += static_arr[7];
      69:   48 b8 00 00 00 00 00    movabs $0x0,%rax
      70:   00 00 00
      73:   8b 44 03 1c             mov    0x1c(%rbx,%rax,1),%eax
      77:   01 45 ec                add    %eax,-0x14(%rbp)

The relocation for the first instruction is R_X86_64_GOTOFF64, which means: symbol + addend - GOT. In our case: the relative offset between the address of static_arr and the address of the GOT. The next instruction adds that to rbx (the absolute GOT address), and dereferences with a 0x1c offset. Here's some pseudo-C to make this computation easier to visualize:

    // char* static_arr
    // char* GOT
    rax = static_arr + 0 - GOT;  // rax now contains an offset
    eax = *(rbx + rax + 0x1c);   // rbx == GOT, so eax now contains
                                 // *(GOT + static_arr - GOT + 0x1c) or
                                 // *(static_arr + 0x1c)

Note an interesting thing here: the GOT address is just used as an anchor to reach static_arr. This is unlike the normal usage of the GOT to actually contain the address of a symbol within it. Since static_arr is not an external symbol, there's no point keeping it inside the GOT. But still, the GOT is used here as an anchor in the data section, relative to which the address of the symbol can be found with a full 64-bit offset, which is at the same time position independent (the linker will be able to resolve this relocation, leaving no need to modify the code section during loading).

How about global_arr?

        t += global_arr[7];
      55:   48 b8 00 00 00 00 00    movabs $0x0,%rax
      5c:   00 00 00
      5f:   48 8b 04 03             mov    (%rbx,%rax,1),%rax
      63:   8b 40 1c                mov    0x1c(%rax),%eax
      66:   01 45 ec                add    %eax,-0x14(%rbp)

The code is a bit longer, and the relocation is also different. This is actually a more traditional use of the GOT.
The R_X86_64_GOT64 relocation for the movabs just tells it to place into rax the offset within the GOT at which the address of global_arr resides. The instruction at 0x5f extracts the address of global_arr from the GOT and places it into rax. The next instruction dereferences global_arr[7], placing the value into eax.

Now let's look at the code reference for global_func. Recall that in the large code model we can't make any assumptions regarding the size of the code section, so we should assume that even to reach the PLT we need an absolute 64-bit address:

        int t = global_func(argc);
      39:   8b 45 dc                mov    -0x24(%rbp),%eax
      3c:   89 c7                   mov    %eax,%edi
      3e:   b8 00 00 00 00          mov    $0x0,%eax
      43:   48 ba 00 00 00 00 00    movabs $0x0,%rdx
      4a:   00 00 00
      4d:   48 01 da                add    %rbx,%rdx
      50:   ff d2                   callq  *%rdx
      52:   89 45 ec                mov    %eax,-0x14(%rbp)

The relevant relocation is a R_X86_64_PLTOFF64, which means: PLT entry address for global_func, minus GOT address. This is placed into rdx, to which rbx (the absolute address of the GOT) is later added. The result is the PLT entry address for global_func in rdx.

Again, note the use of the GOT as an "anchor" to enable position-independent reference to the PLT entry offset.
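The large PIC arithmetic is easier to trust once the cancellations are spelled out. The following C sketch models the prologue and the GOT-anchored references with plain 64-bit integer arithmetic. The function names are hypothetical, and the addresses in the usage note are invented. Only the section-relative offsets 0x1e and 0x27 and the addend 9 come from the listing above.

```c
#include <stdint.h>

/* R_X86_64_GOTPC64: GOT + A - P (P = address of the patched immediate).
 * Hypothetical helpers, for illustration only. */
static uint64_t reloc_gotpc64(uint64_t got, int64_t A, uint64_t P)
{
    return got + (uint64_t)A - P;
}

/* R_X86_64_GOTOFF64: S + A - GOT. The same "target minus GOT" shape is
 * used by R_X86_64_PLTOFF64, with S being the PLT entry address. */
static uint64_t reloc_gotoff64(uint64_t S, int64_t A, uint64_t got)
{
    return S + (uint64_t)A - got;
}

/* The prologue: lea puts the address of the lea instruction itself into
 * rbx, movabs puts the GOTPC64 value into r11, and add combines them. */
static uint64_t prologue_got(uint64_t lea_addr, uint64_t r11)
{
    return lea_addr + r11;
}
```

Suppose, as an assumption for the example, that the code is loaded at some base B and the GOT at some address G. The lea sits at B + 0x1e and the movabs immediate at B + 0x27, so `prologue_got(B + 0x1e, reloc_gotpc64(G, 9, B + 0x27))` equals G for any B, because 0x1e + 9 = 0x27. Likewise, adding `reloc_gotoff64(S, 0, G)` back to G gives S, even when S lies below the GOT and the 64-bit offset wraps around.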