Using LLDB for reverse engineering

Dec 20 2019

I've been exploring reverse engineering, and it's a fascinating topic. There are many ways to analyse a binary, and the analysis is usually divided into two types: static and dynamic. In static analysis, you disassemble (or decompile) the binary and read the resulting code to figure out what it does, without running it. In dynamic analysis, you execute the binary and analyse it while it runs, usually with a debugger. There are many debuggers out there; in this post, we are going to use LLDB. I'll explain the basic commands we'll use and a general setup that I find useful when doing dynamic analysis.

LLDB is the debugger that comes with Xcode when you install the developer tools on macOS, so it'll be there if you are already developing macOS/*OS applications. Let's begin by writing and analysing a simple C program.

Hello, world!

Alright, we are going to write a basic C program and compile it. Create a new file, name it hello.c and add the following content:

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        printf("Hello, world!");
        return 0;
    }

Now compile it using Clang (you can use GCC or any other compiler; I'm just trying to stick to the LLVM tools used in the Apple ecosystem):

    $ clang hello.c
    # this should create a.out

Now we are going to use lldb to analyse the a.out.

    $ lldb a.out

The lldb command provides us with a REPL where we can run the program, set breakpoints and analyse the code.

Let's run the program:

    (lldb) r
    Process 46295 launched: '/Users/perensejo/a.out' (x86_64)
    Hello, world!Process 46295 exited with status = 0 (0x00000000)

Now, we know what it does when we execute it, but how it does it is what we are interested in.

We are going to assume we don't know anything about the binary, so let's first show the symbol table. We could use the command nm(1) in the shell.

    $ nm a.out
    0000000100002008 d __dyld_private
    0000000100000000 T __mh_execute_header
    0000000100000f50 T _main
                     U _printf
                     U dyld_stub_binder

Or from the debugger, we can show the symbol table using the image command.

    (lldb) image dump symtab a.out
    Symtab, file = /Users/pascualin/a.out, num_symbols = 5:
                   Debug symbol
                   |Synthetic symbol
                   ||Externally Visible
                   |||
    Index   UserID DSX Type            File Address/Value Load Address       Size               Flags      Name
    ------- ------ --- --------------- ------------------ ------------------ ------------------ ---------- ----------------------------------
    [    0]      0     Data            0x0000000100002008                    0x0000000000000008 0x000e0000 _dyld_private
    [    1]      1   X Data            0x0000000100000000                    0x0000000000000f50 0x000f0010 _mh_execute_header
    [    2]      2   X Code            0x0000000100000f50                    0x0000000000000031 0x000f0000 main
    [    3]      3     Trampoline      0x0000000100000f82                    0x0000000000000006 0x00010100 printf
    [    4]      4   X Undefined       0x0000000000000000                    0x0000000000000000 0x00010100 dyld_stub_binder

To learn more about all of lldb's commands, I would recommend reading the help included in lldb. For example, if we wanted to check what the image command does, we can use help image inside lldb, and we'll get a nice description with all the options supported by the command (you can also run help help or help apropos to learn more).

Ok, we can see that the binary has a main function. Let's set a breakpoint on main and see what is going on. Yes, I know, binaries on macOS require a main entry point, but it was an excuse to show you the symbol table for the binary.

Anyway, let's set the breakpoint and rerun the program. I'm using the short form of the commands, but you can always use the long form and press tab to auto-complete:

    (lldb) b main
    (lldb) r
    Process 46305 launched: '/Users/fulano/a.out' (x86_64)
    Process 46305 stopped
    * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
        frame #0: 0x0000000100000f50 a.out`main
    a.out`main:
    ->  0x100000f50 <+0>: pushq  %rbp
        0x100000f51 <+1>: movq   %rsp, %rbp
        0x100000f54 <+4>: subq   $0x20, %rsp
        0x100000f58 <+8>: movl   $0x0, -0x4(%rbp)
    Target 0: (a.out) stopped.

Alright, we got stopped at the beginning of our main function. This is not an introduction to Assembly language, so I won't go into the details. I will assume you have some familiarity with assembly languages. Let's have a look at our registers:

    (lldb) register read
    General Purpose Registers:
           rax = 0x0000000100000f50  a.out`main
           rbx = 0x0000000000000000
           rcx = 0x00007ffeefbfe000
           rdx = 0x00007ffeefbfdc18
           rdi = 0x0000000000000001
           rsi = 0x00007ffeefbfdc08
           rbp = 0x00007ffeefbfdbf8
           rsp = 0x00007ffeefbfdbe8
            r8 = 0x0000000000000000
            r9 = 0x0000000000000000
           r10 = 0x0000000000000000
           r11 = 0x0000000000000000
           r12 = 0x0000000000000000
           r13 = 0x0000000000000000
           r14 = 0x0000000000000000
           r15 = 0x0000000000000000
           rip = 0x0000000100000f50  a.out`main
        rflags = 0x0000000000000246
            cs = 0x000000000000002b
            fs = 0x0000000000000000
            gs = 0x0000000000000000

As you can see, the instruction pointer is at 0x100000f50 which is exactly where we are at, good. The instruction to be executed is:

    ->  0x100000f50 <+0>: pushq  %rbp

So we are going to push the contents of register rbp onto the stack. Let's first look at where the stack pointer "points" to:

    (lldb) register read rsp
         rsp = 0x00007ffeefbfdbe8

That is the address in memory, but what is at that address? We can use the memory command (I'll use the short form):

    (lldb) x/10w $rsp
    0x7ffeefbfdbe8: 0x6e44f7fd 0x00007fff 0x6e44f7fd 0x00007fff
    0x7ffeefbfdbf8: 0x00000000 0x00000000 0x00000001 0x00000000
    0x7ffeefbfdc08: 0xefbfe088 0x00007ffe

Depending on how you prefer to look at your stack, you might want to show it in a single column. I prefer that, so let's add one more option to the command:

    (lldb) x/10w -l 1 $rsp
    0x7ffeefbfdbe8: 0x6e44f7fd
    0x7ffeefbfdbec: 0x00007fff
    0x7ffeefbfdbf0: 0x6e44f7fd
    0x7ffeefbfdbf4: 0x00007fff
    0x7ffeefbfdbf8: 0x00000000
    0x7ffeefbfdbfc: 0x00000000
    0x7ffeefbfdc00: 0x00000001
    0x7ffeefbfdc04: 0x00000000
    0x7ffeefbfdc08: 0xefbfe088
    0x7ffeefbfdc0c: 0x00007ffe

That's more like it. Ok, so our stack pointer points to the top of the stack, 0x7ffeefbfdbe8, and we were about to execute the following instruction:

    ->  0x100000f50 <+0>: pushq  %rbp

Let's see what is inside rbp:

    (lldb) register read rbp
         rbp = 0x00007ffeefbfdbf8

So if we push it onto the stack, we should see 0x7ffeefbfdbf8 at the top of our stack. Let's see if that's true by running the next instruction (ni):

    (lldb) ni
    Process 46305 stopped
    * thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
        frame #0: 0x0000000100000f51 a.out`main + 1
    a.out`main:
    ->  0x100000f51 <+1>:  movq   %rsp, %rbp
        0x100000f54 <+4>:  subq   $0x20, %rsp
        0x100000f58 <+8>:  movl   $0x0, -0x4(%rbp)
        0x100000f5f <+15>: movl   %edi, -0x8(%rbp)

Again let's see our stack:

    (lldb) x/10w -l 1 $rsp
    0x7ffeefbfdbe0: 0xefbfdbf8
    0x7ffeefbfdbe4: 0x00007ffe
    0x7ffeefbfdbe8: 0x6e44f7fd
    0x7ffeefbfdbec: 0x00007fff
    0x7ffeefbfdbf0: 0x6e44f7fd
    0x7ffeefbfdbf4: 0x00007fff
    0x7ffeefbfdbf8: 0x00000000
    0x7ffeefbfdbfc: 0x00000000
    0x7ffeefbfdc00: 0x00000001
    0x7ffeefbfdc04: 0x00000000

As you can see, 0x7ffeefbfdbf8 is now on top of the stack. But that doesn't look right: the value is split, with one half of the hex number on the first line and the other half on the next. This is because we are using x/10w, which displays words (32 bits), and we are on a 64-bit architecture. x86_64 is little-endian, so the low half comes first. To display 8 bytes per entry we should use:

    (lldb) x/10xw -s 8 -l 1 $rsp
    0x7ffeefbfdbe0: 0x00007ffeefbfdbf8
    0x7ffeefbfdbe8: 0x00007fff6e44f7fd
    0x7ffeefbfdbf0: 0x00007fff6e44f7fd
    0x7ffeefbfdbf8: 0x0000000000000000
    0x7ffeefbfdc00: 0x0000000000000001
    0x7ffeefbfdc08: 0x00007ffeefbfe088
    0x7ffeefbfdc10: 0x0000000000000000
    0x7ffeefbfdc18: 0x00007ffeefbfe0b4
    0x7ffeefbfdc20: 0x00007ffeefbfe0c2
    0x7ffeefbfdc28: 0x00007ffeefbfe105

And now the display looks right. Let's keep moving and show the disassembly of the function we are currently in. We can do it by typing di:

    (lldb) di
    a.out`main:
        0x100000f50 <+0>:  pushq  %rbp
    ->  0x100000f51 <+1>:  movq   %rsp, %rbp
        0x100000f54 <+4>:  subq   $0x20, %rsp
        0x100000f58 <+8>:  movl   $0x0, -0x4(%rbp)
        0x100000f5f <+15>: movl   %edi, -0x8(%rbp)
        0x100000f62 <+18>: movq   %rsi, -0x10(%rbp)
        0x100000f66 <+22>: leaq   0x35(%rip), %rdi          ; "Hello, world!"
        0x100000f6d <+29>: movb   $0x0, %al
        0x100000f6f <+31>: callq  0x100000f82               ; symbol stub for: printf
        0x100000f74 <+36>: xorl   %ecx, %ecx
        0x100000f76 <+38>: movl   %eax, -0x14(%rbp)
        0x100000f79 <+41>: movl   %ecx, %eax
        0x100000f7b <+43>: addq   $0x20, %rsp
        0x100000f7f <+47>: popq   %rbp
        0x100000f80 <+48>: retq

Or we can read the memory using x (with the i format) at our instruction pointer (rip).

    (lldb) x/10i $rip
    ->  0x100000f51: 48 89 e5              movq   %rsp, %rbp
        0x100000f54: 48 83 ec 20           subq   $0x20, %rsp
        0x100000f58: c7 45 fc 00 00 00 00  movl   $0x0, -0x4(%rbp)
        0x100000f5f: 89 7d f8              movl   %edi, -0x8(%rbp)
        0x100000f62: 48 89 75 f0           movq   %rsi, -0x10(%rbp)
        0x100000f66: 48 8d 3d 35 00 00 00  leaq   0x35(%rip), %rdi          ; "Hello, world!"
        0x100000f6d: b0 00                 movb   $0x0, %al
        0x100000f6f: e8 0e 00 00 00        callq  0x100000f82               ; symbol stub for: printf
        0x100000f74: 31 c9                 xorl   %ecx, %ecx
        0x100000f76: 89 45 ec              movl   %eax, -0x14(%rbp)

I hope you are getting a better feel for memory read (x in its short version) and the registers. Ok, let's skip a few instructions and stop where the "Hello, world!" string is about to be passed to printf.

    (lldb) ni -c 5
    ->  0x100000f66 <+22>: leaq   0x35(%rip), %rdi          ; "Hello, world!"
        0x100000f6d <+29>: movb   $0x0, %al
        0x100000f6f <+31>: callq  0x100000f82               ; symbol stub for: printf
        0x100000f74 <+36>: xorl   %ecx, %ecx

Alright, let's imagine the debugger didn't add that comment showing that it's loading the string. The leaq instruction makes the rdi register point to the memory address that contains the "Hello, world!" string, but only after we execute the instruction.

    (lldb) ni
    ->  0x100000f6d <+29>: movb   $0x0, %al
        0x100000f6f <+31>: callq  0x100000f82               ; symbol stub for: printf
        0x100000f74 <+36>: xorl   %ecx, %ecx
        0x100000f76 <+38>: movl   %eax, -0x14(%rbp)

Let's read the memory that rdi points to (let's read 4 words):

    (lldb) x/4w $rdi
    0x100000fa2: "Hello, world!"
    0x100000fb0: "\x01"
    0x100000fb2: ""
    0x100000fb3: ""

We can also take advantage of the s format, which reads a string until it reaches the null terminator (a \0 byte).

    (lldb) x/s $rdi
    0x100000fa2: "Hello, world!"

Perfect. After that comes the call to printf and the rest of the teardown of the program. You can continue debugging it on your own, or just use the continue command, which runs until the next breakpoint (which we don't have) or, in our case, the end of the program.

Ok, that should be enough to get you started. There are a few more details I want to show you. First, if we are debugging a program that we wrote, we have access to the code, so we can compile it with additional information for the debugger. Second, we'll see how to set up a command file to make your debugging life easier.

Debugger information

Ok, let's now compile our code using the -glldb flag. That flag makes the compiler emit additional debugging information for our debugger:

    $ clang -glldb hello.c
    # This generates a.out

Again, let's jump into lldb.

    $ lldb a.out
    (lldb) target create "a.out"
    Current executable set to 'a.out' (x86_64).
    (lldb) b main
    Breakpoint 1: where = a.out`main + 22 at hello.c:4:3, address = 0x0000000100000f66
    (lldb)

And run the program:

    (lldb) r
    Process 46448 launched: '/Users/derik/Documents/Development/re/a.out' (x86_64)
    Process 46448 stopped
    * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
        frame #0: 0x0000000100000f66 a.out`main(argc=1, argv=0x00007ffeefbfdc08) at hello.c:4:3
       1    #include <stdio.h>
       2
       3    int main(int argc, char *argv[]) {
    -> 4      printf("Hello, world!");
       5      return 0;
       6    }
    Target 0: (a.out) stopped.

Alright, now the debugger shows us the source code, which is useful. If we want to go to the next line of source code, we just use the next command (n in its short form).

    (lldb) n
    Process 46448 stopped
    * thread #1, queue = 'com.apple.main-thread', stop reason = step over
        frame #0: 0x0000000100000f79 a.out`main(argc=1, argv=0x00007ffeefbfdc08) at hello.c:5:3
       2
       3    int main(int argc, char *argv[]) {
       4      printf("Hello, world!");
    -> 5      return 0;
       6    }
    Target 0: (a.out) stopped.

As you can see, it went straight to the return 0 statement. With the additional debugging information, we can use n to step to the next source line, and ni to step through individual assembly instructions, which is quite handy.

Let's rerun our program and try to show the assembly instructions:

    (lldb) r
    There is a running process, kill it and restart?: [Y/n] y
    Process 46457 exited with status = 9 (0x00000009)
    Process 46463 launched: '/Users/derik/Documents/Development/re/a.out' (x86_64)
    Process 46463 stopped
    * thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
        frame #0: 0x0000000100000f66 a.out`main(argc=1, argv=0x00007ffeefbfdc08) at hello.c:4:3
       1    #include <stdio.h>
       2
       3    int main(int argc, char *argv[]) {
    -> 4      printf("Hello, world!");
       5      return 0;
       6    }
    Target 0: (a.out) stopped.
    (lldb) ni
    Process 46463 stopped
    * thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
        frame #0: 0x0000000100000f6d a.out`main(argc=1, argv=0x00007ffeefbfdc08) at hello.c:4:3
       1    #include <stdio.h>
       2
       3    int main(int argc, char *argv[]) {
    -> 4      printf("Hello, world!");
       5      return 0;
       6    }
    Target 0: (a.out) stopped.

Alright, it looks like nothing happened. What's going on? The step did execute (the frame address moved from 0x100000f66 to 0x100000f6d), but LLDB keeps showing the source view, and both instructions belong to line 4. To see the assembly, use the di command:

    (lldb) di
    a.out`main:
        0x100000f50 <+0>:  pushq  %rbp
        0x100000f51 <+1>:  movq   %rsp, %rbp
        0x100000f54 <+4>:  subq   $0x20, %rsp
        0x100000f58 <+8>:  movl   $0x0, -0x4(%rbp)
        0x100000f5f <+15>: movl   %edi, -0x8(%rbp)
        0x100000f62 <+18>: movq   %rsi, -0x10(%rbp)
        0x100000f66 <+22>: leaq   0x35(%rip), %rdi          ; "Hello, world!"
    ->  0x100000f6d <+29>: movb   $0x0, %al
        0x100000f6f <+31>: callq  0x100000f82               ; symbol stub for: printf
        0x100000f74 <+36>: xorl   %ecx, %ecx
        0x100000f76 <+38>: movl   %eax, -0x14(%rbp)
        0x100000f79 <+41>: movl   %ecx, %eax
        0x100000f7b <+43>: addq   $0x20, %rsp
        0x100000f7f <+47>: popq   %rbp
        0x100000f80 <+48>: retq

Now we can alternate ni and di to follow each step in the assembly code.

You can continue playing with that on your own. Let's now create a custom configuration that will be helpful when we are reverse engineering a binary.

LLDB custom hooks

We can pass lldb a file containing lldb commands to be executed when the debugger starts.

That could be useful, but it becomes much better when we add some lldb hooks to that file. We can define hooks that run every time the debugger stops (at each step or breakpoint). Create a file revengsetup with the following content:

    ta st a -o "x/x $rax"
    ta st a -o "x/x $rbx"
    ta st a -o "x/x $rcx"
    ta st a -o "x/x $rdx"
    ta st a -o "x/x $rdi"
    ta st a -o "x/x $rsi"
    ta st a -o "x/x $rbp"
    ta st a -o "x/x $rsp"
    ta st a -o "x/8w -s 8 -l1 $rsp"
    ta st a -o "x/10i $rip"
    b main

What we are doing is adding hooks that display useful information on the state of the registers, the stack, and disassembly code of the current instructions.

Let's try it out with our a.out.

    $ lldb -s revengsetup a.out
    (lldb) r

Run the program, and you'll be able to see all the information on your screen. Very handy.

Final thoughts

There is a lot more to reverse engineering than just using a debugger, but it is useful to become proficient with one. This was just a short introduction to get you started; there are more resources out there on the Internet. I wrote this post because the information I found was mostly directed at GDB, and the GDB information was also buried inside assembly language tutorials or books. I wanted to present you with a concise way to jump into lldb without having to wade through lots of pages on how to write assembly. I hope you find it useful.

Let me know what you think; as always, feedback is welcome.

And also let me know what you are reverse engineering; it is always fun to talk about this stuff.
