Continuing the discussion from [KEYGENME - EASY] Cracking Your First Program:

As requested by @pry0cc, this is a crash course on reverse engineering. Before reading this, you should try to solve @dtm challenge:

[KEYGENME - EASY] Cracking Your First Program Challenges Welcome to my first challenge! If you’re interested in becoming a reverse engineer, it’s important that you have a strong background with low level data and can understand the process of disassembly and debugging. Details of the Challenge A friend of yours is an arts major but he hasn’t been making enough money for his software working as a McDonald’s cashier since he graduated. He has recently received a huge opportunity to showcase his computer-generated graphics skills for a gaming company a…

If you didn’t manage or you want to see somebody else solution keep reading.

Our Target

I will start writing our target program. You can try what is described in this paper against the original challenge binaries the process is the same. Showing the C code at the beginning makes the explanation easier. Also, you will note that the code from this example is roughly the same that the one you see in the original challenge.

This is my free implementation of @dtm’s original challenge.

#include <stdio.h> #define PEXIT(s) {printf ("%s",s); return 0;} int main (void) { FILE *f; char buffer[16]; int n; if (((f = fopen ("key.txt", "rb")) != NULL) && (n = fread (buffer, 1, 16, f)) == 16) { for (;n >= 0 && buffer[--n] == 'a';); if (n < 0) PEXIT("Key is correct

"); } PEXIT("Wrong Key

"); }

Fine, let’s compile it for a 32bits platform (as the original challenge) and then, let’s strip all symbols from it to make the process a bit more difficult. In this case, removing the symbols means that we will have to find the main function ourselves.

gcc -m32 -o kg kg.c strip -s kg

EXERCISE:

You can use readelf -s to check the symbols in the program before and after running the strip command. Note how the main function is gone after stripping the binary.

Gathering information

In this case it is not really needed, but in general, you will want to gather as much information about the target as you want… you may also want to run it on a VM and actually take a snapshot before you start… specially if you do dynamic analysis and you do not know what the sample does (backdoor, worm, virus,…)

Some tools you may want to use: file , strings , readelf -a and also objdump .

In this case, we already know everything for this program. After all, we wrote it ourselves, so let’s go straight into the reverse stuff.

EXERCISE:

Run the tools mentioned above on the program you just compile.

Reversing with gdb

Let’s start this tutorial reversing the sample using gdb .

Let’s start:

$ gdb ./kg

(… gdb messages here…)

(gdb)

As this sample does not have symbols (that’s normal in real life) we do not know right away where our main function is (yep, I have already said that). But we can ask gdb about the program entry point:

(gdb) info file

(…)

Entry point: 0x80483f0

(…)

Sometimes you cannot even get this information (when using sstrip -z for instance). In those cases… well, let’s leave those for another post.

Now we can start disassembling the entry point. If you have been following the “Programming for Wannabes” course you should know that the main function is not the first function the OS executes when we run a program. You don’t know… go and read the “Programming for Wannabe” series now!. The main function gets called from the entry point (the entry point is usually named _start ).

Let’s see what we can find at our entry point:

(gdb) x/15i 0x80483f0 0x80483f0: xor %ebp,%ebp 0x80483f2: pop %esi 0x80483f3: mov %esp,%ecx 0x80483f5: and $0xfffffff0,%esp 0x80483f8: push %eax 0x80483f9: push %esp 0x80483fa: push %edx 0x80483fb: push $0x8048620 0x8048400: push $0x80485b0 0x8048405: push %ecx 0x8048406: push %esi 0x8048407: push $0x80484ed 0x804840c: call 0x80483d0 <[email protected]> 0x8048411: hlt 0x8048412: xchg %ax,%ax

NOTE: The x command on gdb dumps memory. After the slash you can indicate the number of items to dump and the format. In this case we are using i that indicates gdb to dump instructions. You can also use the command disassemble :

(gdb) disassemble 0x80483f0,+30

In this case the second parameter is the size in bytes of the dump, instead of the number of instructions to dump. You can use the flag /r to also dump the machine code.

Well, as you see not all symbols have been removed. Being dynamically linked, the program has to keep some information to be able to find the functions provided by other libraries. So the call instruction at 0x804840c is the one that is going to run main and therefore, it receives as a parameter, among other things, the pointer to the main function. In this case it is the last parameter pushed into the stack… So, let’s take a look to the main function!

NOTE: You will find this kind of sequences in many programs out there… you will learn to easily identify them as you keep practising.

The main Function

We have finally found the main function. It is at position 0x80484ed . And this is how it does look like:

(gdb) x/50i 0x80484ed 0x80484ed: push %ebp 0x80484ee: mov %esp,%ebp 0x80484f0: and $0xfffffff0,%esp 0x80484f3: sub $0x30,%esp 0x80484f6: mov %gs:0x14,%eax 0x80484fc: mov %eax,0x2c(%esp) 0x8048500: xor %eax,%eax 0x8048502: movl $0x8048640,0x4(%esp) 0x804850a: movl $0x8048643,(%esp) 0x8048511: call 0x80483e0 <[email protected]> 0x8048516: mov %eax,0x18(%esp) 0x804851a: cmpl $0x0,0x18(%esp) 0x804851f: je 0x8048587 0x8048521: mov 0x18(%esp),%eax 0x8048525: mov %eax,0xc(%esp) 0x8048529: movl $0x10,0x8(%esp) 0x8048531: movl $0x1,0x4(%esp) 0x8048539: lea 0x1c(%esp),%eax 0x804853d: mov %eax,(%esp) 0x8048540: call 0x80483a0 <[email protected]> 0x8048545: mov %eax,0x14(%esp) 0x8048549: cmpl $0x10,0x14(%esp) 0x804854e: jne 0x8048587 0x8048550: cmpl $0x0,0x14(%esp) 0x8048555: js 0x804856d 0x8048557: subl $0x1,0x14(%esp) 0x804855c: lea 0x1c(%esp),%edx 0x8048560: mov 0x14(%esp),%eax 0x8048564: add %edx,%eax 0x8048566: movzbl (%eax),%eax 0x8048569: cmp $0x61,%al 0x804856b: je 0x8048550 0x804856d: cmpl $0x0,0x14(%esp) 0x8048572: jns 0x8048587 0x8048574: movl $0x804864b,(%esp) 0x804857b: call 0x80483b0 <[email protected]> 0x8048580: mov $0x0,%eax 0x8048585: jmp 0x8048598 0x8048587: movl $0x804865a,(%esp) 0x804858e: call 0x80483b0 <[email protected]> 0x8048593: mov $0x0,%eax 0x8048598: mov 0x2c(%esp),%ecx 0x804859c: xor %gs:0x14,%ecx 0x80485a3: je 0x80485aa 0x80485a5: call 0x8048390 <[email protected]> 0x80485aa: leave 0x80485ab: ret 0x80485ac: xchg %ax,%ax 0x80485ae: xchg %ax,%ax 0x80485b0: push %ebp

This is our whole main function. You can see the calls to all the functions we used in our C program ( fopen , fread , puts … yes, gcc automatically converts a printf with a single static string parameter into a puts call). All those functions are provided by libc and therefore, dynamically linked. You can refer to the great paper from @_py on the topic to know more about this process:

Linux Internals ~ Dynamic Linking Wizardry Linux UPDATE: Please refer to Linux Internals - The Art Of Symbol Resolution for a more detailed and updated analysis of the concept. Hey peeps! I hope you’re all doing great. It’s been a long time since my last post. Anyway, today I present to you something completely different from what you’ve been used to seeing from me. This is more a self-research/study material. I’m nowhere close to being an expert on this subject but I busted my ass off trying to understand the inner workings of Linux executab…

Now, we can start reversing each function one by one.

Calling fopen

Let’s skip to address 0x8048502 . The instructions before just set the stack to store our local variables. You will see that type of code all the time just ignore it for this simple case. So at 0x8048502 we have

0x8048502: movl $0x8048640,0x4(%esp) 0x804850a: movl $0x8048643,(%esp) 0x8048511: call 0x80483e0 <[email protected]> 0x8048516: mov %eax,0x18(%esp) 0x804851a: cmpl $0x0,0x18(%esp) 0x804851f: je 0x8048587

The first thing to do is to put the function parameter in the stack (pointed by register ESP ). At offset 0 in the stack (%esp) we store one pointer, and at offset 4 0x4(%esp) or [ESP + 0x4] if you prefer, another pointer. Let’s see to what are those pointers are pointing to:

(gdb) x/s 0x8048640 0x8048640: "rb" (gdb) x/s 0x8048643 0x8048643: "key.txt"

NOTE: Type s for the gdb dump command x will dump a zero-terminated string

Nice, those are actually the two parameters passed to the fopen function. Then we see how the program stores the value of eax at offset 0x18 in the stack. 0x18(%esp) actually means; Give me the content of ESP + 0x18 … it’s called indexed addressing mode).

Let’s create our local variable tables, to keep track of them to help us during the reversing process. For now we only have one:

ESP + 0x18 -> FILE *f;

Finally, the function checks the return value. If it is 0 (or NULL in this case, check the C code above), the program will jump to 0x8048587 .

From these 6 assembly lines we have got the following information:

The key has to be stored in a file named ‘key.txt’ in the current directory

It is open in binary read mode (this is actually irrelevant for this case)

Good, that is the file we have to create. Now we need to figure out what do we have to put in it.

Calling fread

The next function we find is fread . We can follow the same process to reverse it. I’m not going to repeat it here, you should manage yourself. This is the part that pushes the parameters and calls the function:

0x8048521: mov 0x18(%esp),%eax 0x8048525: mov %eax,0xc(%esp) 0x8048529: movl $0x10,0x8(%esp) 0x8048531: movl $0x1,0x4(%esp) 0x8048539: lea 0x1c(%esp),%eax 0x804853d: mov %eax,(%esp) 0x8048540: call 0x80483a0 <[email protected]> 0x8048545: mov %eax,0x14(%esp)

Carefully looking at this code and knowing the prototype of the fread function ( man fread ) we can infer that:

ESP + 0x18 -> FILE *f; (we already knew that) ESP + 0x1c -> buffer ESP + 0x14 -> n;

EXERCISE: Good through the code instruction by instruction and draw the content of the stack (whatever is indexed from %esp ). Identify the parameters passed to the function and the local variables we identified in the table above.

Again the result of the function (the number of items read) is stored at offset 0x14 in the stack. After that, we find the check of the number of bytes read (take a look to the C code if you do not believe me).

0x8048549: cmpl $0x10,0x14(%esp) 0x804854e: jne 0x8048587 .... 0x8048587: movl $0x804865a,(%esp) 0x804858e: call 0x80483b0 <[email protected]> 0x8048593: mov $0x0,%eax 0x8048598: mov 0x2c(%esp),%ecx 0x804859c: xor %gs:0x14,%ecx 0x80485a3: je 0x80485aa 0x80485a5: call 0x8048390 <[email protected]> 0x80485aa: leave 0x80485ab: ret

You can clearly see the comparison of the value returned by fread and stored at offset 0x14 against the value 0x10 (16 in decimal). If the two values are Non-Equal (that’s they are different ), the program will jump to 0x8048587 . This is the same address we jumped to in the case of a problem with fopen (see above). At that position, we will find the puts call with the message shown when the key is wrong:

(gdb) x/s 0x804865a 0x804865a: "Wrong Key"

The rest of the code is just the stack cleaning and the return from the main function.

The Key Check

The code left in the midle of our ASM listing is the key check. For a real program, the key generation is a lot more complex that these few lines, but let’s start with this simple example to master the basics:

0x8048550: cmpl $0x0,0x14(%esp) 0x8048555: js 0x804856d 0x8048557: subl $0x1,0x14(%esp) 0x804855c: lea 0x1c(%esp),%edx 0x8048560: mov 0x14(%esp),%eax 0x8048564: add %edx,%eax 0x8048566: movzbl (%eax),%eax 0x8048569: cmp $0x61,%al 0x804856b: je 0x8048550 0x804856d: cmpl $0x0,0x14(%esp) 0x8048572: jns 0x8048587 0x8048574: movl $0x804864b,(%esp) 0x804857b: call 0x80483b0 <[email protected]> 0x8048580: mov $0x0,%eax 0x8048585: jmp 0x8048598

If you remember our original C code, we are reusing the variable n , the one storing the number of bytes read from fread , as our loop counter. We go from n ( n - 1 actually) to -1 . So the end loop condition consists on checking against 0 and do a js ( Jump if Sign )… In other words, we will jump (leave the loop) if our counter is negative. Otherwise, we decrease the counter by 1 unit, set edx to point to our buffer (check fread above), eax to our counter/index, get the value from the buffer and compare it with 0x61 that is ASCII for the lowercase a letter. Confused… OK, let’s do it again in slow-mo.

First, our local variables table:

ESP + 0x18 -> FILE *f; ESP + 0x1c -> char* buffer ESP + 0x14 -> int n;

Then the code and its interpretation with the help of our local variables table:

0x8048550: subl $0x1,0x14(%esp) ; n-- 0x8048555: lea 0x1c(%esp),%edx ; edx = &buffer 0x8048559: mov 0x14(%esp),%eax ; eax = n 0x804855d: add %edx,%eax ; edx = &buffer + n = &buffer[n] 0x804855f: movzbl (%eax),%eax ; eax = buffer[n] 0x8048562: cmp $0x61,%al ; if (al == 'a') 0x8048564: je 0x8048549 ; continue;

So, we are checking that our buffer is filled with n a 's. If any of the characters read from the file is other than an a , the code will continue like this:

0x8048566: cmpl $0x0,0x14(%esp) 0x804856b: jns 0x8048580 ; if (n > 0) goto Fail_msg_and_exit 0x804856d: movl $0x804864b,(%esp) 0x8048574: call 0x80483b0 <[email protected]> ; puts "Key is Correct" 0x8048579: mov $0x0,%eax 0x804857e: jmp 0x8048591 ; return 0

We check again the value of n … If it is positive ( jns ), that means that we have left the loop before processing the 0x10 characters and the key in the file is wrong, so we have to jump to the code showing the Wrong Key message. Otherwise we show the ‘Key is Correct’ string and we jump to the exit code to leave the main function.

So, this is it for a simple program like this.

Note that you may get different addresses in your system and even slightly different code, depending on your gcc or libc version. But the important thing is the process and that you should be able to reproduce.

Also note that this is static analysis. We haven’t run the code at all. In this case it is not needed, but sometimes it will save you a lot of time (think for instance on a crypter…)

Reversing with Radare2

The process above is a bit tedious, and actually, there is no fun on it once you know what you are doing. It is interesting to do it at the beginning to fully understand what is going on. And it is also good to know how to do it when your super-fancy tool fails. However, if you are in a hurry or this is your everyday work, you will want to use more advanced tools. Radare2 is one of those tools.

Radare2 will automate a lot of things for us. This is a very brief introduction on how to reverse this specific challenge using this tool.

First you load the program in radare2 (the program is called r2 )

$ r2 ./kg -- Setup dbg.fpregs to true to visualize the fpu registers in the debugger view. [0x080483f0]>

Yes, you get a different message every time you start the program.

You can see that it automatically detects the entry point 0x080483f0 and starts from there. We have already saved a few keystrokes.

After that you will usually ask radare to analyse the binary. That is the a command family. Just type a and press ENTER to get a list of the options. For this example, running aa will locate most of the information we need… including the main function (usually you will do the aaaa to get as much as you can from the analysis):

[0x080483f0]> aa [x] Analyze all flags starting with sym. and entry0 (aa) [0x080483f0]>

And now we can just look at the main function

[0x080483f0]> pdf @main / (fcn) main 191 | ; DATA XREF from 0x08048407 (main) | 0x080484ed 55 push ebp | 0x080484ee 89e5 mov ebp, esp | 0x080484f0 83e4f0 and esp, 0xfffffff0 | 0x080484f3 83ec30 sub esp, 0x30 | 0x080484f6 65a114000000 mov eax, dword gs:[0x14] ; [0x14:4]=1 | 0x080484fc 8944242c mov dword [esp + 0x2c], eax | 0x08048500 31c0 xor eax, eax | 0x08048502 c74424044086. mov dword [esp + 4], 0x8048640 ; [0x8048640:4]=0x6b006272 ; "rb" @ 0x8048640 | 0x0804850a c70424438604. mov dword [esp], str.key.txt ; [0x8048643:4]=0x2e79656b LEA str.key.txt ; "key.txt" @ 0x8048643 | 0x08048511 e8cafeffff call sym.imp.fopen | 0x08048516 89442418 mov dword [esp + 0x18], eax | 0x0804851a 837c241800 cmp dword [esp + 0x18], 0 | ,=< 0x0804851f 7466 je 0x8048587 | | 0x08048521 8b442418 mov eax, dword [esp + 0x18] ; [0x18:4]=0x80483f0 section..text | | 0x08048525 8944240c mov dword [esp + 0xc], eax | | 0x08048529 c74424081000. mov dword [esp + 8], 0x10 ; [0x10:4]=0x30002 | | 0x08048531 c74424040100. mov dword [esp + 4], 1 | | 0x08048539 8d44241c lea eax, [esp + 0x1c] ; 0x1c | | 0x0804853d 890424 mov dword [esp], eax | | 0x08048540 e85bfeffff call sym.imp.fread | | 0x08048545 89442414 mov dword [esp + 0x14], eax | | 0x08048549 837c241410 cmp dword [esp + 0x14], 0x10 ; [0x10:4]=0x30002 | ,==< 0x0804854e 7537 jne 0x8048587 | .---> 0x08048550 837c241400 cmp dword [esp + 0x14], 0 | ,====< 0x08048555 7816 js 0x804856d | |||| 0x08048557 836c241401 sub dword [esp + 0x14], 1 | |||| 0x0804855c 8d54241c lea edx, [esp + 0x1c] ; 0x1c | |||| 0x08048560 8b442414 mov eax, dword [esp + 0x14] ; [0x14:4]=1 | |||| 0x08048564 01d0 add eax, edx | |||| 0x08048566 0fb600 movzx eax, byte [eax] | |||| 0x08048569 3c61 cmp al, 0x61 ; 'a' | |`===< 0x0804856b 74e3 je 0x8048550 | `----> 0x0804856d 837c241400 cmp dword [esp + 0x14], 0 | ,===< 0x08048572 7913 jns 0x8048587 | ||| 0x08048574 c704244b8604. mov dword [esp], str.Key_is_correct ; [0x804864b:4]=0x2079654b LEA str.Key_is_correct ; "Key is correct" @ 0x804864b | ||| 0x0804857b e830feffff call sym.imp.puts | ||| 0x08048580 b800000000 mov eax, 0 | ,====< 0x08048585 eb11 jmp 0x8048598 | |```-> 0x08048587 c704245a8604. mov dword [esp], str.Wrong_Key ; [0x804865a:4]=0x6e6f7257 LEA str.Wrong_Key ; "Wrong Key" @ 0x804865a | | 0x0804858e e81dfeffff call sym.imp.puts | | 0x08048593 b800000000 mov eax, 0 | | ; JMP XREF from 0x08048585 (main) | `----> 0x08048598 8b4c242c mov ecx, dword [esp + 0x2c] ; [0x2c:4]=0x280009 ; ',' | 0x0804859c 65330d140000. xor ecx, dword gs:[0x14] | ,=< 0x080485a3 7405 je 0x80485aa | | 0x080485a5 e8e6fdffff call sym.imp.__stack_chk_fail | `-> 0x080485aa c9 leave \ 0x080485ab c3 ret

EXERCISE:

Try to run the pdf @main command before running the aa command and see what happens.

So what do we see in here?

On the left side we see some ASCII graphing indicating the flow of the program. Roughly the jumps the program will do.

It has already assigned symbols to all function calls

It is showing as a comment after the assembly, the content of the memory pointed by any pointer in the code. For instance, take a look to the two fopen parameters at the beginning of the program

parameters at the beginning of the program It will also assign symbols to program data. See how it creates the symbols str.Key_is_correct and symbol str.Wrong_key .

Overall, we are just getting a pretty good overview of the program just with this two commands ( aa and pdf @main ).

Now you need to read the assembly just like before. If this is not easy enough for you, just jump into the radare2 visual mode. First set the main function ( s main ) and then type V (has to be capital)+ENTER. Then press V again… there you go!

EXERCISE:

Crack the program so it always shows the “Key is correct” message

EXERCISE:

Recompile for 64bits (remove -m32 from the compilation command) and repeat the process.

Now go and Reverse’em all!