This is my writeup for byhd, a 2-point challenge from the Defcon Qualifier CTF. You can get the files, including my annotated assembly file, here. This is my second (and final) writeup for the Defcon Qualifiers, you can find the writeup for shitsco here.

This was a reverse engineering challenge where code would be constructed based on your input, then executed. You had to figure out the exact right input to generate a payload that would give you access to the server (so, in a way, there was some exploitation involved).

Up till now, cnot from PlaidCTF has probably been my favourite hardcore reversing level, but I think this level has taken over. It was super fun!



The setup

When you fire up byhd, it listens on port 9730 and waits for connections:

$ strace ./byhd [...] bind ( 3 , { sa_family = AF_INET , sin_port = htons ( 9730 ) , sin_addr = inet_addr ( "192.168.1.201" )} , 16 ) = 0 listen ( 3 , 20 ) = 0 accept ( 3 ,

When you connect, it forks off a new process, and therefore we have to fix it as described in an old post I wrote to ensure everything stays in one process (otherwise, you're gonna have a Bad Time). You also have to add a user and group and such, and you may need to run it as root.

After I fix that, it reads from the socket properly! But when I send data to it, it just disconnects me:

$ nc 192.168 . 1.103 9730 | hexdump - C hello 00000000 ff ff ff ff | .... |

Because it's so, so common in protocols, I tried to prefix a 4-byte length to my string:

$ echo - ne '\x04\x00\x00\x00\x41\x41\x41\x41' | nc - vv 192.168 . 1.103 9730 192 . 168.1 . 103 : inverse host lookup failed : ( UNKNOWN ) [ 192.168 . 1.103 ] 9730 ( ? ) open sent 8 , rcvd 0

No response this time? On the server side, we can see why:

# . / byhd - fixed Segmentation fault

The crash is really weird, too:

# gdb - q . / byhd - fixed ( gdb ) run Program received signal SIGBUS , Bus error. 0x0000000000401cc1 in ?? () ( gdb ) x / i $rip = > 0x401cc1 : call 0x4010b0 <munmap@plt>

SIGBUS at a call? Wat? Instead of AAAA , let's send it \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0 :

$ echo -ne '\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' | nc -vv 192.168.1.103 9730

Which results in:

( gdb ) run Program received signal SIGSEGV , Segmentation fault. 0x00007ffff7ff9000 in ?? () ( gdb ) x / 8 i $rip = > 0x7ffff7ff9000 : push rax 0x7ffff7ff9001 : nop 0x7ffff7ff9002 : push rdi 0x7ffff7ff9004 : ( bad ) 0x7ffff7ff9005 : jg 0x7ffff7ff9007 0x7ffff7ff9007 : add dh , bl 0x7ffff7ff9009 : fs 0x7ffff7ff900a : fcomip st , st ( 7 ) ( gdb ) x / 32 xb $rip 0x7ffff7ff9000 : 0x50 0x90 0xff 0xf7 0xff 0x7f 0x00 0x00 0x7ffff7ff9008 : 0xde 0x64 0xdf 0xf7 0xff 0x7f 0x00 0x00 0x7ffff7ff9010 : 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x7ffff7ff9018 : 0xc0 0x53 0xdf 0xf7 0xff 0x7f 0x00 0x00

I don't know what's going on, but that sure doesn't look like code it's trying to run!

That's enough of messing around, we're gonna have to do a deep dive to figure out what's happening...

Reading itself

After it forks off a thread, and before it reads anything from the socket, the process opens itself and reads all the bytes:

# strace . / byhd - fixed execve ( "./byhd-fixed" , [ "./byhd-fixed" ] , [ ]) = 0 brk ( 0 ) = 0x1d66000 [...] bind ( 3 , { sa_family = AF_INET , sin_port = htons ( 9730 ) , sin_addr = inet_addr ( "192.168.1.103" )} , 16 ) = 0 listen ( 3 , 20 ) = 0 accept ( 3 , { sa_family = AF_INET , sin_port = htons ( 39624 ) , sin_addr = inet_addr ( "192.168.1.201" )} , [ 16 ]) = 4 [...] chdir ( "/home/byhd" ) = 0 stat ( "./byhd-fixed" , { st_mode = S_IFREG | 0755 , st_size = 18896 , ... }) = 0 open ( "./byhd-fixed" , O_RDONLY ) = 3 read ( 3 , " \177 ELF \2\1\1\0\0\0\0\0\0\0\0\0\2\0 > \0\1\0\0\0\220\2 1@ \0\0\0\0\0 " ... , 18896 ) = 18896 close ( 3 ) = 0 read ( 4 , [...]

That's interesting! First it accepts a connection, then it gets the size of itself, opens itself, and reads the whole thing. Then, finally, it tries to read from the socket it opened.

My first hunch was that it's some anti-tampering code, which isn't 100% wrong (nor was it 100% right). I threw together a quick wrapper in C to fix things:

#include <unistd.h> int main( int argc, const char *argv[]) { execlp( "/home/byhd/byhd-fixed" , "/home/byhd/byhd" , NULL ); printf( "Fail :(

" ); return 0 ; }

Note that I'm running "/home/byhd/fixed/byhd", but setting argv[0] to "/home/byhd/byhd". You can verify with strace that it indeed opens the 'real' executable and not the modified one:

# strace . / wrapper execve ( "./wrapper" , [ "./wrapper" ] , [ ]) = 0 brk ( 0 ) [...] execve ( "/home/byhd/byhd-fixed" , [ "/home/byhd/byhd" ] , [ ]) = 0 brk ( 0 ) = 0x2007000 [...] chdir ( "/home/byhd" ) = 0 stat ( "/home/byhd/byhd" , { st_mode = S_IFREG | 0644 , st_size = 18896 , ... }) = 0 open ( "/home/byhd/byhd" , O_RDONLY ) = 3 read ( 3 , " \177 ELF \2\1\1\0\0\0\0\0\0\0\0\0\2\0 > \0\1\0\0\0\220\2 1@ \0\0\0\0\0 " ... , 18896 ) = 18896 close ( 3 ) = 0 read ( 4 , " \2 0 \0\0\0 " , 4 ) = 4 read ( 4 , " \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0 " , 16 ) = 16 mmap ( NULL , 4096 , PROT_READ | PROT_WRITE | PROT_EXEC , MAP_PRIVATE | MAP_ANONYMOUS , - 1 , 0 ) = 0x7f4fccf13000 --- SIGSEGV ( Segmentation fault ) @ 0 ( 0 ) --- +++ killed by SIGSEGV +++ Segmentation fault ( core dumped )

Histogram

For the next little while, I jumped around quite a bit because I wasn't sure exactly what I was trying to do. I eventually decided to start from the top; that is, the code that runs right after it reads the file.

Before I explain what's happening, let's take a look at some assembly! If you're interested in the actual code, have a look at this; otherwise, just skip past to the description. I re-implemented this in a few lines of Ruby.

Note that I've removed a bunch of in-between code, including register movement and error handling, to just show the useful parts:

.text : 0040173B .text : 0040173B .text : 0040175F mov edi , 400 h .text : 00401764 call _malloc .text : 00401769 mov [ rbp + allocated ], rax .text : 0040178C top_loop : .text : 0040178C mov edx , ebx .text : 0040178E mov rax , [ rbp + itself ] .text : 00401792 add rax , rdx .text : 00401795 movzx eax , byte ptr [ rax ] .text : 0040179B lea rdx , [ rax * 4 +0] .text : 004017A3 mov rax , [ rbp + allocated ] .text : 004017A7 add rax , rdx .text : 004017AA mov edx , [ rax ] .text : 004017AC add edx , 1 .text : 004017AF mov [ rax ], edx .text : 004017B1 add ebx , 1 .text : 004017B4 .text : 004017B4 bottom_loop : .text : 004017B4 cmp ebx , [ rbp + itself_length ] .text : 004017B7 jb short top_loop .text : 004017C9 retn

It turns out, the code generates a histogram based on the executable file. In other words, it basically does this:

data = File .new( ARGV [ 0 ]).read histogram = {} histogram.default = 0 data.each_byte() do | b | histogram[b.chr] = histogram[b.chr] + 1 end puts(histogram)

Which might look like:

$ ruby histogram.rb / etc / passwd { " r " => 73 , " o " => 119 , " t " => 54 , " : " => 204 , " x " => 38 , " 0 " => 39 , " / " => 141 , " b " => 78 , " i " => 80 , " n " => 107 , " a " => 105 , " s " => 78 , " h " => 22 , "

" => 34 , " 1 " => 41 , " f " => 27 , " l " => 71 , " e " => 85 , " d " => 81 , " m " => 35 , " 2 " => 18 , " 3 " => 16 , " 4 " => 13 , " v " => 21 , " p " => 58 , " 7 " => 3 , " y " => 31 , " c " => 13 , " 5 " => 12 , " u " => 33 , " w " => 9 , " 6 " => 10 , " 9 " => 3 , " g " => 39 , " " => 68 , " 8 " => 6 , " + " => 2 , " q " => 2 , " k " => 7 , " z " => 4 , " - " => 1 }

Running that code on the actual binary, it works great; but it's a lot longer and much uglier output, so I didn't want to include it here. Feel free to try :)

I was still working off the assumption this was all anti-tampering code. As I said earlier, that was only partly right...

Building a tree

This is where it started to get weird and interesting. I could understand the histogram being generated, but then it stated adding and removing stuff from the array! What was happening!?

Once again, here's the actual annotated code, with the error handling and stuff removed. Feel free to read or skip it!

.text : 00402127 enter_second_loop : .text : 00402127 mov rax , [ rbp + entry_list ] .text : 0040212B mov rdi , rax .text : 0040212E call get_smallest_entry .text : 00402133 mov [ rbp + smallest_entry ], rax .text : 00402137 mov rax , [ rbp + entry_list ] .text : 0040213B mov rdi , rax .text : 0040213E call get_smallest_entry .text : 00402143 mov [ rbp + next_smallest_entry ], rax .text : 00402165 mov edi , 20 h .text : 0040216A call _malloc .text : 0040216F mov [ rbp + new_entry ], rax .text : 004021B2 mov rax , [ rbp + new_entry ] .text : 004021B6 mov rdx , [ rbp + smallest_entry ] .text : 004021BA mov [ rax + entry_struct .left _histogram ], rdx .text : 004021BD mov rax , [ rbp + new_entry ] .text : 004021C1 mov rdx , [ rbp + next_smallest_entry ] .text : 004021C5 mov [ rax + entry_struct .right _histogram ], rdx .text : 004021E2 mov rax , [ rbp + smallest_entry ] .text : 004021E6 mov edx , [ rax + entry_struct .histogram _entry ] .text : 004021E9 mov rax , [ rbp + next_smallest_entry ] .text : 004021ED mov eax , [ rax + entry_struct .histogram _entry ] .text : 004021F0 add edx , eax .text : 004021F2 mov rax , [ rbp + new_entry ] .text : 004021F6 mov [ rax + entry_struct .histogram _entry ], edx .text : 00402201 mov rsi , rdx .text : 00402204 mov rdi , rax .text : 00402207 call add_entry_to_end_maybe

What I could see in the code is that it removed the two smallest entries, created a new entry that points to them, and put it at the end of the list. Then it would loop until there was only one entry. Typing it like that makes it sound really obvious to me, but digging through the code was remarkably difficult. In fact, it took me a long, long time to realize that it was removing the smallest entries and adding a new entry. I totally misunderstood the code...

Anyway, let's look at an example: you have the string 'eebddaafbcdfdbaee'. The histogram looks like:

{" a "=> 3 , " b "=> 3 , " c "=> 1 , " d "=> 4 , " e "=> 4 , " f "=> 2 }

We'll re-write it, for simplicity, like this:

a [ 3 ] b [ 3 ] c [ 1 ] d [ 4 ] e [ 4 ] f [ 2 ]

First, it removes the smallest two entries from the list, c[1] and f[2]:

a [ 3 ] b [ 3 ] d [ 4 ] e [ 4 ]

And replaces them with another node that contains their combined value, with the two removed values underneath:

a [ 3 ] b [ 3 ] d [ 4 ] e [ 4 ] [ 3 ] / \ c [ 1 ] f [ 2 ]

Then the next two smallest values are removed and combined under a single parent:

d [ 4 ] e [ 4 ] [ 3 ] [ 6 ] / \ / \ c [ 1 ] f [ 2 ] a [ 3 ] b [ 3 ]

Then remove the smallest two again. This step is interesting because one of the smallest nodes is a non-leaf:

e [ 4 ] [ 6 ] / \ a [ 3 ] b [ 3 ]

Add them back under a common node at the end:

e [ 4 ] [ 6 ] [ 7 ] / \ / \ a [ 3 ] b [ 3 ] [ 3 ] d [ 4 ] / \ c [ 1 ] f [ 2 ]

Then the [4] and [6] are similarly combined:

[ 10 ] [ 7 ] / \ / \ e [ 4 ] [ 6 ] [ 3 ] d [ 4 ] / \ / \ a [ 3 ] b [ 3 ] c [ 1 ] f [ 2 ]

And finally, we only have a single parent node:

[ 17 ] / \ [ 10 ] [ 7 ] / \ / \ e [ 4 ] [ 6 ] [ 3 ] d [ 4 ] / \ / \ a [ 3 ] b [ 3 ] c [ 1 ] f [ 2 ]

And there you have it! A tree!

It's really funny: when I was working on this, I had the feeling at the back of my mind that this was a real tree algorithm, but I read a bunch on Wikipedia and couldn't find one that matched, so I gave up and just continued. Today, my co-worker mentioned "oh, like Huffman encoding!" and I said "nah, there's no compression involved".

But, as soon as I built that tree by hand, I realized that this absolutely IS a Huffman Tree! And checking Wikipedia, I can confirm that it is. That would have made the last part a whole lot easier...

Using the incoming data

Now, what's going on with the data I send in? I already confirmed that it's a 4-byte length value followed by some code, and the program crashes in different and creative ways depending on what code I send it. Now what?

If I'd recognized the Huffman Tree, I could have made a pretty good guess: that we're sending huffman-compressed data. And it would have been right, too! Unfortunately, I missed the obvious hints...

Anyway, I don't really want to dwell too much on this part, since it's conceptually really simple. There was a loop that would go through the data string you sent it, and touch each byte you sent 8 times. Hmm. Then there was some bitwise arithmetic that would do some shifting and ANDing of each byte. You'd think I would have figured out by that that it's breaking the string into bits, but I didn't. What made the light bulb go on was this:

.text : 00401315 and eax , 1 .text : 00401318 test eax , eax .text : 0040131A jz short use_left .text : 0040131C mov rax , [ rbp + current_node ] .text : 00401320 mov rax , [ rax + entry_struct .right _histogram ] .text : 00401324 mov [ rbp + current_node ], rax .text : 00401328 jmp short restart_loop .text : 0040132A .text : 0040132A .text : 0040132A use_left : .text : 0040132A mov rax , [ rbp + current_node ] .text : 0040132E mov rax , [ rax + entry_struct .left _histogram ] .text : 00401331 mov [ rbp + current_node ], rax .text : 00401335 .text : 00401335 restart_loop : .text : 00401335 add [ rbp + current_index_maybe ], 1

Basically, based on the right-most bit in eax, it makes a decision to either jump to the left node or the right node. Then, when it gets to the leaf...

.text : 00401363 mov rax , [ rbp + current_node ] .text : 00401367 movzx eax , [ rax + entry_struct .byte _value ] .text : 00401374 pop rbx .text : 00401375 pop rbp .text : 00401376 retn

...it returns the byte value in that leaf. If you read up on Huffman Trees, you'll see that that's exactly how they work.

The calling function adds that byte value to the end of an executable memory segment. When we're done reading the tree, the list of leaf-node bytes we've found are executed.

To summarize:

Build a histogram from itself

Convert that histogram into a Huffman Tree

Read data from the socket

Convert that data to bits, and use those bits to navigate the tree

We're almost done but... how do we get that tree!?

Putting it all together

All right, we understand the code, now we have to write an exploit. What do we do!?

The "right" way to do this is to build the same tree the same way, and to walk it from the node to the root to get he values. But this is a CTF and I want it to work on the first try, damnit! None of that "reconstructing the exact algorithm" nonsense! So I decided to steal their memory. :)

So, the first thing I did was put a breakpoint immediately after the tree was built to find the address of the root. On my box, the address of the root node, after the tree was built, happened to be 0x60e050. Then I wanted to dump the heap:

( gdb ) x / 1000000 xb 0x603000 0x603000 : 0x1b 0x0c 0x07 0x08 0x90 0x01 0x07 0x10 0x603008 : 0x14 0x00 0x00 0x00 0x1c 0x00 0x00 0x00 0x603010 : 0x80 0xe1 0xff 0xff 0x2a 0x00 0x00 0x00 0x603018 : 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x603020 : 0x14 0x00 0x00 0x00 0x00 0x00 0x00 0x00 [...]

I determined the starting address by trial and error - I wanted it to be as small as possible, but I needed the memory between it and the root node to be contiguous. 0x602000 wasn't allocated, but 0x603000 worked fine.

I dumped that to a file from gdb, then processed the file with Ruby:

file.split( /

/ ).each do | line | if (line =~ / Cannot / ) break end addr, data = line.split( / : \s / , 2 ) addr = addr.to_i( 16 ) data.gsub!( / 0x / , '' ) data.gsub!( / [^ a-fA-F0-9 ] / , '' ) qword = [data].pack( " H* " ).unpack( " Q " ).pop dword1 = qword & 0x0FFFFFFFF dword2 = (qword >> 32 ) & 0x0FFFFFFFF @@qwords [addr] = qword @@dwords [addr] = dword1 @@dwords [addr+ 4 ] = dword2 end

Now I have a nice address->value mapping for the entire dumped memory. So I can now define a function to read a node:

def get_node (addr) node = {} node[ :addr ] = addr node[ :left ] = @@qwords [addr + 0x00 ] node[ :right ] = @@qwords [addr + 0x08 ] node[ :value ] = @@dwords [addr + 0x10 ] node[ :count ] = @@dwords [addr + 0x1c ] return node end

And, starting at the root, re-build the whole tree recursively:

@@seqs = {} def walk_nodes (addr, seq = "" ) node = get_node(addr) if (node[ :left ] != 0 ) walk_nodes(node[ :left ], seq + " 0 " ) end if (node[ :right ] != 0 ) walk_nodes(node[ :right ], seq + " 1 " ) end if (node[ :left ] == 0 ) @@seqs [node[ :value ]] = seq end end node = get_node( ROOT ) walk_nodes( ROOT )

In the end, @@seqs looks like this:

0 = > 0 255 = > 1000 215 = > 100100000000 177 = > 100100000001 30 = > 10010000001 56 = > 1001000001 220 = > 100100001 114 = > 10010001 7 = > 1001001 133 = > 10010100 97 = > 10010101 ...

Now that I have a mapping, I can do a bunch of lookups for the shellcode of my choice:

tree_code = "" SHELLCODE .split( // ).each do | c | tree_code += @@seqs [c.ord] end while ((tree_code.length % 8 ) != 0 ) do tree_code += ' 0 ' end tree_code = [tree_code].pack( " B* " ) tree_code = [tree_code.length].pack( " I " ) + tree_code tree_code.split( // ).each do | b | print( ' \x%02x ' % b.ord) end puts()

And I'm done! I have a compressed version of my shellcode that I can feed directly into the program:

\x89\x00\x00\x00\xd7\x5f\x67\xae\xbe\x35\xd7\xdf\x0d\x75\xfe\x09\x7d\x75\xf6\x75\x13\xba\xc3\x51\x33\xd4\x55\x44\xfc\xea\x91\x51\x3d\x0d\x83\xc3\xf5\xd1\x7e\xd9\xeb\xaf\xbe\x17\xd7\x5f\xe0\x98\xb3\x7f\x0f\x8e\xab\xbb\x2d\x9a\xfb\xaa\xee\xea\x0c\xfc\xd7\xdd\x57\xc7\xd5\x48\x7e\x9e\x03\xaf\x6c\x0a\x89\xe2\xaa\x46\x2d\x76\xc3\x51\x35\x36\xc1\xe1\xfa\xf5\xd7\xdf\x0a\x89\xc0\xa8\xad\x47\x55\x51\x33\x16\xc1\xe1\xfa\xf9\x6f\x86\xba\xf8\xae\x22\xb8\x8a\x8a\xaa\x46\xb0\x7c\x0b\x81\x6d\x39\xfe\x5e\x05\x47\x57\xad\xf3\x20\x78\xeb\x88\xdf\x6c\x35\x13\xc0\x6c\x1e\x1f\xac

Conclusion: my 118 bytes of shellcode compresses down to a clean 142 bytes. :)

Summary

So, once you figure it out, this level is actually pretty straight forward!

Basically, read its own binary, build a Huffman Tree, then use the user's input to walk that Huffman Tree to build the executable code to run. Or, in other words, decompress and run the shellcode that we send!