Solving for Hashes in Flare-On #5

18 Sep 2015 . reversing . Comments

Prelude

My previous post uses concolic testing to explore and discover the path that leads to the correct answer. Getting the desired execution path (and basic block) executed means the problem is basically solved. We need to only solve for the inputs that lead to that path.

Unlike #2 that has a focus on finding the right path, this challenge does not have separate paths for success or failure per se. Instead of a desired execution path, we do know the desired encoded output that we want the binary to finally produce. The task is then to solve for the input that used to derive the output. We need to somehow represent the desired encoded output as a series of contraints for the SAT solver to solve for the input.

General Reconnaissance of #5

Challenge #5 consists of two files, a binary and a pcap file. The binary, sender (binary link), is a Windows binary that reads from a key.txt file, presumably performs some computation on the contents, and sends data (likely derived from the key.txt file) via a HTTP POST request.

The pcap file, challenge.pcap , helps illuminate what is being sent. Investigating the pcap in Wireshark, we can tell that the following “hash” or encoded data is sent over separate requests transmitting four bytes at a time.

UDYs1D7bNmdE1o3g5ms1V6RrYCVvODJF1DpxKTxAJ9xuZW==

The disassembly of the main function sub_401100 is very straightforward. The important functions in order of execution are:

ReadFile : Reads key_original from key file

: Reads from key file sub_401250 : In-place encodes key_original ==> key_encoded

: In-place encodes ==> sub_401427 : Allocates a heap buffer to store hash

: Allocates a heap buffer to store sub_4012A0 : Encodes key_encoded ==> hash

... 00401181 call ds:ReadFile ; Read from "key.txt" file ... 00401198 mov edx, esi ; edx = number of bytes read 0040119A lea ecx, [ebp+Buffer] ; ecx = contents of key file 004011A0 call sub_401250 ; Performs an in-place encoding of the ; original key buffer ... 004011D3 push eax 004011D4 push 1 004011D6 call sub_401427 ; Allocates a buffer to store the ; encoded "hash" ... 004011DD lea ecx, [ebp+Buffer] ; ecx = in-place encoded buffer 004011E3 push edi ; edi = len(hash) = ; math.ceil(len(key)/3) * 4 004011E4 push ebx ; ebx = alloc-ed buffer to store hash 004011E5 mov edx, esi ; edx = original len(key) 004011E7 call sub_4012A0 ; Converts encoded key buffer into ; hash to be sent over network

The chunk of instructions before HeapAlloc sub_401427 is called calculates the length of the encoded “hash” string from the length of the original key. Reversing instructions from 0x4011A5 to 0x4011CD indicates that len(hash) = math.ceil(len(key)/3) * 4 . This information is particularly useful for us to reduce the search space of the concolic execution. Since we know that the length of the desired “hash” is 48 bytes, we compute the length of the original key as 34 bytes. We will use this to guide our concolic execution.

Solving #5 - Game Plan

Now that we get a rough idea of the high-level pipeline of the “hash” generation, we shall proceed to use the concolic execution to derive the symbolic constraints on the inputs, and finally use angr in-built SAT solver to solve for the input satisfying the discovered constraints.

First, we initialize angr, load the challenge binary and begin with an execution state starting after the function call to ReadFile() .

In [ 1 ]: import angr In [ 2 ]: p = angr . Project ( 'sender' ) In [ 3 ]: state = p . factory . blank_state ( addr = 0x401198 , remove_options = { simuvex . o . LAZY_SOLVES })

We configure the stack to the state after the key file is read. More importantly, we need to mark the key file buffer as a symbolic one. Note that here we hard-coded the length of the key contents. It will be interesting to see how much longer the concolic execution will take if we set the length as a symbolic value.

In [ 4 ]: LEN_PW = 0x22 In [ 5 ]: ADDR_PW_ORI = state . regs . ebp - 0x80004 In [ 6 ]: state . regs . esi = LEN_PW In [ 7 ]: for i in xrange ( LEN_PW ): ... : state . mem [ ADDR_PW_ORI + i :] . byte = state . BV ( 'pw' , 8 )

Complication 1 - In-place Encoding of Original Key

When the encoding function sub_401250 , we face the first complication – the encoding of the original key buffer is performed in-place within the same buffer. As this buffer will subsequently be used as an input to the hashing function sub_4012A0 , solving symbolically for this buffer (used as input and output at the same time) at the end will derive the sequence of bytes representing the buffer after it has been encoded by sub_401250 .

As a workaround, we will retrofit a portion of the stack [ebp - 0x70004] to be the output of this in-place encoding function. This separation of the input and output buffers used by sub_401250 conveniently allows the constraints to be propagated by the concolic engine.

In [ 8 ]: ADDR_PW_ENC = ADDR_PW_ORI + 0x10000

Additionally, we need to use angr hooking functionality to hook the instruction at 0x401259 within the function sub_401250 to duplicate the original key buffer and update the register ebx to the retrofitted output buffer. The instruction at 0x4011E7 also needs to be hooked to use this output buffer. Note that these are 0-length hooks, i.e. no existing instructions are replaced by the hooks.

In [ 9 ]: def hook_duplicate_pw_buf ( state ): ... : for i in xrange ( LEN_PW ): ... : char_ori = state . memory . load ( ADDR_PW_ORI + i , 1 ) ... : state . memory . store ( ADDR_PW_ENC + i , char_ori ) ... : state . regs . ebx = ADDR_PW_ENC In [ 10 ]: p . hook ( 0x401259 , hook_duplicate_pw_buf , length = 0 ) In [ 11 ]: def hook_use_dup_pw_buf ( state ): .... : state . regs . ecx = ADDR_PW_ENC In [ 12 ]: p . hook ( 0x4011E7 , hook_use_dup_pw_buf , length = 0 )

Complication 2 - HeapAlloc For Hashing

The second complication we face is related to the function sub_401427 . This function calls the HeapAlloc() external library call to allocate a new buffer to hold the “hash” to be computed later. It is challenging to support the use of external libraries in angr, especially if it is just the use of one API import. The best approach to handle this is to (again) retrofit part of the stack as the “new” buffer and use this buffer instead for the hash computation. Note that this time, a hook of length 5 is used because we do not want to replace the original call function at 0x4011D6 .

In [ 13 ]: ADDR_HASH = state . regs . ebp - 0x40000 In [ 14 ]: def hook_heapalloc ( state ): .... : state . regs . eax = ADDR_HASH In [ 15 ]: p . hook ( 0x4011D6 , hook_heapalloc , length = 5 )

Game On

Now we are all set to discover the constraints via concolic execution. We will stop the path exploration just after the return of the hashing function sub_4012A0 .

In [ 16 ]: paths = p . factory . path_group ( state , immutable = False ) In [ 17 ]: paths . explore ( find = 0x4011EC ) Out [ 17 ]: < PathGroup with 1 found >

Next, we add the constraints to make final hash buffer equal to the desired goal hash string that we have derived earlier from the pcap file. We also restrict the hash buffer to only printable bytes.

In [ 18 ]: found_s = paths . found [ 0 ] . state In [ 19 ]: GOAL_HASH = 'UDYs1D7bNmdE1o3g5ms1V6RrYCVvODJF1DpxKTxAJ9xuZW==' In [ 20 ]: for i in xrange ( len ( GOAL_HASH )): .... : char = found_s . memory . load ( ADDR_HASH + i , 1 ) .... : found_s . add_constraints ( char >= 0x21 , char <= 0x7e , .... : char == ord ( GOAL_HASH [ i ]))

Finally, we solve the constraints with the angr solver, and voila!

In [ 21 ]: found_s . se . any_str ( found_s . memory . load ( ADDR_PW_ORI + 0 , 1 )) + \ .... : found_s . se . any_str ( found_s . memory . load ( ADDR_PW_ORI + 1 , LEN_PW - 1 )) Out [ 21 ]: 'Sp1cy_7_layer_OSI_dip@flare-on.com'

Sidenote

Some may question why the constraint solving for the input is done over two separate solver se.any_str() statements. This is an issue that I have been unable to figure out at the moment. When we try to solve the constraints in one statement over the symbolic input buffer as a whole, somehow only the first character is shown. The rest of the buffer consists of only null bytes.

In [ 22 ]: found_s . se . any_str ( found_s . memory . load ( ADDR_PW_ORI , LEN_PW )) Out [ 22 ]: 'S \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 '

Please kindly drop me a note, if you know a way around this.

Full angr Script

Here is the full listing of python script to solve Flare-On #5 with angr.

import angr import simuvex # Globals LEN_PW = 0x22 ADDR_PW_ORI = ADDR_PW_ENC = ADDR_HASH = 0 GOAL_HASH = 'UDYs1D7bNmdE1o3g5ms1V6RrYCVvODJF1DpxKTxAJ9xuZW==' def hook_duplicate_pw_buf ( state ): for i in xrange ( LEN_PW ): char_ori = state . memory . load ( ADDR_PW_ORI + i , 1 ) state . memory . store ( ADDR_PW_ENC + i , char_ori ) state . regs . ebx = ADDR_PW_ENC def hook_use_dup_pw_buf ( state ): state . regs . ecx = ADDR_PW_ENC def hook_heapalloc ( state ): state . regs . eax = ADDR_HASH def main (): global ADDR_PW_ORI , ADDR_PW_ENC , ADDR_HASH # Load binary p = angr . Project ( 'sender' ) # Start with a blank state at the EIP after "key.txt" is read state = p . factory . blank_state ( addr = 0x401198 , remove_options = { simuvex . o . LAZY_SOLVES }) # Initialize global variables ADDR_PW_ORI = state . regs . ebp - 0x80004 ADDR_PW_ENC = ADDR_PW_ORI + 0x10000 ADDR_HASH = state . regs . ebp - 0x40000 # Setup stack to simulate the state after which the "key.txt" is read state . regs . esi = LEN_PW for i in xrange ( LEN_PW ): state . mem [ ADDR_PW_ORI + i :] . byte = state . BV ( 'pw' , 8 ) # Hook instructions to use a separate buffer for the XOR-ing function p . hook ( 0x401259 , hook_duplicate_pw_buf , length = 0 ) p . hook ( 0x4011E7 , hook_use_dup_pw_buf , length = 0 ) # To avoid calling imports (HeapAlloc), retrofit part of the stack as # temporary buffer to hold symbolic copy of the password p . hook ( 0x4011D6 , hook_heapalloc , length = 5 ) # Explore the paths until after the hash is computed paths = p . factory . path_group ( state , immutable = False ) paths . explore ( find = 0x4011EC ) # Add constraints to make final hash equal to the one we want # Also restrict the hash to only printable bytes found_s = paths . found [ 0 ] . state for i in xrange ( len ( GOAL_HASH )): char = found_s . memory . load ( ADDR_HASH + i , 1 ) found_s . add_constraints ( char >= 0x21 , char <= 0x7e , char == ord ( GOAL_HASH [ i ])) # Solve for password that will result in the required hash print found_s . se . any_str ( found_s . memory . load ( ADDR_PW_ORI + 0 , 1 )) + \ found_s . se . any_str ( found_s . memory . load ( ADDR_PW_ORI + 1 , LEN_PW - 1 )) if __name__ == '__main__' : main ()

Edits (09/26/2015): Improved Script

Thanks to jmgrosen on Reddit, we now know that the sidenote is indeed a weird bug.

More importantly, he suggested a more elegant solution around the problem of getting the original password symbolic buffer getting clobbered (see complication 1 subsection above). The trick is to save the symbolic bytes for the original password buffer and then solve for them at the end.

pw = state . BV ( 'PW' , LEN_PW * 8 ) state . mem [ ADDR_PW_ORI :] = pw ... print found_s . se . any_str ( pw )

However, due to the bug, we need to mark the password buffer byte by byte, instead of as a long range of BitVectors.

Here is the improved script listing, without the need to maintain a duplicate buffer at ADDR_PW_ENC :

import angr import simuvex # Globals LEN_PW = 0x22 ADDR_PW_ORI = ADDR_HASH = 0 GOAL_HASH = 'UDYs1D7bNmdE1o3g5ms1V6RrYCVvODJF1DpxKTxAJ9xuZW==' def hook_heapalloc ( state ): state . regs . eax = ADDR_HASH def main (): global ADDR_PW_ORI , ADDR_HASH # Load binary p = angr . Project ( 'sender' ) # Start with a blank state at the EIP after "key.txt" is read state = p . factory . blank_state ( addr = 0x401198 , remove_options = { simuvex . o . LAZY_SOLVES }) # Initialize global variables ADDR_PW_ORI = state . regs . ebp - 0x80004 ADDR_HASH = state . regs . ebp - 0x40000 # Setup stack to simulate the state after which the "key.txt" is read state . regs . esi = LEN_PW # Restrict the password to printable bytes, ending with a null byte pw_char = state . BV ( 'PW_CHAR' , 8 ) pw_str = pw_char for i in xrange ( LEN_PW ): state . add_constraints ( pw_char >= 0x21 , pw_char <= 0x7e ) state . mem [ ADDR_PW_ORI + i :] . byte = pw_char pw_char = state . BV ( 'PW_CHAR' , 8 ) pw_str = pw_str . concat ( pw_char ) state . add_constraints ( pw_char == 0 ) # To avoid calling imports (HeapAlloc), retrofit part of the stack as # temporary buffer to hold symbolic copy of the password p . hook ( 0x4011D6 , hook_heapalloc , length = 5 ) # Explore the paths until after the hash is computed paths = p . factory . path_group ( state , immutable = False ) paths . explore ( find = 0x4011EC ) # Add constraint to make final hash equal to the one we want # Also restrict the hash to only printable bytes found_s = paths . found [ 0 ] . state for i in xrange ( len ( GOAL_HASH )): char = found_s . memory . load ( ADDR_HASH + i , 1 ) found_s . add_constraints ( char >= 0x21 , char <= 0x7e , char == ord ( GOAL_HASH [ i ])) # Solve for password that will result in the required hash print found_s . se . any_str ( pw_str ) if __name__ == '__main__' : main ()

Please enable JavaScript to view the comments powered by Disqus.