Gynvael Coldwind is a security researcher at Google, who hosts weekly livestreams about security and programming in Polish and English). As part of the streams, he gives out missions — basically, CTF-style reverse engineering tasks. Yesterday’s mission was about Elvish — I mean Paint — I mean Python programming and bytecode.

MISSION 011 goo.gl/13Bia9 DIFFICULTY: ██████░░░░ [6╱10] ┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅┅ Finally some real work! One of our field agents managed to infiltrate suspects hideout and steal a pendrive possibly containing important information. However, the pendrive actually requires one to authenticate themselves before accessing the stored files. We gave the pendrive to our laboratory and they managed to dump the firmware. We looked at the deadlisting they sent and for our best knowledge it's some form of Elvish. We can't read it. Here is the firmware: goo.gl/axsAHt And off you go. Bring us back the password. Good luck! --------------------------------------------------------------------------------- If you decode the answer, put it in the comments under this video! If you write a blogpost / post your solution online, please add a link in the comments too! P.S. I'll show/explain the solution on the stream in ~two weeks. P.S.2. Bonus points for recreating the original high-level code.

Here’s the firmware:

co_argcount 1 co_consts (None, '4e5d4e92865a4e495a86494b5a5d49525261865f5758534d4a89', 'hex', 89, 255, 115, 50) co_flags 67 co_name check_password co_names ('decode', 'len', 'False', 'all', 'zip', 'ord') co_nlocals 4 co_stacksize 6 co_varnames ('s', 'good', 'cs', 'cg') 0 LOAD_CONST 1 3 LOAD_ATTR 0 6 LOAD_CONST 2 9 CALL_FUNCTION 1 12 STORE_FAST 1 15 LOAD_GLOBAL 1 18 LOAD_FAST 0 21 CALL_FUNCTION 1 24 LOAD_GLOBAL 1 27 LOAD_FAST 1 30 CALL_FUNCTION 1 33 COMPARE_OP 3 (!=) 36 POP_JUMP_IF_FALSE 43 39 LOAD_GLOBAL 2 42 RETURN_VALUE >> 43 LOAD_GLOBAL 3 46 BUILD_LIST 0 49 LOAD_GLOBAL 4 52 LOAD_FAST 0 55 LOAD_FAST 1 58 CALL_FUNCTION 2 61 GET_ITER >> 62 FOR_ITER 52 (to 117) 65 UNPACK_SEQUENCE 2 68 STORE_FAST 2 71 STORE_FAST 3 74 LOAD_GLOBAL 5 77 LOAD_FAST 2 80 CALL_FUNCTION 1 83 LOAD_CONST 3 86 BINARY_SUBTRACT 87 LOAD_CONST 4 90 BINARY_AND 91 LOAD_CONST 5 94 BINARY_XOR 95 LOAD_CONST 6 98 BINARY_XOR 99 LOAD_GLOBAL 5 102 LOAD_FAST 3 105 CALL_FUNCTION 1 108 COMPARE_OP 2 (==) 111 LIST_APPEND 2 114 JUMP_ABSOLUTE 62 >> 117 CALL_FUNCTION 1 120 RETURN_VALUE

To the uninitiated, this might look like Elvish. In reality, this is Python bytecode — the instruction set understood by Python’s (CPython 2.7) virtual machine. Python, like many other languages, uses a compiler to translate human-readable source code into something more appropriate for computers. Python code compiles to bytecode, which is then executed by CPython’s virtual machine. CPython bytecode can be ported between different hardware, while machine code cannot. However, machine code can often be faster than languages based on virtual machines and bytecode. (Java and C# work the same way as Python, C compiles directly to machine code)

This is the internal representation of a Python function. The first few lines are the member variables of the f.__code__ object of our function. We know that:

it takes 1 argument

it has 7 constants: None, a long string of hex digits, the string 'hex' , and numbers: 89, 255, 115, 50.

its flags are set to 67 (CO_NOFREE, CO_NEWLOCALS, CO_OPTIMIZED). This is the “standard” value that most uncomplicated functions take.

its name is check_password

it uses the following globals or attribute names: decode , len , False , all , zip , ord

it has 4 local variables

it uses a stack of size 6

its variables are named s , good , cs , cg

There are two ways to solve this task: you can re-assemble the dis output with the help of the opcode module, or try to re-create the function by hand, using the bytecode. I chose the latter method.

Reverse-engineering Python bytecode: re-creating the function by hand¶ I started by recreating the original firmware file. I created an empty function and wrote some code to print out __code__ contents and dis.dis output. I also added color-coding to help me read it: #!/usr/bin/env python2 import dis import sys # Write code here def check_password ( s ): pass # Reverse engineering the code cnames = ( 'co_argcount' , 'co_consts' , 'co_flags' , 'co_name' , 'co_names' , 'co_nlocals' , 'co_stacksize' , 'co_varnames' ) cvalues = ( 1 , ( None , '4e5d4e92865a4e495a86494b5a5d49525261865f5758534d4a89' , 'hex' , 89 , 255 , 115 , 50 ), 67 , 'check_password' , ( 'decode' , 'len' , 'False' , 'all' , 'zip' , 'ord' ), 4 , 6 , ( 's' , 'good' , 'cs' , 'cg' )) for n , ov in zip ( cnames , cvalues ): v = getattr ( check_password . __code__ , n ) if v == ov : sys . stderr . write ( ' \033 [1;32m' ) else : sys . stderr . write ( ' \033 [1;31m' ) sys . stderr . flush () sys . stdout . write ( str ( n ) + " " + str ( v ) + "

" ) sys . stdout . flush () sys . stderr . write ( ' \033 [0m' ) sys . stderr . flush () dis . dis ( check_password ) If we run this solver, we get the following output (text in brackets added by me): co_argcount 1 [OK] co_consts (None,) [1/7 match] co_flags 67 [OK] co_name check_password [OK] co_names () [0/6 match] co_nlocals 1 [should be 4] co_stacksize 1 [should be 6] co_varnames ('s',) [1/4 match] 7 0 LOAD_CONST 0 (None) 3 RETURN_VALUE We can see (with the help of colors, not reproduced here), that we’ve got co_argcount , co_flags , co_name correctly. We also have one constant ( None , in every function) and one variable name ( s , the argument name). We can also see dis.dis() output. While it looks similar to the assignment, there are a few noticeable differences: there is no 7 (line number) at the start, and LOAD_CONST instructions in the original code did not have anything in parentheses (only comparisions and loops did). This makes reading bytecode harder, but still possible. (I originally thought about using diff for help, but it’s not hard to do it by hand. I did use diff for the final checking after a manual “conversion”) Let’s stop to look at the constants and names for a second. The long string is followed by hex , and one of the constants is decode . This means that we need to use str.decode('hex') to create a (byte)string of some information. Puzzle answers tend to be human-readable, and this string isn’t — so we need to do some more work. So, let’s try reproducing the start of the original mission code using what we’ve just discussed. Python’s VM is based on a stack. In the bytecode above, you can see that instructions take 0 or 1 arguments. Some of them put things on the stack, others do actions and remove them. Most instruction names are self-explanatory, but the full list can be found in the dis module documentation. Instructions like LOAD and STORE refer to indices in the constants/names/varnames tuples. To make it easier, here’s a “table” of them: constants 0 1 2 3 4 5 6 (None, '4e5d4e92865a4e495a86494b5a5d49525261865f5758534d4a89', 'hex', 89, 255, 115, 50) names (globals, attributes) 0 1 2 3 4 5 ('decode', 'len', 'False', 'all', 'zip', 'ord') varnames (locals, _fast) 0 1 2 3 ('s', 'good', 'cs', 'cg') In order to improve readability, I will use “new” dis output with names in parentheses below: 0 LOAD_CONST 1 ('4e5d4e92865a4e495a86494b5a5d49525261865f5758534d4a89') 3 LOAD_ATTR 0 (decode) 6 LOAD_CONST 2 ('hex') 9 CALL_FUNCTION 1 # function takes 1 argument from stack 12 STORE_FAST 1 (good) As I guessed before, the first line of our function is as follows: def check_password ( s ): good = '4e5d4e92865a4e495a86494b5a5d49525261865f5758534d4a89' . decode ( 'hex' ) # new If we run the solver again, we’ll see that the first 12 bytes of our bytecode match the mission text. We can also see that varnames is filled in half, we’ve added two constants, and one name. The next few lines are as follows: 15 LOAD_GLOBAL 1 18 LOAD_FAST 0 21 CALL_FUNCTION 1 24 LOAD_GLOBAL 1 27 LOAD_FAST 1 30 CALL_FUNCTION 1 33 COMPARE_OP 3 (!=) 36 POP_JUMP_IF_FALSE 43 39 LOAD_GLOBAL 2 42 RETURN_VALUE We can see that we’re putting a global object on stack and calling it with one argument. In both cases, the global has the index 1, that’s len . The two arguments are s and good . We put both lengths on stack, then compare them. If the comparison fails (they’re equal), we jump to the instruction starting at byte 43, otherwise we continue execution to load the second global (False) and return it. This wall of text translates to the following simple code: def check_password ( s ): good = '4e5d4e92865a4e495a86494b5a5d49525261865f5758534d4a89' . decode ( 'hex' ) if len ( s ) != len ( good ): # new return False # new Let’s take another look at our names. We can see we’re missing all , zip , ord . You can already see a common pattern here: we will iterate over both strings at once (using zip ), do some math based on the character’s codes ( ord ), and then check if all results (of a comparison, usually) are truthy. Here’s the bytecode with value annotations and comments, which explain what happens where: >> 43 LOAD_GLOBAL 3 (all) 46 BUILD_LIST 0 49 LOAD_GLOBAL 4 (zip) 52 LOAD_FAST 0 (s) 55 LOAD_FAST 1 (good) 58 CALL_FUNCTION 2 # zip(s, good) 61 GET_ITER # Start iterating: iter() >> 62 FOR_ITER 52 (to 117) # for loop iteration start (if iterator exhausted, jump +52 bytes to position 117) 65 UNPACK_SEQUENCE 2 # unpack a sequence (a, b = sequence) 68 STORE_FAST 2 (cs) # cs = item from s 71 STORE_FAST 3 (cg) # cg = item from good 74 LOAD_GLOBAL 5 (ord) 77 LOAD_FAST 2 (cs) 80 CALL_FUNCTION 1 # put ord(cs) on stack 83 LOAD_CONST 3 (89) 86 BINARY_SUBTRACT # - 89 [subtract 89 from topmost value] 87 LOAD_CONST 4 (255) 90 BINARY_AND # & 255 [bitwise AND with topmost value] 91 LOAD_CONST 5 (115) 94 BINARY_XOR # ^ 115 [bitwise XOR with topmost value] 95 LOAD_CONST 6 (50) 98 BINARY_XOR # ^ 50 [bitwise XOR with topmost value] 99 LOAD_GLOBAL 5 (ord) 102 LOAD_FAST 3 (cg) 105 CALL_FUNCTION 1 # put ord(cs) on stack 108 COMPARE_OP 2 (==) # compare the two values on stack 111 LIST_APPEND 2 # append topmost value to the list in topmost-1; pop topmost (append to list created in comprehension) 114 JUMP_ABSOLUTE 62 # jump back to start of loop >> 117 CALL_FUNCTION 1 # after loop: call all([list comprehension result]) 120 RETURN_VALUE # return value returned by all() We can now write the full answer. listings/gynvaels-mission-11-en/mission11.py (Source) def check_password ( s ): good = '4e5d4e92865a4e495a86494b5a5d49525261865f5758534d4a89' . decode ( 'hex' ) if len ( s ) != len ( good ): return False return all ([ ord ( cs ) - 89 & 255 ^ 115 ^ 50 == ord ( cg ) for cs , cg in zip ( s , good )]) In the end, our dis.dis() output matches the mission text (except the removed values, but their IDs do match), our co_* variables are all green, and we can get to work on solving the puzzle itself! Side note: this task uses a list comprehension. You might want to optimize it, remove the brackets, and end up with a generator expression. This would make the task harder, since would require working with the internal generator code object as well: co_consts (None, '4e5d4e92865a4e495a86494b5a5d49525261865f5758534d4a89', 'hex', <code object <genexpr> at 0x104a86c30, file "mission11-genexpr.py", line 11>) 46 LOAD_CONST 3 (<code object <genexpr> at 0x104a86c30, file "mission11-genexpr.py", line 11>) BINARY_* and ord disappeared from the new listing. You can see the modified code (which differs by two bytes) and solver output.

Solving the real puzzle¶ I solved the extra credit part of the puzzle. The real aim of the puzzle was to recover the password — the text for which check_password() will return True. This part is pretty boring. I built a dictionary, where I mapped every byte (0…255) to the result of the calculation done in the check_password() function’s loop. Then I used that to recover the original text. pass_values = {} for i in range ( 256 ): result = i - 89 & 255 ^ 115 ^ 50 pass_values [ result ] = i good = '4e5d4e92865a4e495a86494b5a5d49525261865f5758534d4a89' . decode ( 'hex' ) password = '' for c in good : password += chr ( pass_values [ ord ( c )]) print ( password ) print ( check_password ( password )) The password is: huh, that actually worked! .