A test on the hash length reveals that this is very likely an MD5 hash. Let’s see if it is simply an MD5SUM of the data:

That would’ve been too easy

Obviously, the hash is salted. I would’ve lost all faith in the developers if they didn’t even salt the hash.
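Both checks are easy to reproduce. Here is a minimal Python sketch, assuming the request body is the simple “{}” JSON message; the `observed` header value is a made-up placeholder, not the game’s real hash, and the function names are mine:

```python
import hashlib

HEX_DIGITS = set("0123456789abcdef")

def looks_like_md5(hex_digest: str) -> bool:
    """MD5 digests are 128 bits long, i.e. 32 hexadecimal characters."""
    digest = hex_digest.strip().lower()
    return len(digest) == 32 and set(digest) <= HEX_DIGITS

def is_plain_md5(body: bytes, observed_hash: str) -> bool:
    """True only if the header is the unsalted MD5 of the request body."""
    return hashlib.md5(body).hexdigest() == observed_hash.strip().lower()

# Hypothetical captured header value (placeholder, not from the real game).
observed = "0123456789abcdef0123456789abcdef"
body = b"{}"

print("Looks like MD5:", looks_like_md5(observed))
print("Plain MD5 of the body:", is_plain_md5(body, observed))
```

The length test only says the value is compatible with MD5; any other 128-bit digest would pass it too.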

MD5 is relatively fast to brute-force. However, even with a very powerful computer, blindly cracking the entire hash would take a long time: we don’t know the length of the salt, whether it comes before or after the actual body, or whether there’s XOR-ing or some other funny business going on. Brute-forcing at this point could work, but would be a waste of time (and power, think about the planet).
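To give an idea of how many unknowns are stacked here, this small Python sketch computes the hash for three plausible constructions of the same body and salt. The three layouts and the two-byte salt are purely illustrative guesses, not what the game does:

```python
import hashlib
from itertools import cycle

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

def candidate_hashes(body: bytes, salt: bytes) -> dict:
    """Hash the same inputs under a few hypothetical salting layouts."""
    if not salt:
        raise ValueError("salt must be non-empty")
    # Repeat the salt over the body and XOR byte by byte.
    xored = bytes(b ^ k for b, k in zip(body, cycle(salt)))
    return {
        "salt + body": md5_hex(salt + body),
        "body + salt": md5_hex(body + salt),
        "body XOR salt": md5_hex(xored),
    }

# Made-up two-byte salt, for illustration only.
layouts = candidate_hashes(b"{}", b"\x01\x02")
for name, digest in layouts.items():
    print(f"{name:14s} {digest}")
```

Every layout gives a different digest, so a blind brute-force would have to repeat the whole search for each guess about the construction.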

It’s time to reverse engineer the thing. I fire up my Android SDK and start pulling the game’s data from my phone. I hadn’t really paid attention so far, but the game is actually built with Unity. In Unity, if you use the default settings, the C# assembly is packaged as-is in the APK, and a quick pass through ReSharper can reverse almost the entire code base into fairly human-readable code, not unlike using Dex2Jar and JD-GUI on a classic, unobfuscated Java Android app, for example:

Example of decompiled Java app, screenshot courtesy of https://futurestud.io

However, after a quick unpack of the game’s APK, we can immediately see that the devs took some precautions to prevent people like me from doing what I’m trying to do. The game’s Lua files are encrypted, and the C# logic went through Unity’s IL2CPP engine, which converts most of the game’s C# assembly logic to native code, leaving a single “libil2cpp.so” binary that is much harder to reverse. A quick look in a disassembler makes it clear: everything that could be stripped was stripped, and no function name looks like something a game would use (like “health”, “energy”, “map”, “level”, etc.), so loading this file into IDA as-is would just be hell to go through.

After some Googling, I found a nice tool by GitHub user nevermoe, called unity_metadata_loader. This tool scans the global-metadata.dat file generated by Unity during the Il2Cpp process, which contains a mapping between the C# assembly names and their locations inside the generated libil2cpp.so file. The result can then be loaded into IDA for cleaner reverse engineering. The Linux fan in me is hurt by the Windows-exclusive availability of those tools, but I guess we can make an exception for today.

I copied both the libil2cpp.so and global-metadata.dat files from the game into a separate folder, and ran the tool on them. Yay, it worked! We obtained two output files: method_name.txt, which contains close to 28,000 method names, and string_literal.txt, which contains about 9,000 string constants. The contents of the first file look very promising:

This looks like methods a game could use

This seems to confirm that we’re headed in the right direction. I can see many other interesting class names, like “HttpClient”, “HttpWrap” and “LibHttp”, which we’ll hear more about in a minute.

I fire up IDA, load the (huge) libil2cpp.so ARM binary, wait for all functions to be disassembled, and run the unity_decoder.py script. It loads the two text files we generated earlier, maps them to the library’s entry point, and boom, it looks like something we can work on:

The functions, originally called “sub_XXXXX” (where XXXXX is their address in the binary), now have proper labels. We can start digging.

Remember those interesting class names we spotted earlier? Time to take a look at them! In the mitmproxy dump, we could see that every request was an HTTP POST. The first match I’ll load is “HttpWrap$$PostAsync”. A quick tap on the F5 key shows us C-like pseudocode. I’m quite experienced with C, but I’m terrible at reading assembly, and even worse at ARM assembly, so this approximate translation will help quite a bit in the process.

This is what we obtain:

Ehhh… What’s all this?

So we’re half-lucky here. A lot of labels are missing, but there are a great many ToLua calls, which seems to indicate that we’re in a function that is mapped to Lua. As a reminder, Unity allows you to write C# code, but also have some gameplay logic written in Lua for the sake of simplicity. It seems like the HttpWrap class is simply a wrapper around another class that performs the actual HTTP POST request, bridging the call to Lua. The HttpWrap$$Register method seems to confirm this, as it contains multiple calls to LuaState__RegFunction, each taking a pointer computed from the HttpWrap class definition pointer.

At lines 34 and 49 of this pseudocode, you can see calls to LibHttp__PostAsync_0 and LibHttp__PostAsync, so let’s look at them. The latter is simply a shortcut call to the former, more complete one:

Inside this function’s definition, we can see a call this time to another class method, HttpClient__PostAsync. We look inside this method’s pseudocode, and we’re going to stop there. From the looks of it, it seems like this is where the request is built, and the headers populated, most likely thanks to the call of a method named “Crypto__ComputeHash”. A-ha! There you are!

However, we now have one big issue: what are all the function parameters? Since Il2Cpp wraps many C# structures, most of the parameters passed are just integers, or rather pointers to C#-mapped structures. They could be actual integers, but they could also be strings, byte arrays, etc. IDA simply maps them all in order, calling them a1, a2, a3, … with an int type.

During my Googling earlier, I stumbled upon another cool tool on GitHub called Il2CppDumper. This tool also takes the libil2cpp.so and global-metadata.dat files as inputs, and builds a dummy C# assembly DLL with the method names. ReSharper freaks out on this DLL, but .NET Reflector seems to be able to read the definitions just fine without trying to decompile the code (which doesn’t exist in the DLL, since it’s just a dummy shell with only the class and method declarations).

Opening the assembly gives us a nice overview of the classes, most noticeably our HttpClient and Crypto classes:

We now have all the method signatures and types

A few interesting pieces of information are given here. For each method, the Offset address is given. This is the function’s offset inside the libil2cpp.so binary, and it matches what IDA gave us. If we hadn’t run unity_decoder earlier, we’d have a function in IDA called “sub_3ADBA0”. In fact, that would be the Crypto.ComputeChecksum function, since its address is 0x3ADBA0. We can cross-check this with unity_decoder’s mapping:

The method unity_decoder labeled “Crypto__ComputeChecksum” is indeed located at 0x3ADBA0
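The link between an Il2CppDumper offset and IDA’s default label is just a matter of hex formatting. A tiny Python sketch (the helper names are mine, not part of either tool):

```python
def ida_default_name(offset: int) -> str:
    """Build the default label IDA gives an unnamed function at this address."""
    return f"sub_{offset:X}"

def offset_from_ida_name(name: str) -> int:
    """Recover the address from a default 'sub_XXXXX' label."""
    prefix = "sub_"
    if not name.startswith(prefix):
        raise ValueError(f"not a default IDA function name: {name!r}")
    return int(name[len(prefix):], 16)

# Offset reported by Il2CppDumper for Crypto.ComputeChecksum.
print(ida_default_name(0x3ADBA0))
print(hex(offset_from_ida_name("sub_3ADBA0")))
```

This is handy for jumping from a method seen in the dummy DLL to the matching function in a database where the labels were never applied.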

Another interesting piece of information given by Il2CppDumper’s assembly is the offset of each static field. On many occasions, you’ll end up looking at lines like these in IDA’s pseudocode:

result = *(_DWORD *)(dword_13FE7D8 + 80);

*(_DWORD *)(result + 4) = v4;

This can actually be found in the Crypto__cctor function, the static constructor of the Crypto class. We can find a very similar line of code in the same function:

**(_DWORD **)(dword_13FE7D8 + 80) = v2

If you looked at the class definition screenshot from .NET Reflector above, you noticed two interesting static fields, _secret and _salt. Notice also how _secret has no FieldOffset annotation, while _salt has a 0x4 offset? Yup. dword_13FE7D8 is nothing more than the base pointer to the Crypto class in memory, and 80 is the offset at which its static fields are reached, so we can deduce that our “v2” variable is what initializes _secret (offset 0), and v4 is what initializes _salt, since it is written at the “+4” offset.
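To convince myself of that reading, the pointer arithmetic can be replayed on a toy 32-bit memory in Python. All addresses and the v2/v4 values below are invented for the demo; only the +80 and +4 offsets come from the pseudocode and the FieldOffset annotations:

```python
import struct

MEM = bytearray(0x200)  # toy little-endian 32-bit address space

def read_u32(addr: int) -> int:
    return struct.unpack_from("<I", MEM, addr)[0]

def write_u32(addr: int, value: int) -> None:
    struct.pack_into("<I", MEM, addr, value)

# Hypothetical toy addresses (not the real ones from the binary).
CLASS_PTR_GLOBAL = 0x10   # stands in for dword_13FE7D8
CRYPTO_CLASS = 0x40       # where the Crypto class structure lives
STATIC_FIELDS = 0x100     # where the static fields are stored
SECRET_OFFSET = 0x0       # _secret: no FieldOffset annotation
SALT_OFFSET = 0x4         # _salt: FieldOffset 0x4

write_u32(CLASS_PTR_GLOBAL, CRYPTO_CLASS)
write_u32(CRYPTO_CLASS + 80, STATIC_FIELDS)

v2, v4 = 0x1000, 0x2000   # made-up pointers to the two byte arrays

# **(_DWORD **)(dword_13FE7D8 + 80) = v2;
write_u32(read_u32(read_u32(CLASS_PTR_GLOBAL) + 80), v2)

# result = *(_DWORD *)(dword_13FE7D8 + 80);
# *(_DWORD *)(result + 4) = v4;
result = read_u32(read_u32(CLASS_PTR_GLOBAL) + 80)
write_u32(result + 4, v4)

print("_secret ->", hex(read_u32(STATIC_FIELDS + SECRET_OFFSET)))
print("_salt   ->", hex(read_u32(STATIC_FIELDS + SALT_OFFSET)))
```

In this model, v2 lands in the first static slot and v4 in the slot four bytes further, which matches the _secret and _salt declarations.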

I went on, and labeled as many fields and classes as I could. I came to the conclusion that PostAsync just feeds the body bytes to the ComputeHash method. That method initializes an array of a certain size, copies a bunch of stuff in there, and uses the _secret field from the Crypto class (not the _salt field, surprisingly), as we see another access through the dword_13FE7D8 pointer without the +4 offset. It does some magic with all this, and then performs the MD5 hash of all those bytes. The result is a string containing the hexadecimal representation of the hash bytes. That’s our Hash header. We’re getting close!
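In Python terms, the overall shape of that method would look something like the sketch below. This is only an illustration of the flow (allocate a buffer, copy bytes in, MD5, hex string): the “body followed by secret” layout and the secret value are assumptions of mine, since the actual layout and bytes are deliberately not spelled out here.

```python
import hashlib

# Placeholder secret, NOT the game's real value.
PLACEHOLDER_SECRET = bytes.fromhex("deadbeef")

def compute_hash(body: bytes, secret: bytes = PLACEHOLDER_SECRET) -> str:
    """Hypothetical reconstruction: MD5 over a buffer built from body and secret."""
    # Allocate an array of the final size, then copy both parts into it,
    # mirroring the allocate-and-copy pattern seen in the pseudocode.
    buffer = bytearray(len(body) + len(secret))
    buffer[:len(body)] = body
    buffer[len(body):] = secret  # assumed layout: secret appended to the body
    # The header carries the hexadecimal representation of the digest bytes.
    return hashlib.md5(bytes(buffer)).hexdigest()

print(compute_hash(b"{}"))
```

Any extra transformation found in the pseudocode would slot in between the copy and the MD5 call.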

However, this is the first time I’m disassembling a binary like this. And I’m hitting a wall: I can’t figure out where the _secret bytes are initialized from. All I can figure out is that the _secret is a byte array of a certain size, and the _salt field is a byte array of another size.

Two solutions:

Fire up gdbserver on the device and gdb on my host machine, put a breakpoint in the Crypto static constructor, and dump the memory at the address of the _secret variable. This can also be done interactively in IDA, but it’s quite heavy to do.

Since we now know the layout of the hashed string, and since the values we need to find are actually small-ish, we can just use Hashcat and brute-force the MD5 hash with a proper mask.

There is likely a way to get the byte array values directly from the libil2cpp.so binary, but I couldn’t find anything online about where to start or how il2cpp stores static byte arrays in the final binary. If anyone has info on that, I’ll be glad to learn and add it here.

A quick test in Hashcat reveals that it will take only a handful of hours to brute-force the hash (on a GTX 1080), so I’ll use that method. Had the value been larger, the time would have increased exponentially, and the first solution would likely have been required.
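The same idea can be shown at toy scale in Python. This sketch is in no way a replacement for Hashcat: it assumes a known body with an unknown secret appended to it, and a made-up two-byte secret so that it finishes instantly. It also prints how fast the keyspace grows with the secret length, which is the exponential part:

```python
import hashlib
from itertools import product

def brute_force_secret(body: bytes, target_hex: str, secret_length: int):
    """Try every byte string of the given length appended to the body."""
    for candidate in product(range(256), repeat=secret_length):
        secret = bytes(candidate)
        if hashlib.md5(body + secret).hexdigest() == target_hex:
            return secret
    return None

def keyspace(secret_length: int) -> int:
    """Number of candidates for a secret made of arbitrary bytes."""
    return 256 ** secret_length

# Made-up demo secret and the hash it would produce for the "{}" body.
demo_secret = b"\x13\x37"
target = hashlib.md5(b"{}" + demo_secret).hexdigest()

recovered = brute_force_secret(b"{}", target, len(demo_secret))
print("Recovered:", recovered.hex() if recovered else None)

for length in (2, 4, 6, 8):
    print(f"{length} unknown bytes -> {keyspace(length):,} candidates")
```

Each extra unknown byte multiplies the work by 256, which is why knowing the exact layout and sizes from the pseudocode matters so much before launching the real attack.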

Note that any obfuscation method applied to the hash data (XOR, reverse, …) can be deduced from the pseudocode, so it doesn’t have much impact on the reverse engineering complexity.

An hour and 47 minutes later, the good news appeared on the screen:

I have successfully cracked the missing bytes to reach the target hash. This hash is what we had earlier on the simple “{}” JSON message, and I now have all I need to compute my own, valid hashes.

A quick shell script later, I now have a one-liner to generate a valid hash. I replay a simple “time” query to get the server time, and I compare the response header against my script. It works!