Cross-Referencing Stand-alone Dalvik Bytecode

A few days ago Patrick Schulz from BlueBox Security

posted an Android Challenge on BlueBox’s blog. In this blog post we will

not go into the entire challenge, but rather focus on the patched bytecode.

Shameless self-promotion: tweet, reddit (I didn’t even have to make

the reddit post, hah.)

Introduction

After reading the blogpost, including the spoiler, it’s evident that the

native library will patch the bytecode of a particular function that was

originally implemented in classes.dex (the container which keeps all

dalvik bytecode with metadata.)

As part of the research I’m doing for my presentation at AthCon, I found

the patching process particularly interesting. This is actually a technique

I’ve thought about earlier, but then again, I’m sure many people have.

Just-in-Time Bytecode

In order to speed up the process of executing the Dalvik bytecode, Android has

a Just-in-Time (JIT) compiler, which may compile certain functions into native ARMv7

instructions. This allows the virtual machine to execute faster compared to

interpreting the bytecode naively.

I do not know the following for sure, as it depends on the internals of the

Dalvik JIT, but the bytecode may have to be patched before the method is first

executed. If we were to patch the bytecode after the JIT has compiled it, the

VM would presumably keep running the compiled native code, so who’s going to

execute the patched bytecode? (This is just a side note for anyone looking

to do the same in the future.)

Locating the Bytecode

Opening up the native library that can be found inside the apk, we find

ourselves with various functions dealing with the Dex file format.

After looking through the functions for a minute or two, we get to an

mprotect() call followed by a memcpy() call. This is where the function is

being patched, as described in the spoiler in Patrick’s blog post.

Extracting the Bytecode

I loaded the native library in IDA Pro. It appears that the symbols were not

stripped, which makes things easier for us. Anyway, once the relevant

memcpy() call is found, we see an obvious inject_ptr, which is a pointer

to the target bytecode. We extract the few hundred bytes of bytecode directly

from the binary, as it’s not encrypted or anything, and put the hexdump in a

file. (Use the Hex View in IDA Pro.)

We then translate the hex dump into a binary file using the following command.

$ xxd -r -p hexfile binfile
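If xxd is not at hand, the same conversion can be sketched in a few lines of Python. This is only a rough equivalent, assuming the file holds nothing but hex digits and whitespace, which is the plain format that `xxd -r -p` expects; the function name `hex_to_bytes` is made up for this example.

```python
import re

def hex_to_bytes(hex_text):
    """Convert a plain hex dump (hex digits plus arbitrary whitespace) to bytes."""
    # Drop all whitespace so line breaks and spacing in the dump don't matter.
    cleaned = re.sub(r"\s+", "", hex_text)
    return bytes.fromhex(cleaned)

# Small demo with a made-up dump spread over two lines.
sample = "1202 6e10\n1f0c 0c00"
print(hex_to_bytes(sample).hex())  # prints 12026e101f0c0c00
```

Writing the result out is then just a matter of opening the output file in binary mode ("wb") and writing the returned bytes.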

Analyzing the Bytecode

Now we’ve got raw bytecode. It appears that dexdump doesn’t really know what

to do with it. We can do two things now. The first option would be to create

a new .dex file with the patched function (i.e., patching the original

.dex file with the new bytecode), but I don’t feel like doing that.

The second option would be to disassemble the raw bytecode and to fix all

cross-references to other methods, strings, etc. So that’s what I did.

For simplicity’s sake I wrote some Python bindings for my Dalvik disassembler,

which I wrote a few months ago and which is unfortunately still not public.

Disassembling the raw bytecode then results in the following output.

$ python dalvik.py binfile
0 const/4 v2, #+0
2 invoke-virtual {v12}, meth@3103
8 move-result v4
10 new-instance v5, type@475
14 invoke-direct {v5}, meth@3149
[...]
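Since the disassembler itself is not public, here is a minimal sketch of how such a decoder could work. It is not the actual tool: it only handles the five opcodes that appear in the output above, using the opcode values and instruction formats (11n, 11x, 21c, 35c) from the Dalvik bytecode documentation. The names `Insn`, `disasm`, `regs_35c` and `disassemble` are invented for this example, printing indices in decimal is an assumption, and the demo bytes were assembled by hand to match the listing rather than taken from the challenge binary.

```python
import struct
from collections import namedtuple

# string: printable text; kind/index: the table the operand refers to
# ('meth' or 'type') and its index, or None when there is no reference.
Insn = namedtuple("Insn", "string kind index")

def regs_35c(code, off):
    """Decode the register list of a format-35c instruction (A|G|op BBBB F|E|D|C)."""
    count = code[off + 1] >> 4          # A: number of argument registers
    fifth = code[off + 1] & 0x0F        # G: fifth argument register
    packed = struct.unpack_from("<H", code, off + 4)[0]
    regs = [(packed >> (4 * i)) & 0x0F for i in range(4)] + [fifth]
    return "{" + ", ".join("v%d" % r for r in regs[:count]) + "}"

def disasm(code, off):
    """Return (length_in_bytes, Insn) for the instruction at byte offset off."""
    op = code[off]
    if op == 0x12:  # const/4 vA, #+B (format 11n, B is a signed nibble)
        reg, lit = code[off + 1] & 0x0F, code[off + 1] >> 4
        lit = lit - 16 if lit >= 8 else lit
        return 2, Insn("const/4 v%d, #%+d" % (reg, lit), None, None)
    if op == 0x0A:  # move-result vAA (format 11x)
        return 2, Insn("move-result v%d" % code[off + 1], None, None)
    if op == 0x22:  # new-instance vAA, type@BBBB (format 21c)
        idx = struct.unpack_from("<H", code, off + 2)[0]
        return 4, Insn("new-instance v%d, type@%d" % (code[off + 1], idx), "type", idx)
    if op in (0x6E, 0x70):  # invoke-virtual / invoke-direct (format 35c)
        name = "invoke-virtual" if op == 0x6E else "invoke-direct"
        idx = struct.unpack_from("<H", code, off + 2)[0]
        return 6, Insn("%s %s, meth@%d" % (name, regs_35c(code, off), idx), "meth", idx)
    raise ValueError("opcode 0x%02x not handled by this sketch" % op)

def disassemble(code):
    """Yield (offset, Insn) pairs for a whole buffer of raw bytecode."""
    off = 0
    while off < len(code):
        length, insn = disasm(code, off)
        yield off, insn
        off += length

# Hand-assembled bytes for the five instructions shown in the listing.
demo = bytes.fromhex("1202" "6e101f0c0c00" "0a04" "2205db01" "70104d0c0500")
for off, insn in disassemble(demo):
    print(off, insn.string)
```

Returning the table kind and index alongside the string is what makes the later cross-referencing step cheap: the caller never has to parse the printed text again.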

However, as you may notice, we’re missing some information here. So I also

wrote some Python bindings around my Dex file parser, which is still private,

just like the dalvik disassembler. The references in the bytecode, i.e.,

meth@3103 etc., are references to the original .dex file, so I dumped all

the relevant tables from the original .dex file into a simple database file

(actually just a pickle’d dictionary, to make life easy.)
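To give an idea of what such a database could look like, here is a small sketch. The layout (one sub-dictionary per table, keyed by index) and the helper names `save_tables` and `load_tables` are assumptions for illustration; the three entries are simply the ones visible in the annotated output in this post, whereas a real dump would contain every entry of the method, type and string tables.

```python
import os
import pickle
import tempfile

def save_tables(path, tables):
    """Write the lookup tables to disk as a pickled dictionary."""
    with open(path, "wb") as fh:
        pickle.dump(tables, fh)

def load_tables(path):
    """Read the lookup tables back; only unpickle files you created yourself."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

# Hypothetical layout: table kind -> {index: human-readable description}.
tables = {
    "meth": {
        3103: "()I Ljava/lang/String; length",
        3149: "()V Ljava/util/HashMap; <init>",
    },
    "type": {475: "Ljava/util/HashMap;"},
}

# Round-trip through a temporary file to show the database survives intact.
path = os.path.join(tempfile.mkdtemp(), "bindb")
save_tables(path, tables)
print(load_tables(path)["type"][475])  # prints Ljava/util/HashMap;
```

A pickle is convenient here because the tables are plain dictionaries that only this tool reads; for anything shared with others a format such as JSON would be the safer choice.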

Having a database with all lookup tables we can now continue onto the

disassembling part. When disassembling a dalvik instruction, the disassembler

also returns whether there’s a lookup and in which table this lookup is.

Printing the correct information next to the instruction is therefore as easy

as reading from the correct table with the correct index. This looks like the

following in the disassembler code.

length, d = disasm(...)
if d.kind is None:
    print offset, d.string
else:
    print offset, d.string, ';', c[d.kind][d.index]

Disassembling again, with the database file as parameter, we get the following

output.

$ python dalvik.py -c bindb binfile
0 const/4 v2, #+0
2 invoke-virtual {v12}, meth@3103 ; ()I Ljava/lang/String; length
8 move-result v4
10 new-instance v5, type@475 ; Ljava/util/HashMap;
14 invoke-direct {v5}, meth@3149 ; ()V Ljava/util/HashMap; <init>
[...]
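The lookup step described above can be sketched as a self-contained Python 3 function. The `Insn` tuple and the `annotate` helper are stand-ins invented for this example, not the real bindings, and the `<unresolved>` fallback for a missing index is my own addition; the table contents are again just the entries visible in the output above.

```python
from collections import namedtuple

# Stand-in for what the disassembler returns: text plus an optional reference.
Insn = namedtuple("Insn", "string kind index")

# Lookup tables, here filled with only the entries seen in the output.
tables = {
    "meth": {
        3103: "()I Ljava/lang/String; length",
        3149: "()V Ljava/util/HashMap; <init>",
    },
    "type": {475: "Ljava/util/HashMap;"},
}

def annotate(offset, insn, tables):
    """Format one line, appending the table entry when there is a reference."""
    if insn.kind is None:
        return "%d %s" % (offset, insn.string)
    # Use a marker instead of raising KeyError when an index is not in the dump.
    target = tables.get(insn.kind, {}).get(insn.index, "<unresolved>")
    return "%d %s ; %s" % (offset, insn.string, target)

listing = [
    (0, Insn("const/4 v2, #+0", None, None)),
    (2, Insn("invoke-virtual {v12}, meth@3103", "meth", 3103)),
    (10, Insn("new-instance v5, type@475", "type", 475)),
]
for offset, insn in listing:
    print(annotate(offset, insn, tables))
```

Keeping the tables outside the disassembler means the same raw bytecode can be annotated against any .dex file, as long as its tables have been dumped first.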

Now I thought that was pretty cool, so..

Challenge Spoiler

For those of you that would like to do the challenge without having to write

several hundreds if not thousands of lines of code and/or without directly

patching the binary, the complete output of the bytecode can be found

here.

Small note: it appears my disassembler doesn’t really understand signed

shorts at the moment, but that’ll be fixed another time.
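For reference, a sketch of the difference: several Dalvik operands, such as the literal of const/16 and the branch offsets of goto/16 and the if-test instructions, are signed 16-bit values, so reading them as unsigned turns small negative numbers into large positive ones. The helper names below are made up for this example, and this only shows the general idea, not the actual fix in my code.

```python
import struct

def read_u16(code, off):
    """Read a little-endian unsigned 16-bit value (e.g. a table index)."""
    return struct.unpack_from("<H", code, off)[0]

def read_s16(code, off):
    """Read a little-endian signed 16-bit value (e.g. a branch offset)."""
    return struct.unpack_from("<h", code, off)[0]

# The same two bytes, interpreted both ways.
raw = bytes.fromhex("fdff")
print(read_u16(raw, 0), read_s16(raw, 0))  # prints 65533 -3
```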

Toolz

I will release all the tools after my AthCon presentation. In the meantime,

I’ll be working on extending the code to do lots of other cool stuff with it

\o/