In this article I describe my two-month summer internship project at Quarkslab: obfuscating Java bytecode using the [Epona] Code Obfuscator. This article explains our approach, its advantages and its limitations.

Languages like Java, .Net or OCaml are typically compiled to platform-independent bytecode before execution. The bytecode is then interpreted and/or compiled to target-dependent machine code at runtime. Java bytecode is run by the Java Virtual Machine (JVM) and is generated from Java or from other languages targeting it, like Scala or the more recent Kotlin.

A typical Java program's bytecode is not specifically optimized, as this job is better left to the JVM. Because of this, many powerful free Java bytecode decompilation tools can be found online. This means that it is really easy to decompile any non-obfuscated Java program. Many free and commercial tools also exist to obfuscate Java code, in order to make this decompilation process harder for reverse engineers.

[Epona] is a compiler with opt-in obfuscation features developed by Quarkslab, mainly targeting C and C++. One of our wishes is to use the Epona obfuscator on Java, allowing us to implement, maintain, debug and improve the obfuscation techniques without doubling the necessary work.

As code obfuscations in Epona are implemented as transformations over the [LLVM] Intermediate Representation (IR), the goal of this internship was to determine whether going from Java bytecode to LLVM IR and back was a viable solution, and to identify problems that could arise.

We will start this blog post by exploring existing solutions. We will then explain how we go from Java bytecode to LLVM IR and back to Java bytecode. Finally, we will take a look at some optimized and obfuscated examples.

Various projects exist around the idea of using both Java and LLVM, mainly falling into two categories: running Java natively using LLVM, and running LLVM bitcode within a JVM.

In this section, we describe these projects and why they don't completely fit our goal.

The solutions in the first category have been developed to run Java programs natively (without going back to Java bytecode), thus discarding valuable information like Java-specific try/catch blocks.

They are also incomplete, because running Java natively requires translating all the standard Java libraries. Note that we don't have this problem in our project, as we intend to transform the LLVM IR back to Java bytecode.

It is also interesting to notice that the [Zing] JVM uses LLVM via [Falcon] to compile and optimize the most frequently used pieces of code at runtime.

The paper "Obfuscating Java Programs by Translating Selected Portions of Bytecode to Native Libraries" by Pizzolotto and Ceccato (2019) is the closest to what we are trying to achieve. Their goal is to transform Java code to C code, while using JNI to support more complex behavior. This approach can be seen as a kind of [Cython] for Java. Note that the sources of the tool aren't available.

The goal of the solutions in the second category is to run LLVM bitcode within a JVM, emulating all the missing functionalities, like raw memory management. If we had reused these projects, we would have ended up with a virtual machine (the LLVM one) within a virtual machine (the JVM). We would also have had to take care of the standard Java libraries problem.

The conversion step

The Java bytecode is a high level bytecode, allowing the distribution of Java programs intended to be run on Java Virtual Machines (JVM). The bytecode cannot manipulate raw memory and is verified before being run to ensure its validity.

The LLVM IR is the Java bytecode counterpart for the LLVM compiler framework. It can be considered a low-level bytecode intended for generating binaries.

Mapping operations and data from one representation to the other is not always direct, as there are different trade-offs that must be taken into account:

A very detailed translation from Java bytecode to LLVM IR may give more information to the obfuscator, but it may be rather difficult to go back from the LLVM version to the Java bytecode;

On the other hand, a very high level translation would yield a useless LLVM IR in terms of obfuscation, as its components are too abstract to be obfuscated by low level obfuscations.

The challenge is thus to find the right level of abstraction. In the following sections we describe how the mapping from Java bytecode to LLVM IR was performed, beginning with the mapping of Java bytecode scalar and object types to LLVM IR types.

Scalar types

There are two types of numbers in the LLVM IR: integer and floating point numbers.

```
iN        // N being the number of bits of the integer
i1        // A boolean
i8        // An octet
i32       // A 32bit integer
i1942652  // You guessed it
```

The type does not specify whether the integer is signed or not. Signedness is instead carried by the instructions that depend on it, which exist in a signed and an unsigned flavor (for example sdiv and udiv).

```
half    // 16 bits floating-point value
float   // 32 bits floating-point value
double  // 64 bits floating-point value
```

We will use the following type mapping:

| Java bytecode | LLVM IR |
|---------------|---------|
| boolean       | i8      |
| byte          | i8      |
| short         | i16     |
| char          | i16     |
| int           | i32     |
| long          | i64     |
| float         | float   |
| double        | double  |

The boolean type is stored using 8 bits, similarly to how Java bytecode and the JVM work.

Objects

We consider objects as opaque pointers, as we don't know (and don't want to know) what's hidden inside (that is JVM-dependent). Note that these mappings, for objects and scalars, are inspired by those found in the [jni.h] file, the file used to write native code using the Java Native Interface (JNI).
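As an illustration, the scalar and object mappings can be written as a small lookup keyed on JVM type descriptors. This is only a sketch: the class name `TypeMapping` and the method `llvmType` are hypothetical and are not part of our converter. Treating arrays the same way as objects, and using `i64*` as the opaque pointer, is an assumption based on the IR excerpts shown in this article.

```java
import java.util.Map;

/** Hypothetical helper: maps a JVM type descriptor to the LLVM IR type used in this article. */
final class TypeMapping {
    // Scalar descriptors, following the mapping table above.
    private static final Map<Character, String> SCALARS = Map.of(
        'Z', "i8",      // boolean
        'B', "i8",      // byte
        'S', "i16",     // short
        'C', "i16",     // char
        'I', "i32",     // int
        'J', "i64",     // long
        'F', "float",   // float
        'D', "double"); // double

    static String llvmType(String descriptor) {
        char kind = descriptor.charAt(0);
        if (kind == 'V') {
            return "void";
        }
        if (kind == 'L' || kind == '[') {
            // Objects (and, by assumption, arrays) are opaque pointers.
            return "i64*";
        }
        String scalar = SCALARS.get(kind);
        if (scalar == null) {
            throw new IllegalArgumentException("Unknown descriptor: " + descriptor);
        }
        return scalar;
    }

    public static void main(String[] args) {
        for (String d : new String[] {"Z", "C", "J", "D", "Ljava/lang/Object;", "[I"}) {
            System.out.println(d + " -> " + llvmType(d));
        }
    }
}
```

Keying the lookup on the first character of the descriptor keeps the sketch short, since every JVM descriptor announces its kind in that position.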

Runtime abstraction level

We have to find the right amount of abstraction for the translated LLVM bitcode. We consider the Java bytecode to have a high abstraction level and LLVM IR to have a low abstraction level, closer to machine code.

The closer we are to machine code, the farther we are from Java bytecode and the more work we have to do for the Java to LLVM conversion. Moreover, more code could be modified by the optimizer and obfuscator, and we could end up with an LLVM IR whose semantics would be very hard or impossible to convert back to Java bytecode without some emulation magic. As we previously stated, we want to obfuscate the Java bytecode and try as much as possible not to end up with an LLVM interpreter within the JVM.

Conversely, a high abstraction level greatly simplifies the conversion, but it is useless if the optimizer and obfuscator can't understand what the code is doing. We might as well put the whole Java bytecode in metadata.

For example, LLVM and Java arrays are very different. A Java array behaves much like an object: it is used through a reference and its destruction is handled by the garbage collector. It does not make sense to use LLVM arrays, so we added an abstraction. Arrays are created and manipulated with functions taking the array pointer as an argument, much like the way they work in Java bytecode.

Calls

We use a variety of abstract functions to convert Java bytecode instructions that can't be directly translated to LLVM IR, like calls to other Java methods. These calls are converted using a specific convention: all the necessary information is encoded in the called function's name, and the call parameters are the same as their bytecode counterparts. This makes it easy to convert back to bytecode while still allowing the optimizer and obfuscator to do their job.

For example, a bytecode INVOKESPECIAL instruction will be translated to:

```llvm
call void @"Java_@invokespecial@java/lang/Object@<init>@()V"(i64* %1)
```

This calls the constructor of the target object java.lang.Object without any argument (i64* %1 is the pointer referencing this object).

We provide here a simple Java example, its corresponding Java bytecode and the generated LLVM IR:

```java
AtomicInteger ai = new AtomicInteger(4);
Math.pow(ai.get(), 2);
```

```
NEW Ljava/util/concurrent/atomic/AtomicInteger;
DUP
BIPUSH 4
INVOKESPECIAL java/util/concurrent/atomic/AtomicInteger.<init>:(I)V
INVOKEVIRTUAL java/util/concurrent/atomic/AtomicInteger.get:()I
I2D
LDC 2D
INVOKESTATIC java/lang/Math.pow:(DD)D
D2I
IRETURN
```

```llvm
%1 = call i64* @"Java_@new@java/util/concurrent/atomic/AtomicInteger"()
call void @"Java_@invokespecial@java/util/concurrent/atomic/AtomicInteger@<init>@(I)V"(i64* %1, i32 4)
%2 = call i32 @"Java_@invokevirtual@java/util/concurrent/atomic/AtomicInteger@get@()I"(i64* %1)
%3 = call double @"Java_@invokestatic@java/lang/Math@pow@(DD)D"(i32 %2, i32 2)
```

The same technique is also used to represent things like `this` (ALOAD 0 in a non-static Java method), which is needed for superclass instantiation:

```llvm
%1 = call i64* @Java_fixed_this()
call void @"Java_@invokespecial@com/quarkslab/java2llvm/testfiles/TestFile@<init>@()V"(i64* %1)
```

Calls are also extensively used for Java array manipulation:

```llvm
%1 = call i64* @Java_fixed_array_create_10(i32 1)
call void @Java_fixed_array_setIntCellData(i64* %1, i32 0, i32 5)
call i32 @Java_fixed_array_getIntCellData(i64* %1, i32 0)
```

With this abstraction, converting back to Java bytecode is natural, as most of the information required for the translation is available in the function's name. For example, the array type is specified in the name of the array creation function (10 is the NEWARRAY type code for int), so we don't have to find the first value assignment to deduce the array type.

The drawback of this approach is that we assume that all the users of these abstract functions will be calls, and that they won't end up (for instance) in an array of function pointers. This is something an obfuscator could generate. This means that we need to make it aware of the special semantics of these functions.

Representation of the JVM stack

Inspired by this paper about translating bytecode to native libraries, we came up with a very simple way to convert the Java stack to LLVM registers.

We began by writing functions to push elements onto the stack and pop elements from it. The stack is a simple array with a given length (specified in the bytecode) and an index tracking the current position. We then call these special functions each time we need to interact with the stack. These functions are inlined in the final LLVM IR, and the optimizations (Scalar Replacement Of Aggregates [https://llvm.org/docs/Passes.html#sroa-scalar-replacement-of-aggregates] being the most important) completely remove the stack array.

The same technique is used for store and load operations, without the need for an index to keep track of the current position.

Note: the functions are generated at build time from C, simply because C is easier to write and to understand than LLVM IR.
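To make the naming convention from the Calls section more concrete, here is a minimal Java sketch that builds and splits such function names. The class `AbstractCallName` and its methods are hypothetical names chosen for illustration. The sketch assumes that `@` never appears inside a class name, member name or descriptor, which the examples in this article satisfy but which is not guaranteed in general.

```java
import java.util.List;

/** Hypothetical helper illustrating the "Java_@opcode@owner@name@descriptor" convention. */
final class AbstractCallName {
    static final String PREFIX = "Java_@";

    /** Builds the LLVM function name for an invoke-like bytecode instruction. */
    static String encode(String opcode, String owner, String name, String descriptor) {
        return PREFIX + String.join("@", opcode, owner, name, descriptor);
    }

    /** Splits a function name back into its components (at most four). */
    static List<String> decode(String functionName) {
        if (!functionName.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not an abstract Java call: " + functionName);
        }
        // Assumption: '@' is only used as a separator.
        return List.of(functionName.substring(PREFIX.length()).split("@", 4));
    }

    public static void main(String[] args) {
        String fn = encode("invokespecial", "java/lang/Object", "<init>", "()V");
        System.out.println(fn);
        System.out.println(decode(fn));
    }
}
```

Because the name alone carries the opcode, the owner class, the member and the descriptor, the reverse conversion needs no side table, which is the property the convention is designed for.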

Java to LLVM

Converting from Java bytecode to LLVM IR is the easy part. Most of the bytecode instructions can be directly translated into their LLVM IR counterparts, and the stack machine is easy to emulate thanks to a simple array and index, as seen previously. We convert each class file to an individual LLVM module.

Because the JVM is a stack machine, there are a lot of stack-based instructions. For example, the IADD bytecode instruction pops two 32-bit integers from the stack, adds them together and pushes the result onto the stack. This instruction is converted to:

```llvm
%1 = call i32 @Java_popInt(i64* %stackPointer, i64* %stackIndex)
%2 = call i32 @Java_popInt(i64* %stackPointer, i64* %stackIndex)
%3 = add i32 %1, %2
call void @Java_pushInt(i64* %stackPointer, i64* %stackIndex, i32 %3)
```

The stackPointer and stackIndex variables are values allocated at the beginning of the translated LLVM function. The maximum size of the stack is given in the original Java class file. Here is an example:

```llvm
%stack = alloca [2 x i64]
%stackIndex = alloca i64
store i64 0, i64* %stackIndex
%stackPointer = getelementptr inbounds [2 x i64], [2 x i64]* %stack, i32 0, i32 0
%locals = alloca [3 x i64]
%localsPointer = getelementptr inbounds [3 x i64], [3 x i64]* %locals, i32 0, i32 0
```

As stated above, we are emulating a stack that will be removed by later optimizations of the LLVM bitcode.
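The emulation scheme itself is simple enough to sketch in a few lines. The following Java class is only an illustration of the array-plus-index idea: the real helpers are generated from C, and the names `OperandStack`, `pushInt`, `popInt` and `iadd` are assumptions made for this example.

```java
/** Illustrative model of the emulated operand stack: a fixed array plus an index. */
final class OperandStack {
    private final long[] slots; // one 64-bit slot per entry, like the [N x i64] alloca
    private int index;          // next free slot, playing the role of %stackIndex

    OperandStack(int maxStack) {
        // The maximum depth comes from the class file, so the array never grows.
        this.slots = new long[maxStack];
    }

    void pushInt(int value) {
        slots[index++] = value;
    }

    int popInt() {
        return (int) slots[--index];
    }

    /** IADD: pop two ints, add them, push the result. */
    void iadd() {
        int right = popInt();
        int left = popInt();
        pushInt(left + right);
    }

    public static void main(String[] args) {
        OperandStack stack = new OperandStack(2);
        stack.pushInt(3);
        stack.pushInt(4);
        stack.iadd();
        System.out.println(stack.popInt()); // prints 7
    }
}
```

Since every push and pop happens at an index that is known once the helpers are inlined, an optimizer can replace each slot with a plain value, which is what lets the array disappear from the final IR.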
For example, after conversion the following method:

```java
public int test(int a1, int a2) {
    a1 = -a1;
    a1 = a1 << 1;
    a2 |= 5;
    a1 &= 15;
    a1 = ~a1 ^ 20;
    return a1 + a2;
}
```

yields 105 LLVM instructions, which are then optimized to the following LLVM IR:

```llvm
define i32 @"test@(II)I@1"(i32, i32) local_unnamed_addr {
lb_434176574_-1874797944:
  %"7" = shl i32 %0, 1
  %"12" = sub i32 0, %"7"
  %"27" = or i32 %1, 5
  %"32" = and i32 %"12", 14
  %"40" = xor i32 %"32", -21
  %"56" = add nsw i32 %"40", %"27"
  ret i32 %"56"
}
```

Here you can see that the NOT followed by a XOR with 20 has been replaced by a single XOR with -21, which is ~20 (on 32 bits, signed representation).

Control flow

The control flow is also easily converted and optimized:

```java
public int test(int a1, int a2) {
    return a1 > a2 ? a2 : a1;
}
```

gives:

```llvm
define i32 @"test@(II)I@1"(i32, i32) local_unnamed_addr {
lb_529116035_608112264:
  %"3" = tail call i64* @Java_fixed_this()
  %"9" = icmp sgt i32 %0, %1
  %. = select i1 %"9", i32 %1, i32 %0
  ret i32 %.
}
```

We can see that the implicit if instruction and the two possible result basic blocks have been combined into one using a select instruction.

The following example contains a loop:

```java
public int test(int a1, int a2) {
    int j = 0;
    for (int i = 0; i < a1; i++) {
        j += a2;
    }
    return j;
}
```

In the generated LLVM IR, it is optimized to a simple multiplication, with a select returning 0 when a1 is not positive:

```llvm
define i32 @"test@(II)I@1"(i32, i32) local_unnamed_addr {
lb_1433867275_-465532616:
  %"3" = tail call i64* @Java_fixed_this()
  %"1121" = icmp sgt i32 %0, 0
  %2 = mul i32 %1, %0
  %spec.select = select i1 %"1121", i32 %2, i32 0
  ret i32 %spec.select
}
```

The next one:

```java
public int test(int a1, int a2) {
    for (int i = 0; i < a2; i++) {
        System.out.println();
    }
    return 0;
}
```

gives:

```llvm
define i32 @"test@(II)I@1"(i32, i32) local_unnamed_addr {
lb_start:
  %"17.reg2mem" = alloca i32
  %"3" = tail call i64* @Java_fixed_this()
  %"1016" = icmp sgt i32 %1, 0
  br i1 %"1016", label %lb_preloop, label %lb_return

lb_preloop:
  store i32 0, i32* %"17.reg2mem"
  br label %lb_loop

lb_loop:
  %locals.sroa.4.017.reload = load i32, i32* %"17.reg2mem"
  %"11" = tail call i64* @"Java_@getstatic@java/lang/System@out@Ljava/io/PrintStream;"()
  tail call void @"Java_@invokevirtual@java/io/PrintStream@println@()V"(i64* %"11")
  %"17" = add nuw nsw i32 %locals.sroa.4.017.reload, 1
  store i32 %"17", i32* %"17.reg2mem"
  %exitcond = icmp eq i32 %"17", %1
  br i1 %exitcond, label %lb_return, label %lb_loop

lb_return:
  ret i32 0
}
```

This loop stays a loop because of the calls to abstract Java functions. It would have been unrolled if that had been possible.

Our last control flow example contains a switch:

```java
public int test(int a1, int a2) {
    switch (a1) {
        case 0:
            return 1;
        case 1:
            return 5;
        case 20:
            return 3;
        case 15:
        case 17:
            return 666;
    }
    return 0;
}
```

resulting in the following IR:

```llvm
define i32 @"test@(II)I@1"(i32, i32) local_unnamed_addr {
lb_791885625_-801122440:
  %merge.reg2mem = alloca i32
  %"3" = tail call i64* @Java_fixed_this()
  switch i32 %0, label %lb_2054881392_-801122440 [
    i32 0, label %lb_791885625_-801122440.lb_1887400018_-801122440_crit_edge
    i32 1, label %lb_2001112025_-801122440
    i32 15, label %lb_791885625_-801122440.lb_1288141870_-801122440_crit_edge
    i32 17, label %lb_791885625_-801122440.lb_1288141870_-801122440_crit_edge12
    i32 20, label %lb_314265080_-801122440
  ]
  ...
```

Fields

A field is a variable inside a class. It can be accessed by any method in the class and sometimes from outside the class. We translate fields to LLVM global variables. When translated back to Java bytecode, these globals are converted back to fields.
This code:

```java
private int test;

public int test(int a1, int a2) {
    test = 9;
    test += a1;
    test *= a2;
    return test;
}

public int getTest() {
    return test;
}
```

gives:

```llvm
@"Java_@test@2@I" = external local_unnamed_addr global i32

define i32 @"test@(II)I@1"(i32, i32) local_unnamed_addr {
lb_48612937_-532753336:
  %"16" = add i32 %0, 9
  %"25" = mul i32 %"16", %1
  store i32 %"25", i32* @"Java_@test@2@I", align 4 ; A store to the variable representing the test field
  ret i32 %"25"
}

define i32 @"getTest@()I@1"() local_unnamed_addr {
lb_1618212626_-532735736:
  %"6" = load i32, i32* @"Java_@test@2@I", align 4 ; A load from the variable representing the test field
  ret i32 %"6"
}
```

Exceptions

Exceptions, on the contrary, are difficult to translate to LLVM IR. There is an exception system in LLVM with throw, try/catch and cleanup pads, but it doesn't behave like the Java exception system, and a lot of information that would need to be forwarded back to Java would be lost. We need to keep in mind that LLVM IR is not intended to be converted to Java bytecode.

We therefore needed to implement our own exception system. This was done with functions representing try/catch blocks, to be sure that the optimizations or obfuscations wouldn't mess them up. The exception type is encoded in metadata.

For example the following code:

```java
public void test() {
    try {
        ...;
    } catch (RuntimeException e) {
        e.printStackTrace();
    }
}
```

is translated to the following IR, where we can see two added functions, one for the try block and one for the catch block:

```llvm
define void @"test@()V@1"() {
  ...
  %"6" = call i1 @Java_fixed_exception_try_1433867275(i64* %localsPointer)
  br i1 %"6", label %lb_itsFine, label %lb_haltAndCatchFire

lb_haltAndCatchFire:
  call void @Java_fixed_exception_catch_1433867275(i64* %localsPointer)
  br label %lb_itsFine

lb_itsFine:
  ret void
}

define i1 @Java_fixed_exception_try_1433867275(i64*) {
  ; The function for the try part of the try/catch
  %9 = tail call i1 @Java_fixed_exception_result_1433867275()
  ret i1 %9
}

define void @Java_fixed_exception_catch_1433867275(i64*) {
  ; The function for the catch part of the try/catch
  %"13" = tail call i64* @Java_fixed_exception_push()
  tail call void @"Java_@invokevirtual@java/lang/RuntimeException@printStackTrace@()V"(i64* %"13")
  ret void
}
```

As long as the behavior of each of these functions, at the function level, is the same before and after the transformations, everything is fine. In particular, we prevent the inlining of these functions, so that we can easily convert this scheme back to Java bytecode.

This is the simplest example. This system works fine for simple exceptions, exceptions with multiple catch clauses, and exceptions in try or catch blocks. Problems arise when there is a jump out of the try/catch function, possibly into the middle of another one. This had to be accommodated with switches, and it made the exception conversion system quite complex.

On the other hand, translating exceptions back to Java bytecode is simple because, with well-defined names, finding the matching catch function for a try function is trivial.
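As a rough illustration of why name-based matching is trivial, the sketch below pairs try and catch functions that share the same numeric suffix, as in the example above. The class `ExceptionPairs` and its method are hypothetical, and the sketch assumes exactly one catch function per try function. How multiple catch clauses are named is not covered here.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper pairing try and catch functions by their shared name suffix. */
final class ExceptionPairs {
    static final String TRY_PREFIX = "Java_fixed_exception_try_";
    static final String CATCH_PREFIX = "Java_fixed_exception_catch_";

    /** Returns a map from each try function name to its matching catch function name. */
    static Map<String, String> pair(Collection<String> functionNames) {
        Set<String> known = new HashSet<>(functionNames);
        Map<String, String> pairs = new TreeMap<>();
        for (String name : functionNames) {
            if (name.startsWith(TRY_PREFIX)) {
                // Assumption: the catch function reuses the suffix of the try function.
                String candidate = CATCH_PREFIX + name.substring(TRY_PREFIX.length());
                if (known.contains(candidate)) {
                    pairs.put(name, candidate);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> functions = List.of(
            "test@()V@1",
            "Java_fixed_exception_try_1433867275",
            "Java_fixed_exception_catch_1433867275");
        System.out.println(pair(functions));
    }
}
```

This only works because inlining of these functions is prevented: if the bodies were merged into their caller, the names would vanish and the pairing would be lost.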