

Author: “No Bugs” Hare Follow: Job Title: Sarcastic Architect Hobbies: Thinking Aloud, Arguing with Managers, Annoying HRs,

Calling a Spade a Spade, Keeping Tongue in Cheek

[rabbit_ddmog vol=”8″ chap=”Chapter 29(h) from “beta” Volume VIII”]

What a fiendish cipher it must be, to talk about it left, right, and center! — from Cyberiad by Stanislaw Lem —

With all those considerations which we discussed over the course of last 2 posts, an obvious question of “how to use it in practice” arises. To illustrate those points I’ve made, I wrote a pure C++ template-based obfuscation library, and placed it to Github [NoBugs]. Note that it is a pure C++ solution, i.e. without an external code generator; while abilities of the pure C++ are limited compared to the code generator-based solutions, I was surprised how much it is possible to achieve while staying entirely within bounds of C++.1

DISCLAIMER: as of now, library is in pre-alpha stage; while I am reasonably confident that if it compiles, it will produce consistent results (i.e. output with obfuscation will be the same as output without obfuscation), chances of it failing to compile in certain use cases are still pretty high <sad-face />. Still, as a proof-of-concept it IS interesting to see the results which can be obtained even with a pure in-language solution.

Basic Idea

As it was discussed above –

I am a Big Fan™ of automated randomized code generation. 2

In other words – the idea is NOT to write obfuscations manually. Instead – we should just declare which-things-we-want-to-obfuscate, and then obfuscation library should take care of the rest.

Example: factorial() function

To illustrate “how ithare::obf can be used to obfuscate things”, we’ll use the following very simplistic factorial() function:

//Listing 29.factorial.orig __declspec(noinline) int64_t factorial(int64_t x) { if(x<0) throw MyException(“Negative value of factorial!”); int64_t ret = 1; for(int64_t i=1; i <= x; ++i) { ret *= i; } return ret; }

Note: above, __declspec(noinline) is used merely to isolate our factorial() function from the rest of code, to see what is going on more clearly. With a few very narrow exceptions, noinline SHOULD NOT be used for production code (leave alone obfuscated production code).

Baseline

After compiling the code above in Release x64 mode, we get the following assembly (as shown by IDA Pro; other disassemblers may provide less information, but we should be ready to the attacker using the-best-available-tools):

;;Listing 29.factorial.orig.asm ;;single-; indicates comments by IDA Pro ;;double-;; indicate my own comments var_38 = byte ptr -38h;;offset identifier created by IDA Pro sub rsp,58h mov rdx,rcx test rcx,rcx js l1 mov eax,1 mov ecx,eax cmp rax,rdx jg l2 nop dword ptr[rax+rax+00000000h];;aligning instructions l3: imul rax,rcx inc rcx cmp rcx,rdx jle l3 l2: add rsp,58h retn l1: lea rcx,[rsp+58h+var_38] call sub_2 lea rdx,[__TI1?AVMyException@@]; throw info ; for ‘class MyException’ lea rcx,[rsp+58h+var_38] call _CxxThrowException sub_2: push rbx sub rsp, 20h lea rax,off_1400036F90 mov rbx, rcx mov [rcx], rax lea rdx, aNegativeArgume ; “Negative argument to factorial!” add rcx, 8 call sub_3 mov rax, rbx add rsp, 20h pop rbx retn

It is very short, and is rather clear even in asm. NB: as we can see, MyException name is visible in the .exe – that’s in spite of me disabling RTTI (RTTI seems to be enabled at least for classes-being-thrown regardless of /GR- option being fed to MSVC compiler <sad-face />).

Even better for the potential attacker (and worse for us), is that if we feed this code to the IDA Pro decompiler, our intentions will be even more obvious:

//Listing 29.factorial.obf.decompiled.c signed __int64 __fastcall sub_1(signed __int64 a1) { signed __int64 v1;//rdx signed __int64 result;//rax signed __int64 i;//rcx char v4;// [rsp+20h] [rbp-38h] v1 = a1; if(a1<0) { sub_2(&v4); CxxThrowException(&v4, &_TI1_AVMyException__); } result = 1i64; for( i = 1i64; i <= v1; ++i ) result *= i; return result; } _QWORD *__fastcall sub_2(_QWORD *a1) { _QWORD *v1;// rbx v1 = a1; *a1 = &off_140036F90; sub_3(a1 + 1, “Negative argument to factorial!”); return v1; }

Aren’t you amazed at capabilities of IDA’s reverse engineering? I certainly am; it indeed looks extremely close to the original code, and our intentions are indeed very clear. Even if we wouldn’t give away that obvious clue with our “Negative argument…” error string – the attacker still won’t have much problems realizing that our sub_1() is a function which calculates factorial (that for(i) loop in sub_1() gives it away almost-instantly).

This is the point where our obfuscation task starts…

Obfuscated-but-Still-Readable-and-Debuggable?

As it was stated above, our goal is to obfuscate the binary code, while keeping our source code as-readable-as-possible. Let’s see how ithare::obf from library [NoBugs] deals with it.

To enable automated randomized obfuscation, we have to say which variables and which literals we want to obfuscate, and to what extent (leaving generation of the actual obfuscation code to the library and compiler). Let me show it on example of our factorial() function:

//Listing 29.factorial.obf __declspec(noinline) OBF6(int64_t) factorial(OBF6(int64_t) x) { if(x<0) throw MyException(OBF5S(“Negative value of factorial!”)); OBF3(int64_t) ret = 1;//we could obfuscate literal '1' too, // but for such common literals it rarely makes sense for(OBF3(int64_t) i=1; i <= x; ++i) { ret *= i; } return ret; }

That’s it! As you can see, we didn’t do any real obfuscation here, merely declaring which variables have to be obfuscated (in fact, we’re replacing types-to-be-obfuscated with their obfuscated counterparts).

About those digits within OBF6() and OBF3(). In ithare::obf there is a convention that this-digit-after-OBF means “how much we want to obfuscate this type/literal”: OBF0() means “obfuscate using no more than 1 CPU cycle”, OBF1() – “using no more than 3 CPU cycles”, OBF2() – “no more than 10 CPU cycles”, and so on. More generally, OBFX() means “obfuscate using no more than 10^(X/2) CPU cycles”, so OBF6() is allowed to eat up to 1000 CPU cycles.

Obfuscating Parameters and Return Values

In our example above, we obfuscated not only internal variables, but also input parameters and return values of our factorial() function (and moreover, we used MUCH more severe obfuscation on entry and on exit, than inside). The reason for it is that

We want to obfuscate not only the nature of the function, but also its interfaces.

This provides two very significant benefits:

When debugging, attacker cannot really see real values of the function parameters (he will see only those obfuscated values). As call stack is one of major tools used by attackers, making it less obvious (and we’ll be making it much less obvious – see decompiled examples below) certainly qualifies a Good Thing™.

“ Even if the attacker knows what our function does, to invoke the function properly he still needs to deobfuscate the API 3

Debugging

As we can observe, our source code is not too obfuscated. Moreover, if we compile it for debugging (i.e. without ITHARE_OBF_SEED being defined) – it will stay debuggable too; here is a screenshot from MSVC with our factorial() function being debugged:

As we can see, in this mode (without ITHARE_OBF_SEED), all the variables look more or less as their counterparts (that added ‘val=’ to the value is not TOO bad debugging-wise). Of course, we won’t be debugging our obfuscation this way – but when debugging our app-level logic, we have to assume that whatever-obfuscation-we-have works flawlessly anyway.4

Compile for Production: Enter ITHARE_OBF_SEED

Now, let’s try to compile the very same Listing 29.factorial.obf for production; to do it – we have to recompile exactly the same source code but with ITHARE_OBF_SEED #defined to be a large random number.5

At this point, we get an executable, which is functionally equivalent to the original one – but is randomly and severely obfuscated. Let’s see what current (as of January 2018) version of the ithare::obf will do with our factorial() function.

From assembler, the following can be observed:

Our “Negative value of factorial!” literal is no longer easily visible (it doesn’t show in “Strings” view of IDA Pro, and is not really easily detectable until the moment we decide to use it).

Our simplistic factorial() function now takes whopping 2 kilobytes(!).

As it is sooooo large (and is not-so-easy-to-comprehend) – I will provide only a portion of it (specifically the inner loop) here:

;;Listing 29.factorial.obfuscated.asm.InnerLoopOnly loc_140002430: mov r9, r15 mov r10d, 1Fh sub r9, rax imul rax, rcx, 1Fh mov r8d, r9d sub r10, rax and r8d, 0FFFF0000h movzx eax, r9w add r8d, eax mov eax, r9d sub r8, rax add r8, r9 imul r10, r8 mov edx, r10d movzx eax, r10w and edx, 0FFFF0000h add edx, eax mov eax, r10d sub rax, rdx mov rdx, cs:qword_14003FAA0 sub rax, r10 add rax, r15 movzx r8d, byte ptr [rdx+2] imul r8, -1Fh imul rdx, rcx, -1Fh add r8, rbx mov rcx, r8 mov r8d, 1Fh inc rdx imul rcx, rdx imul rdx, rcx, 1Fh sub r8, rdx cmp r8, r11 jle short loc_140002430

As we can see – it is far from being obvious; while we do have an imul operation within – it doesn’t give up much, as actually we have six of such imuls, and it isn’t obvious whether they’re part of obfuscation, or not.

And even if we look at the our obfuscated factorial() function through the prism of almighty IDA Pro’s decompiler – all we get is the following:

//Listing 29.factorial.obf.decompiled.c _DWORD *__fastcall sub_1(_DWORD *a1, __int64 a2) { int v2; // er14 __int16 v3; // ax int v4; // er9 int v5; // er11 __int16 v6; // r10 int v7; // er13 int v8; // eax unsigned __int64 v9; // r8 unsigned __int64 v10; // rax signed __int64 v11; // rcx int v12; // er8 int v13; // er11 int v14; // edi int v15; // edi int v16; // er11 unsigned __int64 v17; // r11 int v18; // er10 unsigned int v19; // ebx int v20; // edi int v21; // er11 unsigned __int64 v22; // rsi unsigned __int64 v23; // r10 int v24; // er9 unsigned int v25; // edi int v26; // ecx int v27; // er10 int v28; // er9 int v29; // er11 unsigned __int64 v30; // rdx _DWORD *result; // rax unsigned __int64 v32; // rdx __int16 v33; // ax int v34; // er8 __int64 v35; // r8 _DWORD *v36; // [rsp+48h] [rbp-51h] char v37; // [rsp+50h] [rbp-49h] char v38; // [rsp+54h] [rbp-45h] char v39; // [rsp+58h] [rbp-41h] char v40; // [rsp+5Ch] [rbp-3Dh] char v41; // [rsp+60h] [rbp-39h] __int64 v42; // [rsp+70h] [rbp-29h] __int64 v43; // [rsp+78h] [rbp-21h] char v44; // [rsp+80h] [rbp-19h] unsigned int v45; // [rsp+A8h] [rbp+Fh] int v46; // [rsp+ACh] [rbp+13h] unsigned int v47; // [rsp+B0h] [rbp+17h] int v48; // [rsp+B4h] [rbp+1Bh] int v49; // [rsp+B8h] [rbp+1Fh] int v50; // [rsp+BCh] [rbp+23h] unsigned int v51; // [rsp+C0h] [rbp+27h] unsigned int v52; // [rsp+C4h] [rbp+2Bh] v36 = a1; v2 = -256 * BYTE5(a2) - ((unsigned __int16)(15 - HIWORD(a2)) << 16) - BYTE4(a2) + 31; v3 = (unsigned __int64)((unsigned int)(unsigned __int16)(HIWORD(v2) - (-256 * BYTE5(a2) - BYTE4(a2) + 31) * (-256 * BYTE5(a2) - BYTE4(a2) + 31)) - 15 + (v2 << 16)) >> 16; v4 = (((unsigned __int16)(HIWORD(v2) - (-256 * BYTE5(a2) - BYTE4(a2) + 31) * (-256 * BYTE5(a2) - BYTE4(a2) + 31)) - 15 + (v2 << 16)) << 16) + (unsigned __int16)(v3 - (HIWORD(v2) - (-256 * BYTE5(a2) - BYTE4(a2) + 31) * (-256 * BYTE5(a2) - BYTE4(a2) + 31) - 15) * (HIWORD(v2) - (-256 * BYTE5(a2) - BYTE4(a2) + 31) * (-256 * BYTE5(a2) - BYTE4(a2) + 31) - 15)); v5 = (unsigned __int16)((HIWORD(v4) << 8) + (((HIWORD(v4) >> 8) - HIWORD(v4)) & 0xFF)) + ((((unsigned __int8)(3 - 15 * ((unsigned __int16)(v3 - (HIWORD(v2) - (-256 * BYTE5(a2) - BYTE4(a2) + 31) * (-256 * BYTE5(a2) - BYTE4(a2) + 31) - 15) * (HIWORD(v2) - (-256 * BYTE5(a2) - BYTE4(a2) + 31) * (-256 * BYTE5(a2) - BYTE4(a2) + 31) - 15)) >> 8)) << 8) + 65521 + (unsigned __int8)(v3 - (BYTE2(v2) - (31 - BYTE4(a2)) * (31 - BYTE4(a2)) - 15) * (BYTE2(v2) - (31 - BYTE4(a2)) * (31 - BYTE4(a2)) - 15))) << 16); v6 = (unsigned __int16)(31 - ((unsigned __int8)-HIBYTE(v5) << 8) - (unsigned __int8)(3 - BYTE2(v5))) >> 8; v7 = ((_DWORD)a2 << 16) + (unsigned __int16)(WORD1(a2) - a2); v8 = (unsigned __int64)(v7 + (unsigned __int16)(WORD1(a2) - a2 - 3) - (unsigned int)(unsigned __int16)(WORD1(a2) - a2)) >> 16; v9 = 3i64 - 3 * (((v7 + (unsigned __int16)(WORD1(a2) - a2 - 3) - (unsigned __int16)(WORD1(a2) - a2)) << 16) + (unsigned int)(unsigned __int16)(v8 - (WORD1(a2) - a2 - 3) * (WORD1(a2) - a2 - 3))) - ((unsigned __int64)(3 * ((unsigned __int16)((unsigned __int8)(3 - v6) + (((unsigned __int8)(1 - (31 - (3 - BYTE2(v5)))) + ((31 - (_WORD)v5) << 8)) << 8)) - (unsigned __int16)(15 - ((unsigned __int8)(3 - v6) + (((unsigned __int8)(1 - (31 - (3 - BYTE2(v5)))) + ((31 - (_WORD)v5) << 8)) << 8))) - ((unsigned __int8)(3 - v6) + (((unsigned __int8)(1 - (31 - (3 - BYTE2(v5)))) + ((unsigned int)(unsigned __int16)(31 - ((HIWORD(v4) << 8) + (((HIWORD(v4) >> 8) - HIWORD(v4)) & 0xFF))) << 8)) << 8)) + 5)) << 32); if ( (signed __int64)((v9 << 32) + HIDWORD(v9) - (3 - 3 * (((v7 + (unsigned __int16)(WORD1(a2) - a2 - 3) - (unsigned __int16)(WORD1(a2) - a2)) << 16) + (unsigned __int16)(v8 - (WORD1(a2) - a2 - 3) * (WORD1(a2) - a2 - 3)))) * (3 - 3 * (((v7 + (unsigned __int16)(WORD1(a2) - a2 - 3) - (unsigned __int16)(WORD1(a2) - a2)) << 16) + (unsigned int)(unsigned __int16)(v8 - (WORD1(a2) - a2 - 3) * (WORD1(a2) - a2 - 3))))) < 0 ) { v45 = 983040 * (((unsigned int)(3 * dword_14003DB80) >> 16) + ((unsigned __int16)(3 * dword_14003DB80) << 16)) + (unsigned __int16)((15 * (((unsigned int)(3 * dword_14003DB80) >> 16) + ((unsigned __int16)(3 * dword_14003DB80) << 16)) >> 16) - abs(15 * ((unsigned int)(3 * dword_14003DB80) >> 16))); v46 = 31 * (((unsigned __int16)-(signed __int16)(((unsigned __int16)-(signed __int16)dword_14003DB84 >> 8) + ((unsigned __int8)-(char)dword_14003DB84 << 8)) << 16) + (unsigned __int16)-((-65536 * (unsigned __int16)(3 - HIWORD(dword_14003DB84)) - (unsigned int)(unsigned __int16)(((unsigned __int16)-(signed __int16)dword_14003DB84 >> 8) + ((unsigned __int8)-(char)dword_14003DB84 << 8)) + 1) >> 16)); v33 = (unsigned __int64)(unsigned int)-dword_14003DB88 >> 16; v47 = 15 * (((-65536 * dword_14003DB88 + (unsigned int)(unsigned __int16)(v33 - -(signed __int16)dword_14003DB88 * -(signed __int16)dword_14003DB88)) >> 16) + ((unsigned __int16)(-(signed __int16)dword_14003DB88 * -(signed __int16)dword_14003DB88 - v33) << 16)); v34 = ((((unsigned __int16)-HIWORD(dword_14003DB8C) << 16) + (unsigned __int16)(15 * ((unsigned __int8)dword_14003DB8C + ((unsigned __int8)(1 - BYTE1(dword_14003DB8C)) << 8)))) << 16) + (unsigned __int16)(((((unsigned __int16)-HIWORD(dword_14003DB8C) << 16) + (unsigned int)(unsigned __int16)(15 * ((unsigned __int8)dword_14003DB8C + ((unsigned __int8)(1 - BYTE1(dword_14003DB8C)) << 8)))) >> 16) - abs( 15 * ((unsigned __int8)dword_14003DB8C + ((unsigned __int8)(1 - BYTE1(dword_14003DB8C)) << 8)))); v48 = (v34 << 16) + (unsigned __int16)(HIWORD(v34) - abs( ((((unsigned __int16)-HIWORD(dword_14003DB8C) << 16) + (unsigned int)(unsigned __int16)(15 * ((unsigned __int8)dword_14003DB8C + ((unsigned __int8)(1 - BYTE1(dword_14003DB8C)) << 8)))) >> 16) - abs( 15 * ((unsigned __int8)dword_14003DB8C + ((unsigned __int8)(1 - BYTE1(dword_14003DB8C)) << 8))))); v49 = -65536 * (unsigned __int16)(4 - HIWORD(dword_14003DB90)) - (unsigned __int16)(((unsigned __int16)(45 - 15 * dword_14003DB90) >> 8) + ((unsigned __int8)(45 - 15 * dword_14003DB90 - 15) << 8)) + 1; v42 = 0i64; v43 = 15i64; v41 = 0; v35 = (((unsigned __int16)-HIWORD(dword_14003DB94) << 16) + (unsigned int)(unsigned __int16)(15 * ((unsigned __int8)dword_14003DB94 + ((unsigned __int8)(1 - BYTE1(dword_14003DB94)) << 8)))) << 16; v50 = v35 + (unsigned __int16)(((((unsigned __int16)-HIWORD(dword_14003DB94) << 16) + (unsigned int)(unsigned __int16)(15 * ((unsigned __int8)dword_14003DB94 + ((unsigned __int8)(1 - BYTE1(dword_14003DB94)) << 8)))) >> 16) - abs( 15 * ((unsigned __int8)dword_14003DB94 + ((unsigned __int8)(1 - BYTE1(dword_14003DB94)) << 8)))); v51 = -15 * (unsigned __int16)qword_14003DB98 - 983040 * ((unsigned int)qword_14003DB98 >> 16); v52 = (HIDWORD(qword_14003DB98) & 0xFFFF0000) + (unsigned __int16)(45 - 15 * WORD2(qword_14003DB98)); sub_2( &v41, 31i64, v35, &v45, __PAIR__(dword_14003DB94, dword_14003DB8C), __PAIR__(dword_14003DB90, dword_14003DB84), qword_14003DB98); sub_3(&v44, &v41); CxxThrowException(&v44, &_TI1_AVMyException__); } v10 = 14i64; v11 = 930i64 * *(unsigned __int8 *)(qword_14003FAA0 + 2) + 1190112520884487202i64; v12 = (unsigned __int64)((unsigned int)(unsigned __int16)(HIWORD(v2) - v2 * v2) - 15 + (v2 << 16)) >> 16; v13 = (((unsigned __int16)(HIWORD(v2) - v2 * v2) - 15 + (v2 << 16)) << 16) + (unsigned __int16)(v12 - (HIWORD(v2) - v2 * v2 - 15) * (HIWORD(v2) - v2 * v2 - 15)); v14 = (unsigned __int16)((HIWORD(v13) << 8) + (((HIWORD(v13) >> 8) - HIWORD(v13)) & 0xFF)) + ((((unsigned __int8)(3 - 15 * ((unsigned __int16)(v12 - (HIWORD(v2) - v2 * v2 - 15) * (HIWORD(v2) - v2 * v2 - 15)) >> 8)) << 8) + 65521 + (unsigned __int8)(v12 - (BYTE2(v2) - v2 * v2 - 15) * (BYTE2(v2) - v2 * v2 - 15))) << 16); v15 = (unsigned __int8)(3 - ((unsigned __int16)(31 - ((unsigned __int8)-HIBYTE(v14) << 8) - (unsigned __int8)(3 - BYTE2(v14))) >> 8)) + (((unsigned __int8)(1 - (31 - (3 - BYTE2(v14)))) + ((unsigned __int16)(31 - v14) << 8)) << 8); v16 = ((_DWORD)a2 << 16) + (unsigned __int16)(WORD1(a2) - a2) + (unsigned __int16)(WORD1(a2) - a2 - 3) - (unsigned __int16)(WORD1(a2) - a2); v17 = 3i64 - 3 * ((v16 << 16) + (unsigned int)(unsigned __int16)(HIWORD(v16) - (WORD1(a2) - a2 - 3) * (WORD1(a2) - a2 - 3))) - ((unsigned __int64)(3 * ((unsigned __int16)v15 - (unsigned int)(unsigned __int16)(15 - v15) - v15 + 5)) << 32); if ( 31 - 31 * v11 <= (signed __int64)((v17 << 32) + (unsigned int)(HIDWORD(v17) - v17 * v17)) ) { v18 = (unsigned __int16)(HIWORD(v2) - v2 * v2) + (v2 << 16) - 15; v19 = ((v18 << 16) + (unsigned int)(unsigned __int16)(HIWORD(v18) - v18 * v18)) >> 16; v20 = (unsigned __int16)(((_WORD)v19 << 8) + ((((unsigned __int16)(((v18 << 16) + (unsigned int)(unsigned __int16)(HIWORD(v18) - v18 * v18)) >> 16) >> 8) - v19) & 0xFF)) + ((((unsigned __int8)(3 - 15 * ((unsigned __int16)(HIWORD(v18) - v18 * v18) >> 8)) << 8) + 65521 + (unsigned __int8)(BYTE2(v18) - v18 * v18)) << 16); v21 = (unsigned __int8)(3 - ((unsigned __int16)(31 - ((unsigned __int8)-HIBYTE(v20) << 8) - (unsigned __int8)(3 - BYTE2(v20))) >> 8)) + (((unsigned __int8)(1 - (31 - (3 - BYTE2(v20)))) + ((unsigned __int16)(31 - (((_WORD)v19 << 8) + ((((unsigned __int16)(((v18 << 16) + (unsigned int)(unsigned __int16)(HIWORD(v18) - v18 * v18)) >> 16) >> 8) - v19) & 0xFF))) << 8)) << 8); do { v23 = (15 - v10 + (unsigned __int16)(15 - v10) + ((15 - (_DWORD)v10) & 0xFFFF0000) - (unsigned __int64)(unsigned int)(15 - v10)) * (31 - 31 * v11); v10 = ((unsigned __int16)(15 - v10) + ((15 - (_DWORD)v10) & 0xFFFF0000)) * (31 - 31 * (_DWORD)v11) - (unsigned __int64)((unsigned __int16)v23 + ((unsigned int)v23 & 0xFFFF0000)) - v23 + 15; v11 = (-31 * v11 + 1) * (1190112520884487201i64 - 31i64 * *(unsigned __int8 *)(qword_14003FAA0 + 2)); v22 = 3i64 - 3 * (((((_DWORD)a2 << 16) + (unsigned __int16)(WORD1(a2) - a2) + (unsigned __int16)(WORD1(a2) - a2 - 3) - (unsigned __int16)(WORD1(a2) - a2)) << 16) + (unsigned int)(unsigned __int16)(((((_DWORD)a2 << 16) + (unsigned __int16)(WORD1(a2) - a2) + (unsigned __int16)(WORD1(a2) - a2 - 3) - (unsigned int)(unsigned __int16)(WORD1(a2) - a2)) >> 16) - (WORD1(a2) - a2 - 3) * (WORD1(a2) - a2 - 3))) - ((unsigned __int64)(3 * ((unsigned __int16)v21 - (unsigned int)(unsigned __int16)(15 - v21) - v21 + 5)) << 32); } while ( 31 - 31 * v11 <= (signed __int64)((v22 << 32) + (unsigned int)(HIDWORD(v22) - v22 * v22)) ); } v24 = ((15 - v10 + (unsigned __int16)(15 - v10) + ((15 - (_DWORD)v10) & 0xFFFF0000) - (unsigned __int64)(unsigned int)(15 - v10)) >> 32) * (((unsigned __int16)(word_14003DB70 - 16916) << 16) + (unsigned __int16)(word_14003DB68 + 31518 + (unsigned __int8)(93 * (word_14003DB68 + 30) - 67 * *(_BYTE *)(qword_14003FAA0 + 2) - 74) - (unsigned __int8)(word_14003DB68 + 30))); v25 = (((((unsigned __int16)(21587 - word_14003DB64 - 1) << 16) + (unsigned __int16)(-31 * (-31539 - word_14003DB60))) << 16) + (unsigned __int16)(((((unsigned __int16)(21587 - word_14003DB64 - 1) << 16) + (unsigned int)(unsigned __int16)(-31 * (-31539 - word_14003DB60))) >> 16) - -31 * (-31539 - word_14003DB60) * -31 * (-31539 - word_14003DB60))) * ((unsigned __int16)(-21845 * HIWORD(v24)) + ((unsigned __int16)((unsigned __int8)v24 - (unsigned __int8)(15 - v24) - v24) << 16)); sub_4(&v38, &v37); v26 = 31 * ((unsigned __int64)sub_5(&v40, &v39) + 2032094506); v29 = ((unsigned __int16)(HIWORD(v26) - word_14003DB6C) + ((unsigned __int16)(31 * v26) << 16)) * (v28 + ((v27 + (unsigned __int16)abs((_WORD)v28)) << 16)); v30 = ((unsigned __int64)((v29 & 0xFFFF0000) + (unsigned __int16)v29) << 32) + (v25 >> 16) + ((v25 + (unsigned __int16)abs(HIWORD(v25))) << 16) + 31; result = v36; v32 = ((unsigned __int64)(-1431655765 * (_DWORD)v30 + (unsigned __int16)(21845 * v30) - (unsigned __int16)(-21845 * v30) + (unsigned __int16)(15 - 21845 * v30) - (unsigned int)(unsigned __int16)(21845 * v30)) << 32) + (unsigned int)-((HIDWORD(v30) << 16) + 65537 * (HIDWORD(v30) >> 16)); *v36 = HIDWORD(v32); v36[1] = ((v32 >> 32) + ((unsigned __int64)(unsigned int)v32 << 32)) >> 32; return result; }

This is all what the-best-available-decompiler was able to infer from our automagically-obfuscated code

Not much if you ask me <wink />; figuring out what this function does – is very non-obvious to say the least (of course, it still does compute factorial, it is just that finding it out will take some hours – if not days). And the function-which-deals-with-our-exception (and which previously gave away the “Negative…” literal) now looks as follows:

//Listing 29.factorial.obf.sub_3.decompiled.c _QWORD *__fastcall sub_3(_QWORD *a1, unsigned __int64 *a2) { unsigned __int64 *v2; // rbx _QWORD *v3; // rdi unsigned __int64 v4; // rdx unsigned __int64 v5; // rcx unsigned __int64 v6; // rdx unsigned __int64 v7; // rax v2 = a2; v3 = a1; *a1 = &off_140037FC0; sub_6(a1 + 1); v4 = v2[3]; if ( v4 >= 0x10 ) { v5 = *v2; v6 = v4 + 1; if ( v6 >= 0x1000 ) { if ( v6 + 39 <= v6 || v5 & 0x1F || (v7 = *(_QWORD *)(v5 - 8), v7 >= v5) || v5 - v7 - 8 > 0x1F ) sub_7(); v5 = *(_QWORD *)(v5 - 8); } sub_8(v5); } v2[2] = 0i64; v2[3] = 15i64; *(_BYTE *)v2 = 0; return v3; }

We can see no literal in plain sight anymore – which is certainly a Good Thing(tm)…

In addition, none of the code in our factorial() function is relatively-easily-detectable “dead code”: all the code can be executed (and the code along those branches which don’t raise exceptions, is executed during normal operation of the program).

Bottom line:

While it is not 100% incomprehensible (nothing is), we did manage to obtain a very poorly readable machine code out of our rather-readable source.

This means that using this type of obfuscation, we’re creating an asymmetry in our favour – while we’re taking relatively limited efforts on our side, we’re increasing the attacker’s efforts by orders of magnitude. This kind of asymmetrical efforts is exactly what-good-protection should do <smile />.

Ever-Changing Binary Code

But wait – we’re still not done with the benefits of ithare::obf. On each subsequent release build of our Client, we should use different ITHARE_OBF_SEED.6

Using this technique, for each Client release, we’ll generate different (but still functionally equivalent) obfuscation code; this, in turn, will make a large chunk of attacker’s-efforts-coming-from-previous-build, useless.7

Let’s compare two decompiles of the exactly same source code, but with different ITHARE_OBFUSCATION_SEEDs (screenshot from https://www.diffnow.com/):

As we can see, obfuscated code on both sides uses the same principles, but the code itself is very different; this, in turn, means that for each new Client build at least a significant portion of reverse engineering efforts should be repeated.

Moreover,

If we manage to release new versions of our Client faster than attacker can reverse engineer our automagically-generated stuff – bingo! We won! 8

Once again, what we’re trying to do here – is to create an asymmetry, when something-which-comes-to-us-for-free (namely, automagically generated different obfuscation for each build) causes significant trouble to the attacker.

Caveats

Of course, as any tool of this kind, there are certainly caveats (in addition to any bugs-due-to-such-complicated-library-being-pre-alpha).

Still Security-by-Obscurity <sad-face />

First, most importantly, all-the-information-attacker-needs, is still in the code <sad-face />; what we’re doing, is merely hiding things, which cannot be really secure. This means that as-a-way-to-protect-your-intellectual-property (~=”your offline program from being pirated”), this method still isn’t likely to work9 (not to mention that motivation for most of such protections is controversial).

Still, in the context of MOGs, IMNSHO there is considerable value in hiding things in the manner described above (especially if such obfuscation is applied consistently across lots of the code). In particular, as we noted above, if attackers cannot cope with the pace of our new releases – it means that we won’t have problem with bot fighters/cheaters anymore. Sure, at some point they will invent tools-which-can-automagically-deobfuscate-our-stuff, but first, it will take time, and second – after they do it, we’ll be in position to readjust (enabling us to go into arms race with the attackers, which is already a tremendous improvement over the current state of affairs); moreover – my gut feeling is that in this battle we do have an upper hand (just because the attacker has to solve an inverse problem compared to us solving forward problem, and solving inverse problems tends to be much more complicated than solving forward problems).

Performance

As for performance – of course, convoluted obfuscated binary code does take its toll; that’s exactly why all the obfuscation primitives are supposed to mention maximum-number-of-CPU-cycles-we-can-spare. And of course, for 3D vertex-level code using any obfuscation is a fallacy.

On the other hand – it is well-known that 5% of code take 95% of CPU time (and conversely, 95% of code take only 5% of CPU time); this tends to stand for games too (in particular, script engines tend to be much less optimized than 3D graphics). As a result, our solution is rather obvious: let’s obfuscate non-performance-critical code.

Moreover,

It is non-performance-critical-code which tends to be of the most interest to the attackers(!)

Indeed, trying to extract usable information (such as “opponent is at 11 o’clock”) from vertex-level manipulations is a gargantuan task; it is usually much easier to get the opponent’s coordinates right after the network-packet-with-these-coordinates is parsed; however – as this parsing doesn’t happen too often, it can be obfuscated without causing too much of the performance penalty.

Let’s make a quick calculation. Our non-obfuscated factorial() function (when called to calculate factorial(20)), takes about 10ns of time. Our seriously-obfuscated version shown above, takes about 200ns – which is whopping 20x longer. However, let’s not be hasty to write it off on this “20x slower” basis. If we’re using some-function-obfuscated-in-a-similar-manner to parse our network packets – we’ll have to call it only once per network tick, i.e. 20 times per second; under these conditions – adding 200ns per network tick will take only 4 microseconds out of each second, or mere 0.0004%. In other words –

we can easily use as many as 10000 such-obfuscations-as-factorial()-above-uses per each network packet – and performance penalty will still be unnoticeable.

Sure, we DO need to be very careful about performance – but the division between “high-performance code” and “code in need for obfuscation” is there, so it is possible to exploit it.

Open Source Obfuscation is NOT an Oxymoron

“how can we possibly both obfuscate things and make them 100% public at the same time?At the first glance, saying “open source” and “obfuscation” in the same breath sounds as an oxymoron; how can we possibly both obfuscate things and make them 100% public at the same time? However, as soon as we say that we’re using randomized code generation – it starts to make sense.

While ithare::obf itself is open-source (and therefore can be validated by lots of people(!)), ITHARE_OBF_SEED parameter which was used to compile each of the builds, is certainly intended to stay out of sight, so that the attacker won’t know what exact obfuscation was generated for the executable he’s dealing with. This, in turn, means that

It IS possible to have ALL the code open, just ITHARE_OBF_SEED private – and still enjoy some protection provided by obfuscation

In a sense – it is similar to having encryption algorithms completely public, with just their keys being secret; here – ithare::obf plus our obfuscated-source-code is our public algorithm, and ITHARE_OBF_SEED is our key.

In particular (as we’ll discuss in [[TODO]] section below) we can try to use similar obfuscation to obfuscate Client-2-Server protocols in a 100%-open-source game (just not publishing specific ITHARE_OBF_SEED used for building a specific Server+specific Clients). This, in turn, would allow anybody to compile and run their own Server+Client; it is just that ITHARE_OBF_SEED will be unknown for each such Server+Client pair – so while all such Servers+Client pairs are functionally the same (and play the same game) – protocols for each such pair will be different, with the automagically-generated obfuscation code providing non-trivial protection from cheaters and bots (and moreover, for each such Server+Client pair obfuscation will need to be broken separately).

In addition, even now ithare::obf has a modular structure, where it is rather easy to add your own primitives (and ithare::obf-with-your-own-primitives is no longer 100% open source, which may both provide additional protection and give you more peace of mind). In the future, I expect to provide an “official” way to expand ithare::obf with your-own-primitives without the need to modify its source code.

Summary on ithare::obf

To summarize our findings so far:

It is possible to have a code which is both rather readable in source, and pretty much unreadable in binary As a side benefit – it is possible to re-generate our binary code on each build at virtually zero cost for us, (while making the life of attackers much more complicated)

ithare::obf library is currently immature (and certainly NOT production-ready), but it does provide proof-of-concept implementation for these things Even current state-of-the-art IDA decompiler is no match for even-current rather-basic implementation of ithare::obf

It is still not a silver bullet – but can be used as a building block to build very serious protection from cheaters (in fact, most of anti-cheater protection techniques would benefit from this kind of randomized obfuscation code).

“ Performance-wise, we can do A LOT of obfuscation per network tick And apparently, it is rather-performance-non-critical stuff which is the easiest to attack, so it is exactly the thing which we should obfuscate (alongside with any anti-cheating routines we want to have)

Open-source obfuscation is possible (as long as we keep seed out of sight); in a sense – it is about the same as having crypto-algorithm public, while keeping its key secret (with our ithare::obf directly corresponding to the crypto-algorithm, and our seed corresponding to the key).

[[To Be Continued…

This concludes beta Chapter 29(h) from the upcoming book “Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)”.

Stay tuned for Chapter 29(i), where we’ll discuss issues related to protocol obfuscation (including versioning)]]

Acknowledgement

Cartoons by Sergey Gordeev from Gordeev Animation Graphics, Prague.