This article explains all the steps needed to write a C++ program which dynamically generates encryption algorithms in x86 assembly code.

All the above algorithms are extensively documented, and implementations can easily be found for many programming languages.

The encryption component of these algorithms uses a public key, while the decryption component requires a private key. As the name implies, public keys can be safely published on the Internet (like the keys in the PGP encryption system). Unlike symmetric encryption algorithms, where a leaked key means anyone can read the message, asymmetric encryption algorithms ensure that only the holder of the private key can decrypt the message. Since the public and private keys are mathematically related, it is theoretically possible to “crack” a public key and obtain the private key, however there are no known practical ways of doing this. In the case of RSA, cracking the public key involves splitting it into its two prime factors (RSA public keys are the product of two very large prime numbers). This factoring process currently takes an enormous time. The RSA company even organized a public challenge to break RSA keys , where the largest prize was $US200,000 for cracking a 2048-bit key.

There are also stream cipher algorithms (which encrypt data one byte at a time), such as the popular RC4 algorithm. Most encryption algorithms are symmetric, which means that the same key is used for both encrypting and decrypting data. However, there are a number of asymmetric encryption functions, based on public key infrastructure. This group of algorithms includes:

There is a wide variety of encryption algorithms available. The advantage of these algorithms is that they are heavily studied and their strengths and weaknesses are known. The first type are block cipher algorithms (data is encrypted in blocks of fixed size), including:

However, polymorphic engines have found another use: software protection systems, or exe-protectors, such as PELock , ASProtect, EnigmaProtector, Themida, and Obsidium. They are used to make it more difficult for an application's code to be analyzed by crackers (i.e. people who try to break software protection), and to make it harder for automated unpackers to be created (i.e. tools which remove software protection). If you possess any shareware program which is protected by an exe-protector, there is a 99% chance that it uses a polymorphic encryption engine.

In most cases, this type of encryption was used in executable file infectors. Due to the level of complexity of polymorphic engines, and the necessity of an in-depth understanding of assembly language for their creation, these days polymorphic engines are rarely used (one exception to this is Virut which, I'd like to point out, is strongly suspected of being Polish in origin!)

In time, polymorphic algorithms evolved and became ever more sophisticated, in order to make it as difficult as possible for antivirus programs to analyze viruses and run them in an emulated environment. Some well-known algorithms included:

Polymorphism has several meanings in the software world. In this case, we are referring to unique, dynamically-generated code (i.e. computer instructions). This technique has its roots in computer viruses. Already in the days of MS-DOS, some computer viruses were encrypted by their creators in order to evade detection by antivirus software. Of course, the viruses needed to contain decryption routines, and these were quickly added to virus signature databases. Soon, virus creators developed decryption algorithms whose code was uniquely generated every time, which allowed viruses to be created which could not be detected using static signatures. The first recorded polymorphic engines date back to the year 1990.

Our own polymorphic engine

In this article, we will cover all the steps necessary to create a simple polymorphic engine, which will serve to encrypt any supplied data, and generate a unique decryption procedure, with the encrypted data embedded within it.

To create this engine, we'll use the C++ language and the AsmJit library, which allows dynamic generation of assembly code. AsmJit makes it possible to create assembly code at the C++ level, as if you were writing it by hand. It is possible to create 32- or 64-bit instructions, and even pseudoinstructions. Thanks to this, the library can be used to target code for 32- and 64-bit environments. In our case we'll stick with 32-bit code.

Listing 1.Pseudocode in C++ for the decryption function that we will generate.

DWORD DecryptionProc(PDWORD lpdwOutput) { // randomly generated encryption key DWORD dwDecryptionKey = 0xA39383D; // pointer to the encrypted data, which will be located // at the end of the decryption function PDWORD lpdwInput = reinterpret_cast<PDWORD>(&cEncryptedData); // main decryption loop, composed of // randomly selected encryption instructions for (DWORD i = 0; i < NUM_BLOCKS; i++) { DWORD dwInputBlock = lpdwInput[i]; dwInputBlock ^= 0x453BC; dwInputBlock += dwDecryptionKey; ... lpdwOutput[i] = dwInputBlock; } // return the size of the encrypted block // (in bytes) return DECRYPTED_DATA_SIZE; // the encrypted data, stored at the end // of the function BYTE cEncryptedData[] = { 0xAB, 0xBA... }; }

Our decryption function will take just one parameter: a pointer to the output buffer where the decrypted data will go. The function will return the size of the decrypted data. Sound simple? Then let's get to work!

Random register selection Our decryption function will use the standard 32-bit processor registers, and will obtain its one parameter from the stack using the stack frame pointer in the EBP register. The registers used for decryption, for holding a pointer to the encrypted data, and for holding a pointer to the output buffer, will be selected randomly each time the decryption routine is generated. Listing 2.Random selection of registers for the decryption function. /////////////////////////////////////////////////////////// // // random register selection // /////////////////////////////////////////////////////////// void CMutagenSPE::RandomizeRegisters() { // set random registers AsmJit::GPReg cRegsGeneral[] = { eax, ecx, ebx, edx, esi, edi }; // shuffle the order of registers in the array mixup_array(cRegsGeneral, _countof(cRegsGeneral)); // the register which will contain // a pointer to the encrypted data regSrc = cRegsGeneral[0]; // the register which will contain // a pointer to the output buffer // (supplied as a function parameter) regDst = cRegsGeneral[1]; // the register which will contain // the size of the encrypted data buffer regSize = cRegsGeneral[2]; // the register which will contain // the decryption key regKey = cRegsGeneral[3]; // the register which will contain // the current data value and on which // the decryption instructions will operate regData = cRegsGeneral[4]; // set the register whose values will be // preserved across function invocations AsmJit::GPReg cRegsSafe[] = { esi, edi, ebx }; // shuffle the order of the registers in the array mixup_array(cRegsSafe, _countof(cRegsSafe)); regSafe1 = cRegsSafe[0]; regSafe2 = cRegsSafe[1]; regSafe3 = cRegsSafe[2]; } This random choice of registers will contribute to the generated code being different each time.

Prologue of the decryption function The prologue of the decryption function is simply the first section of the function, which contains some standard elements like setting up the stack frame and preserving registers. The stack frame is a special region reserved on the stack for holding local variables (if the function needs any) as well as holding the parameters supplied to the function. In our case the one parameter will be a pointer to the buffer where we will store the decrypted data. Structure of the stack frame once the function is invoked Figure 1. Structure of the stack frame once the function is invoked Listing 3.Generating the prologue. /////////////////////////////////////////////////////////// // // generate the prologue of the decryption function // /////////////////////////////////////////////////////////// void CMutagenSPE::GeneratePrologue() { // function prologue // first the original value of EBP is saved // so we can use EBP to refer to the stack frame if (rnd_bin() == 0) { a.push(ebp); a.mov(ebp,esp); } else { // equivalent to the instructions // push ebp // mov ebp,esp a.enter(imm(0), imm(0)); } // if our function is called using the stdcall // convention, and modifies ESI, EDI, or EBX, // they must be saved at the beginning of // the function and restored at the end a.push(regSafe1); a.push(regSafe2); a.push(regSafe3); // load the pointer to the output buffer // into our randomly-selected register regDst // (this is the only parameter to the function, // passed on the stack) a.mov(regDst, dword_ptr(ebp, 0x08 + (4 * 0))); }

Address of the encrypted data Our dynamically generated code can be located anywhere in memory and launched from there (assuming the memory region has the appropriate executable flags set). In such cases we cannot refer to parts of the function using absolute memory addresses, because we simply don't know where the function will reside. The encrypted data is located just after the end of the decryption routine, and we don't know its address in advance either. We will need to determine its address at runtime, through the use of relative addressing. We will employ the delta offset technique for this purpose. It depends on the use of the call assembly instruction, normally used to call a function. We take advantage of the fact that this instruction stores a return address on the stack before jumping to a chosen memory address. Listing 4.Obtaining the current code address through the use of the delta offset technique. Decryptor proc near ; preserve the original value of EBP push ebp ; this puts the return address on the stack call delta_offset delta_offset: ; pop the return address into EBP ; it will be the address of the label 'delta_offset' pop ebp ; load the true address of var_1 by using the ; relative address with reference to delta_offset lea eax,[ebp + (var_1 - delta_offset)] ; verify that our delta offset address is ; correct; return TRUE / FALSE cmp dword ptr[eax],0DEADC0DEh sete al movzx eax,al ; restore the value of EBP pop ebp ret ; this data value is placed directly after ; the function var_1 dd 0DEADC0DEh Decryptor endp The return address generated by call delta_offset will refer to the next instruction, in this case pop ebp . Once we know the memory address of this instruction, we can work out the address of anything else in our function. In our case, we work out the location of our "encrypted" data by adding the number of bytes between the delta_offset: label and the end of the function. An interesting alternative for calculating a delta offset address involves the use of the FPU instruction fnstenv , which stores the environment state of the FPU. Among other things, this contains the memory location of the most recently executed FPU instruction. Like call , this allows us to obtain the location of an instruction in our function and we can use a similar delta offset calculation to refer to data stored after the function. Listing 5.Using FPU instructions to calculate a delta offset address. DeltaOffsetFPUTest proc uses esi ; local buffer for FPU environment local fpEnvironment[32]:byte ; initialize FPU finit ; execute any FPU instruction (e.g. fld1 ; fldz, fldln2, fldlg2 etc.) delta_offset: fldpi ; save the FPU environment state lea eax,fpEnvironment fnstenv byte ptr[eax] ; read the address of the last executed ; FPU instruction – it should point to the ; address of the fldpi instruction, that is, ; the address of the delta_offset label mov esi,dword ptr[eax+12] ; remove 1 register from the FPU stack ; (after fldpi) fistp dword ptr[esp-4] ; get the address of var_1 by using its ; address relative to delta_offset lea eax,[esi + (var_1 - delta_offset)] ; verify that our delta offset address is ; correct; return TRUE / FALSE cmp dword ptr[eax],0ABBAh sete al movzx eax,al ret ; value written just after the function body var_1 dd 0ABBAh DeltaOffsetFPUTest endp This approach to the delta offset calculation is sometimes used in exploits, e.g. in the Metasploit framework. A similar technique is also used in debugger detection, where the fnstenv instruction will reveal that the most recently executed FPU instruction is not part of our program, but refers to some other FPU instruction, which is directly or indirectly executed by the debugger. In this way, it is possible to discover tracing of our code by debuggers including OllyDbg. Delta offset calculations can rouse the suspicions of antivirus programs, since normal applications do not have a need for this kind of thing. The combination of call and pop r32 can cause the application to be suspected of containing malware. In order to prevent this, we must generate extra code between these two instructions, and utilize a different means of getting values from the stack. Listing 6.Generating code for relative addressing. /////////////////////////////////////////////////////////// // // generate delta offset // /////////////////////////////////////////////////////////// void CMutagenSPE::GenerateDeltaOffset() { // generate code which will allow us to // obtain a pointer to the encrypted data // at the end of the decryption function // decryption_function: // ... // call delta_offset // mov eax,1 | xor eax,eax ; \ // leave ; > unused instructions // ret 4 ; / // delta_offset: // pop regSrc // add regSrc, (encrypted_data-delta_offset + // ... + size of the unused instructions) // ret 4 // db 0CCh, 0CCh... // encrypted_data: // db 0ABh, 0BBh, 083h... // create the delta_offset label lblDeltaOffset = a.newLabel(); // generate 'call delta_offset' a.call(lblDeltaOffset); sysint_t posUnusedCodeStart = a.getOffset(); // in order to avoid getting flagged by // antivirus software, we avoid the typical // delta offset construction, i.e. call + pop, // by inserting some unused instructions in // between, in our case a sequence that looks // like the normal code which returns from // a function if (rnd_bin() == 0) { a.mov(eax, imm(1)); } else { a.xor_(eax,eax); } a.leave(); a.ret(1 * sizeof(DWORD)); // calculate the size of the unused code, // i.e. the difference between the current // position and the beginning of the // unused code dwUnusedCodeSize = static_cast<DWORD>(a.getOffset() - posUnusedCodeStart); // put the label "delta_offset:" here a.bind(lblDeltaOffset); posDeltaOffset = a.getOffset(); // instead of the pop instruction, we will // use a different method of reading the // stack, to avoid rousing the suspicions of // antivirus programs //a.pop(regSrc); a.mov(regSrc, dword_ptr(esp)); a.add(esp, imm(sizeof(DWORD))); // the address of the label "delta_offset:" // will now be in the regSrc register; // we need to adjust this by the size of // the remainder of the function (which we // don't know and will have to update later) // for now we use the value 987654321 to // ensure AsmJit generates the long form of // the "add" instruction a.add(regSrc, imm(987654321)); // save the position of the previous DWORD // so that we can later update it to contain // the length of the remainder of the function posSrcPtr = a.getOffset() - sizeof(DWORD); } At the moment, we don't know the size of the remainder of the decryption function, which is why we will need to update this value later, once we have generated all the other instructions of the decryption function.

Encryption The encryption of the data will occur in 4-byte blocks. The data can be smaller than this (it will be rounded up to a multiple of 4). Pseudoinstructions will be randomly generated for the encryption process, and they will later be used to generate the decryption code. Listing 7.Encryption. /////////////////////////////////////////////////////////// // // generate the encryption keys, encryption instructions, // and finally encrypt the input data // /////////////////////////////////////////////////////////// void CMutagenSPE::EncryptInputBuffer(PBYTE lpInputBuffer, \ DWORD dwInputBuffer, \ DWORD dwMinInstr, \ DWORD dwMaxInstr) { // generate an encryption key dwEncryptionKey = rnd_dword(); // round up the size of the input buffer DWORD dwAlignedSize = align_dword(dwInputBuffer); // number of blocks to encrypt // divide the size of the input data // into blocks of 4 bytes (DWORDs) dwEncryptedBlocks = dwAlignedSize / sizeof(DWORD); PDWORD lpdwInputBuffer = reinterpret_cast<PDWORD>(lpInputBuffer); // allocate memory for the output data // (the size will be rounded to the // block size) di_valloc(&diEncryptedData, dwAlignedSize); PDWORD lpdwOutputBuffer = reinterpret_cast<PDWORD>(diEncryptedData.lpPtr); // randomly select the number of encryption instructions dwCryptOpsCount = rnd_range(dwMinInstr, dwMaxInstr); // allocate memory for an array which will // record information about the sequence of // encryption instructions di_valloc(&diCryptOps, dwCryptOpsCount * sizeof(SPE_CRYPT_OP)); // set up a direct pointer to this table // in a helper variable lpcoCryptOps = reinterpret_cast<P_SPE_CRYPT_OP>(diCryptOps.lpPtr); // generate encryption instructions and their type for (DWORD i = 0; i < dwCryptOpsCount; i++) { // will the instruction perform an operation // combining regData and regKey? lpcoCryptOps[i].bCryptWithReg = rnd_bool(); // the register we are operating on lpcoCryptOps[i].regDst = regData; // if the instruction doesn't use the regKey // register, generate a random key which // will be used in the operation if (lpcoCryptOps[i].bCryptWithReg == FALSE) { lpcoCryptOps[i].dwCryptValue = rnd_dword(); } else { lpcoCryptOps[i].regSrc = regKey; } // randomly choose the type of encryption instruction lpcoCryptOps[i].cCryptOp = static_cast<BYTE>(rnd_range(SPE_CRYPT_OP_ADD, SPE_CRYPT_OP_NEG)); } // encrypt the input data according to the // instructions we have just generated for (DWORD i = 0, dwInitialEncryptionKey = dwEncryptionKey; \ i < dwEncryptedBlocks; i++) { // take the next block for encryption DWORD dwInputBlock = lpdwInputBuffer[i]; // encryption loop: executes the sequence of // encryption instructions on the data block for (DWORD j = 0, dwCurrentEncryptionKey; j < dwCryptOpsCount; j++) { if (lpcoCryptOps[j].bCryptWithReg == FALSE) { dwCurrentEncryptionKey = lpcoCryptOps[j].dwCryptValue; } else { dwCurrentEncryptionKey = dwInitialEncryptionKey; } // depending on the encryption operation, // perform the appropriate modification // of the data block switch(lpcoCryptOps[j].cCryptOp) { case SPE_CRYPT_OP_ADD: dwInputBlock += dwCurrentEncryptionKey; break; case SPE_CRYPT_OP_SUB: dwInputBlock -= dwCurrentEncryptionKey; break; case SPE_CRYPT_OP_XOR: dwInputBlock ^= dwCurrentEncryptionKey; break; case SPE_CRYPT_OP_NOT: dwInputBlock = ~dwInputBlock; break; case SPE_CRYPT_OP_NEG: dwInputBlock = 0L - dwInputBlock; break; } } // store the encrypted block in the buffer lpdwOutputBuffer[i] = dwInputBlock; } }

Setting up the decryption keys Our algorithm will use a randomly generated encryption key, but to prevent the key appearing directly in the code, we will encrypt it with a second, random key. At the start of the decryption function we will load the encrypted key into regKey , and then decrypt it. Listing 8.Generating initialization code for the regKey register. /////////////////////////////////////////////////////////// // // set up the keys which will be used to decrypt the data // /////////////////////////////////////////////////////////// void CMutagenSPE::SetupDecryptionKeys() { // set up a decryption key in the regKey // register, which will itself be encrypted DWORD dwKeyModifier = rnd_dword(); // randomly generate instructions to set up // the decryption key switch(rnd_max(2)) { // mov regKey,dwKey - dwMod // add regKey,dwMod case 0: a.mov(regKey, imm(dwEncryptionKey - dwKeyModifier)); a.add(regKey, imm(dwKeyModifier)); break; // mov regKey,dwKey + dwMod // sub regKey,dwMod case 1: a.mov(regKey, imm(dwEncryptionKey + dwKeyModifier)); a.sub(regKey, imm(dwKeyModifier)); break; // mov regKey,dwKey ^ dwMod // xor regKey,dwMod case 2: a.mov(regKey, imm(dwEncryptionKey ^ dwKeyModifier)); a.xor_(regKey, imm(dwKeyModifier)); break; } }

Decryption Using the pseudoinstructions we generated earlier and which we used to encrypt the data, we will now generate instructions which perform the encryption process in reverse, and place them in a loop that repeats this process on each input block. Before the loop we will initialize the register we have called regSize with the number of blocks which need to be decrypted. Listing 9.Generating the code of the decryption loop. /////////////////////////////////////////////////////////// // // generate the decryption code (for the main decryption loop) // /////////////////////////////////////////////////////////// void CMutagenSPE::GenerateDecryption() { // set up the size of the encrypted data // (in blocks) a.mov(regSize, imm(dwEncryptedBlocks)); // create a label for the start of the // decryption loop Label lblDecryptionLoop = a.newLabel(); a.bind(lblDecryptionLoop); // read the data referred to by the // regSrc register a.mov(regData, dword_ptr(regSrc)); // build the decryption code by generating each // decryption instruction in turn (reversing the // order and the operations that were used for // encryption!) for (DWORD i = dwCryptOpsCount - 1; i != -1L; i--) { // encryption was done either with the key // in register regKey, or a constant value, // so depending on this we need to generate // the appropriate decryption instructions if (lpcoCryptOps[i].bCryptWithReg == FALSE) { DWORD dwDecryptionKey = lpcoCryptOps[i].dwCryptValue; switch(lpcoCryptOps[i].cCryptOp) { case SPE_CRYPT_OP_ADD: a.sub(lpcoCryptOps[i].regDst, imm(dwDecryptionKey)); break; case SPE_CRYPT_OP_SUB: a.add(lpcoCryptOps[i].regDst, imm(dwDecryptionKey)); break; case SPE_CRYPT_OP_XOR: a.xor_(lpcoCryptOps[i].regDst, imm(dwDecryptionKey)); break; case SPE_CRYPT_OP_NOT: a.not_(lpcoCryptOps[i].regDst); break; case SPE_CRYPT_OP_NEG: a.neg(lpcoCryptOps[i].regDst); break; } } else { switch(lpcoCryptOps[i].cCryptOp) { case SPE_CRYPT_OP_ADD: a.sub(lpcoCryptOps[i].regDst, lpcoCryptOps[i].regSrc); break; case SPE_CRYPT_OP_SUB: a.add(lpcoCryptOps[i].regDst, lpcoCryptOps[i].regSrc); break; case SPE_CRYPT_OP_XOR: a.xor_(lpcoCryptOps[i].regDst, lpcoCryptOps[i].regSrc); break; case SPE_CRYPT_OP_NOT: a.not_(lpcoCryptOps[i].regDst); break; case SPE_CRYPT_OP_NEG: a.neg(lpcoCryptOps[i].regDst); break; } } } // write the decrypted block to the output // buffer a.mov(dword_ptr(regDst), regData); // update the pointers to the input and ouput // buffers to point to the next block a.add(regSrc, imm(sizeof(DWORD))); a.add(regDst, imm(sizeof(DWORD))); // decrement the loop counter (the number of // blocks remaining to decrypt) a.dec(regSize); // check if the loop is finished // if not, jump to the start a.jne(lblDecryptionLoop); } The loop takes successive encrypted blocks of data from the buffer at the end of the function, then carries out all the decryption instructions on each block, and writes the result to the output buffer. After decrypting a data block, the pointers to the encrypted data and the decrypted data buffer are updated, and the loop counter which indicates the number of blocks remaining is decremented. If it has not reached 0, the loop is repeated.

Setting final register values Our decryption function will return a value of type DWORD (a 32-bit value corresponding to unsigned int ), which indicates the size of the decrypted data. This value will be returned in the EAX register. Listing 10.Setting up the return value of the decryption function (the size of the decrypted data) as well as any other final values we wish to place in registers. /////////////////////////////////////////////////////////// // // set up output registers, including the function return value // /////////////////////////////////////////////////////////// void CMutagenSPE::SetupOutputRegisters(SPE_OUTPUT_REGS *regOutput, DWORD dwCount) { // if there are no output registers to // set up, return if ((regOutput == NULL) || (dwCount == 0)) { return; } // shuffle the order in which the registers // will be set up mixup_array(regOutput, dwCount); // generate instructions to set up the // output registers // mov r32, imm32 for (DWORD i = 0; i < dwCount; i++) { a.mov(regOutput[i].regDst, imm(regOutput[i].dwValue)); } } Our polymorphic engine allows any set of output registers to be defined (that is, we are not limited to setting the value returned in EAX ). This makes it possible for the function to output extra values.

Function epilogue The epilogue, in other words, the last portion of the function, is where the original value of the EBP register is restored, and if necessary, the sensitive registers ESI EDI EBX , whose state must be preserved across function calls according to the stdcall convention. Listing 11.Generating the epilogue of the decryption function. /////////////////////////////////////////////////////////// // // generate epilogue of the decryption function // /////////////////////////////////////////////////////////// void CMutagenSPE::GenerateEpilogue(DWORD dwParamCount) { // restore the original values of // registers ESI EDI EBX a.pop(regSafe3); a.pop(regSafe2); a.pop(regSafe1); // restore the value of EBP if (rnd_bin() == 0) { a.leave(); } else { // equivalent to "leave" a.mov(esp,ebp); a.pop(ebp); } // return to the code which called // our function; additionally adjust // the stack by the size of the passed // parameters (by stdcall convention) a.ret(imm(dwParamCount * sizeof(DWORD))); }

Alignment Instructions which access memory are faster if the data they read or write is aligned, i.e. divisible by the size of the block of memory being accessed. For instance, our encrypted data is accessed in 32-bit blocks (since we are using 32-bit registers), which means memory addresses should be divisible by 4. This is due to processor cache effects. If memory addresses are correctly aligned, the data can be stored in the fast memory attached to the processor known as L1 cache. Non-aligned addresses will be read from the slower L2 cache or directly from the computer's RAM. For small blocks of data, this does not create a noticeable difference, but for large buffers the performance reduction can be significant and should be taken into account. Listing 12.Aligning the size of the decryption function to the specified granularity. /////////////////////////////////////////////////////////// // // align the size of the decryption function // to the specified granularity // /////////////////////////////////////////////////////////// void CMutagenSPE::AlignDecryptorBody(DWORD dwAlignment) { // take the current size of the code DWORD dwCurrentSize = a.getCodeSize(); // find the number of bytes that would // align the size to a multiple of the // supplied size (e.g. 4) DWORD dwAlignmentSize = align_bytes(dwCurrentSize, dwAlignment) - dwCurrentSize; // check if any alignment is required if (dwAlignmentSize == 0) { return; } // add padding instructions (int3 or nop) if (rnd_bin() == 0) { while (dwAlignmentSize--) a.int3(); } else { while (dwAlignmentSize--) a.nop(); } } Alignment between functions is normally achieved with the 1-byte instruction nop (no-operation) or int3 (an interrupt for the purpose of trapping into a debugger). Alignment is also performed for the beginning of loops, using instructions which have the same effect as nop but have a longer binary encoding, such as lea eax,[eax*8+eax+00000000] . In our case we won't bother with this level of optimization.

Fixing up delta offset addresses Having generated the entire code of the decryption function, we can now correct any relative addresses used earlier, which in our case is just the address of the encrypted data. Listing 13.Correcting the address of the encrypted data in previously generated code. /////////////////////////////////////////////////////////// // // correct all instructions making use of // addressing relative to the delta offset // /////////////////////////////////////////////////////////// void CMutagenSPE::UpdateDeltaOffsetAddressing() { DWORD dwAdjustSize = static_cast<DWORD>(a.getOffset() - posDeltaOffset); // correct the instruction which sets up // a pointer to the encrypted data block // at the end of the decryption function // // this pointer is loaded into the regSrc // register, and must be updated by the // size of the remainder of the function // after the delta_offset label a.setDWordAt(posSrcPtr, dwAdjustSize + dwUnusedCodeSize); }

Appending the encrypted data Now that the entire code of the function has been generated, we can write out the encrypted data block, so it will follow the code of the function. Listing 14.Place the encrypted data block after the end of the decryption function. /////////////////////////////////////////////////////////// // // append the encrypted data to the end of the code // of the decryption function // /////////////////////////////////////////////////////////// void CMutagenSPE::AppendEncryptedData() { PDWORD lpdwEncryptedData = reinterpret_cast<PDWORD>(diEncryptedData.lpPtr); // place the encrypted data buffer // at the end of the decryption function // (in 4-byte blocks) for (DWORD i = 0; i < dwEncryptedBlocks; i++) { a._emitDWord(lpdwEncryptedData[i]); } }

Our entire polymorphic engine Now that you have seen the successive elements of the operation of our polymorphic engine, it's time to bring it all together. I have included the class header below, followed by the implementation of the main function which generates the polymorphic code. Listing 15.Definition of the polymorphic engine class. #include "mutagen.h" class CMutagenSPE : private CMutagen { public: CMutagenSPE(void); ~CMutagenSPE(void); // the main function which performs the // encryption and generates the polymorphic // code CMutagen::erCodes PolySPE(PBYTE lpInputBuffer, \ DWORD dwInputBuffer, \ PBYTE *lpOutputBuffer, \ PDWORD lpdwOutputSize); private: // a structure describing the values of // the output registers typedef struct _SPE_OUTPUT_REGS { // target register AsmJit::GPReg regDst; // value to write in this register DWORD dwValue; } SPE_OUTPUT_REGS, *P_SPE_OUTPUT_REGS; // description of an encryption operation typedef struct _SPE_CRYPT_OP { // TRUE if the operation is performed // on two registers; FALSE if it is // performed between the target register // and the value in dwCryptValue BOOL bCryptWithReg; AsmJit::GPReg regDst; AsmJit::GPReg regSrc; // encryption operation BYTE cCryptOp; // encryption value DWORD dwCryptValue; } SPE_CRYPT_OP, *P_SPE_CRYPT_OP; enum { SPE_CRYPT_OP_ADD = 0, SPE_CRYPT_OP_SUB, SPE_CRYPT_OP_XOR, SPE_CRYPT_OP_NOT, SPE_CRYPT_OP_NEG, }; // buffer with the encryption operations DATA_ITEM diCryptOps; // pointer to the table of encryption // operations P_SPE_CRYPT_OP lpcoCryptOps; // count of encryption operations DWORD dwCryptOpsCount; // pointer to the encrypted data block DATA_ITEM diEncryptedData; // number of blocks of encrypted data DWORD dwEncryptedBlocks; // encryption key DWORD dwEncryptionKey; // AsmJit Assembler instance Assembler a; // the register which will store a pointer // to the data which is to be decrypted AsmJit::GPReg regSrc; // the register which will store a pointer // to the output buffer AsmJit::GPReg regDst; // the register which hold the size of the // encrypted data AsmJit::GPReg regSize; // the register with the encryption key AsmJit::GPReg regKey; // the register on which the decryption // instructions will operate AsmJit::GPReg regData; // the preserved registers (ESI EDI EBX in random order) AsmJit::GPReg regSafe1, regSafe2, regSafe3; // the delta_offset label Label lblDeltaOffset; // the position of the delta offset sysint_t posDeltaOffset; // the relative address of the encrypted data sysint_t posSrcPtr; // the size of the unused code between delta // offset and the instructions which get that // value from the stack DWORD dwUnusedCodeSize; // helper methods void RandomizeRegisters(); void GeneratePrologue(); void GenerateDeltaOffset(); void EncryptInputBuffer(PBYTE lpInputBuffer, \ DWORD dwInputBuffer, \ DWORD dwMinInstr, \ DWORD dwMaxInstr); void SetupDecryptionKeys(); void GenerateDecryption(); void SetupOutputRegisters(SPE_OUTPUT_REGS *regOutput, \ DWORD dwCount); void GenerateEpilogue(DWORD dwParamCount); void AlignDecryptorBody(DWORD dwAlignment); void AppendEncryptedData(); void UpdateDeltaOffsetAddressing(); }; Listing 16.The main function, which performs the encryption and generates the polymorphic code. /////////////////////////////////////////////////////////// // // main function - encrypts data and generates polymorphic // decryptor code // /////////////////////////////////////////////////////////// CMutagen::erCodes CMutagenSPE::PolySPE(PBYTE lpInputBuffer, \ DWORD dwInputBuffer, \ PBYTE *lpOutputBuffer, \ PDWORD lpdwOutputSize) { /////////////////////////////////////////////////////////// // // check input parameters // /////////////////////////////////////////////////////////// if ( (lpInputBuffer == NULL) || (dwInputBuffer == 0) || \ (lpOutputBuffer == NULL) || (lpdwOutputSize == NULL) ) { return CMutagen::MUTAGEN_ERR_PARAMS; } // randomly select registers RandomizeRegisters(); /////////////////////////////////////////////////////////// // // generate polymorphic function code // /////////////////////////////////////////////////////////// // generate function prologue GeneratePrologue(); // set up relative addressing through the delta offset // technique GenerateDeltaOffset(); // encrypt the input data, generate encryption keys // the additional parameters set the lower and upper // limits on the number of encryption instructions // which will be generated (there is no limit to this // number, you can specify numbers in the thousands, // but be aware that this will make the output code // quite large) EncryptInputBuffer(lpInputBuffer, dwInputBuffer, 3, 5); // generate code to set up keys for decryption SetupDecryptionKeys(); // generate decryption code GenerateDecryption(); // set up the values of the output registers SPE_OUTPUT_REGS regOutput[] = { { eax, dwInputBuffer } }; SetupOutputRegisters(regOutput, _countof(regOutput)); // generate function epilogue GenerateEpilogue(1L); // align the size of the function to a multiple // of 4 or 16 AlignDecryptorBody(rnd_bin() == 0 ? 4L : 16L); // fix up any instructions that use delta offset addressing UpdateDeltaOffsetAddressing(); // place the encrypted data at the end of the function AppendEncryptedData(); /////////////////////////////////////////////////////////// // // free resources // /////////////////////////////////////////////////////////// // free the encrypted data buffer di_vfree(&diEncryptedData); // free the array of encryption pseudoinstructions di_vfree(&diCryptOps); /////////////////////////////////////////////////////////// // // copy the polymorphic code to the output buffer // /////////////////////////////////////////////////////////// DWORD dwOutputSize = a.getCodeSize(); // assemble the code of the polymorphic function // (this resolves jumps and labels) PVOID lpPolymorphicCode = a.make(); // this struct describes the allocated memory block DATA_ITEM diOutput; // allocate memory (with execute permissions) for the // output buffer di_valloc(&diOutput, dwOutputSize); // check that allocation was successful if (diOutput.lpPtr != NULL) { // copy the generated code of the decryption function memcpy(diOutput.lpPtr, lpPolymorphicCode, dwOutputSize); // provide the output buffer and code size to // this function's caller *lpOutputBuffer = diOutput.lpPtr; *lpdwOutputSize = dwOutputSize; MemoryManager::getGlobal()->free(lpPolymorphicCode); } else { MemoryManager::getGlobal()->free(lpPolymorphicCode); return CMutagen::MUTAGEN_ERR_MEMORY; } /////////////////////////////////////////////////////////// // // function exit // /////////////////////////////////////////////////////////// return CMutagen::MUTAGEN_ERR_SUCCESS; }