Yesterday I joined the 0x00sec IRC channel and, as many other times, @dtm came up with an interesting concept… and I had to try it. The idea was pretty challenging and I have not come up with a full implementation, but I managed to get a minimal Proof of Concept program to illustrate the concept and, maybe, to be used as a starting point for other people more interested in the topic than me.

The IBI Crypter

Sure, we were talking about crypters and I named mine IBI crypter. IBI stands for Instruction by Instruction Crypter, and that is the concept. Instead of decrypting the whole application at once or even a block, this crypter tries to decrypt just the instruction that has to be executed at any time. After running that instruction, the code is immediately crypted again so a live memory dump of the process will still be encrypted. This is why I’m referring to this concept as a JIT (Just in Time) crypter.

During our discussion we realised that the problem is not trivial, and I have to admit that it is more challenging than I initially expected. I'll tell you about the issues at the end of this paper.

Before diving into the code, let me explain how the crypter works.

An Embedded Debugger

The idea is pretty simple. We start with a program with some crypted sections. The program forks a copy of itself, starts tracing it and sets a breakpoint at the first crypted function to run. Once that function is called, the program execution stops and the embedded debugger takes control. It then decrypts the next instruction to run and executes that instruction.

Once the instruction is executed, the debugger takes control of the process again, crypts the instruction just executed again, and repeats the process until the code is completely executed.

The concept is pretty simple but the implementation is not that trivial as we will see in a while.

To follow the rest of the paper, it may be useful for you to also read this other text:

[Linux] Infecting Running Processes

Breakpoints on Intel Processors

As we already know how to trace a program and access its memory and registers (yes, you should read the link I have just mentioned), we only need to know how to set a breakpoint. Again, the concept itself is pretty simple, at least for the traditional way of setting breakpoints.

Breakpoints make use of the processor instruction int 3 (opcode 0xcc). This is a 1-byte instruction that raises a breakpoint exception: the processor stops executing the current code and the kernel reports it to the process as a SIGTRAP. For a traced process, that hands control to the debugger, that is, to code written by us. So, the process to set a breakpoint is as follows:

1. Store the content at the memory address where we want to set our breakpoint.

2. Write the int 3 instruction at that address.

3. Run the program.

Whenever the program reaches the int 3 instruction, it stops at that point and our code takes control and can do whatever it needs to do; in our case, decrypt the code to execute.

Once we are done, we have to do a couple more things to restore the execution of the original program:

1. Copy the original byte (the one we stored when we set the breakpoint) back to its original position.

2. Decrease the IP (Instruction Pointer). As you know, the IP always points to the next instruction to be executed. In this case, this is the address of our breakpoint plus 1 (the size of the int 3 instruction). We want to decrease it in order to run the original instruction (the one we broke by injecting the int 3 opcode).

3. Give control back to the original process, or ask the process to just run the next instruction.

Overall the concept is pretty straightforward.

SingleStep Execution

Fortunately for us, the ptrace interface offers a request to execute a single instruction. Otherwise we would have to add some code to figure out the size of the current instruction (it can be 1 to 15 bytes long) in order to know where to set up our next breakpoint.

The PTRACE_SINGLESTEP request does that for us: it runs just one instruction and gives control back to us (the tracee stops again with a SIGTRAP). This is actually the last piece of the puzzle to build our simple proof of concept.

Let’s look into it.

The Proof of Concept

To illustrate the technique, we have chosen a very basic application that checks the validity of a user-provided code. The function that does the check will be crypted (and only that function) and it will be executed using the awesome IBI Crypter :P.

Let’s start with the main program and our check function. We have called it target.c:

```c
#include <stdio.h>
#include "stub.h"

#define CRYPT_ME __attribute__((section(".secure")))

// This function is crypted
CRYPT_ME int check_key (unsigned char *str)
{
  unsigned char *p = str;

  while (*p) {*p -= '0'; p++;}   // Turn the ASCII digits into numbers, in place

  if (str[0] + str[1] != 5) return 1;
  if (str[2] * str[3] != 10) return 1;

  return 0;
}

int main (int argc, char *argv[])
{
  if (argc < 2)
    {
      fprintf (stderr, "Usage: %s KEY\n", argv[0]);
      return 1;
    }
  _stub (check_key); // Setup run environment
  printf ("Code is %s\n",
          check_key ((unsigned char *) argv[1]) ? "INCORRECT" : "CORRECT");
}
```

As you can see, the check_key function in the code above does a couple of stupid checks on the key it receives as a parameter (after turning its ASCII digits into numbers), and returns 0 if the key is valid or 1 otherwise. The main function is also pretty simple. It first runs our _stub and then just prints CORRECT or INCORRECT based on the result of the check_key function.

The off-line crypter I will be using is the same one described in this paper:

A simple Linux Crypter

We put the functions to protect into a separate section for easy identification (see the CRYPT_ME macro). Then we XOR that section, as described in the paper I have just mentioned.
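For reference, this is roughly what an off-line crypter has to do: find the .secure section header by name and XOR the bytes it covers in the file. The sketch below is not crypter_rt from the repo; it assumes a 64-bit ELF file already loaded in memory, performs only basic validation, uses the Elf64_* definitions from <elf.h>, and read_file, find_section, xor_section and the one-byte key are hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Load a whole file into a malloc'ed buffer. Returns NULL on error. */
unsigned char *read_file (const char *path, size_t *len)
{
  FILE *f = fopen (path, "rb");
  if (!f) return NULL;

  unsigned char *buf = NULL;
  if (fseek (f, 0, SEEK_END) == 0) {
    long size = ftell (f);
    if (size > 0 && fseek (f, 0, SEEK_SET) == 0
        && (buf = malloc ((size_t) size)) != NULL) {
      if (fread (buf, 1, (size_t) size, f) == (size_t) size)
        *len = (size_t) size;
      else {
        free (buf);
        buf = NULL;
      }
    }
  }
  fclose (f);
  return buf;
}

/* Return the header of the section called 'name', or NULL if the image
 * is not a 64-bit ELF file or has no such section. */
Elf64_Shdr *find_section (unsigned char *img, size_t len, const char *name)
{
  Elf64_Ehdr *eh = (Elf64_Ehdr *) img;

  if (len < sizeof *eh || memcmp (eh->e_ident, ELFMAG, SELFMAG) != 0
      || eh->e_ident[EI_CLASS] != ELFCLASS64 || eh->e_shoff == 0
      || eh->e_shstrndx == SHN_UNDEF || eh->e_shstrndx >= eh->e_shnum
      || eh->e_shoff + eh->e_shnum * sizeof (Elf64_Shdr) > len)
    return NULL;

  Elf64_Shdr *sh = (Elf64_Shdr *) (img + eh->e_shoff);
  Elf64_Shdr *strtab = &sh[eh->e_shstrndx];  /* section names live here */
  if (strtab->sh_offset >= len)
    return NULL;

  const char *names = (const char *) img + strtab->sh_offset;
  for (int i = 0; i < eh->e_shnum; i++)
    if (strcmp (names + sh[i].sh_name, name) == 0)
      return &sh[i];
  return NULL;
}

/* XOR every byte of the section's file contents with a one-byte key.
 * Applying it twice restores the original bytes. */
void xor_section (unsigned char *img, size_t len, const Elf64_Shdr *sh,
                  unsigned char key)
{
  if (sh->sh_type == SHT_NOBITS || sh->sh_offset + sh->sh_size > len)
    return;
  for (Elf64_Xword i = 0; i < sh->sh_size; i++)
    img[sh->sh_offset + i] ^= key;
}
```

The same lookup run against /proc/self/exe is one way the stub itself could obtain the section address and size (sh_addr and sh_size), keeping in mind that sh_addr only equals the runtime address for non-PIE binaries.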

The interesting stuff is in the _stub function. Let’s look at it.

The _stub Function

The _stub function has two main parts. First, we set up a breakpoint in the crypted function we want to run (check_key in this case). Once we get there, we start stepping through the function instruction by instruction.

Let’s start with the breakpoint:

```c
int _stub (void *ep)
{
  void   *bp_ip;
  long   ip1, op1, op2;
  struct user_regs_struct regs;
  int    status, cnt;

  printf ("%s", "0x00pf IbI Crypter Stub\n");

  // Start debugging!!!
  if ((_pid = fork ()) < 0) PERROR("fork:");

  if (_pid == 0) return 0; // Child process just keeps running
  else
    {
      // Father starts debugging child
      if ((ptrace (PTRACE_ATTACH, _pid, NULL, NULL)) < 0)
        PERROR ("ptrace_attach:");

      printf ("%s", "+ Waiting for process...\n");
      wait (&status);

      // Set the breakpoint at the entry point and get there...
      bp_ip = ep;
      op1 = ptrace (PTRACE_PEEKTEXT, _pid, bp_ip, NULL);
      DPRINTF("BP: %p 1 Opcode: %lx\n", bp_ip, op1);

      if (ptrace (PTRACE_POKETEXT, _pid, bp_ip,
                  (op1 & 0xFFFFFFFFFFFFFF00) | 0xcc) < 0)
        PERROR ("ptrace_poke:");

      // Run until breakpoint is reached.
      if (ptrace (PTRACE_CONT, _pid, 0, 0) < 0) PERROR("ptrace_cont:");
      wait (&status);

      ptrace (PTRACE_GETREGS, _pid, 0, &regs);
      DPRINTF ("Breakpoint reached: RIP: %llx\n", regs.rip);
      regs.rip--;
      ptrace (PTRACE_SETREGS, _pid, 0, &regs);

      // Restore opcode
      ptrace (PTRACE_POKETEXT, _pid, bp_ip, op1);
```

Hope the code is easy to understand. It creates a new process and starts debugging it. Immediately after that, we set the breakpoint (opcode 0xcc) at the address received as parameter, which in our case is the check_key function (check the main function above). Note that the child keeps running between the fork and the PTRACE_ATTACH, so in principle it could reach check_key before the breakpoint is in place; this is one of the timing issues mentioned at the end.

Once the breakpoint is set, we just let the program run using PTRACE_CONT and wait for it to hit the breakpoint… i.e. we wait until the function we want to decrypt gets executed.

The program will eventually call the check_key function (that is actually the next line in the main function) and the _stub code will take control back. Then we have to get the IP register, decrease it by one byte (as explained above), set the register value back and also restore the original opcode where the int 3 was inserted.

Time to run the function.

Decrypting, Running, Encoding and again

At this point we have stopped the application just at the beginning of the check_key function and in order to continue the execution we have to decrypt it as we go.

This is the code that does the trick

```c
      // Start step by step debugging
      ip1 = (long) ep;
      cnt = 0;
      while (WIFSTOPPED (status))
        {
          cnt ++;
          // Read up to 16 bytes to get the longest instruction possible
          // Decode and write back the decoded code to execute it
          op1 = ptrace (PTRACE_PEEKTEXT, _pid, ip1, NULL);
          op2 = ptrace (PTRACE_PEEKTEXT, _pid, ip1 + 8, NULL);

          DPRINTF ("%lx :: OPCODES : %lx %lx\n", ip1, op1, op2);
          XOR(op1);
          XOR(op2);
          DPRINTF ("%lx :: DOPCODES: %lx %lx\n", ip1, op1, op2);

          ptrace (PTRACE_POKETEXT, _pid, ip1, op1);
          ptrace (PTRACE_POKETEXT, _pid, ip1 + 8, op2);

          /* Make the child execute another instruction */
          if (ptrace (PTRACE_SINGLESTEP, _pid, 0, 0) < 0)
            PERROR ("ptrace_singlestep:");
          wait (&status);

          // Re-encode the instruction just executed so we do not have
          // to count how many bytes got executed
          XOR(op1);
          XOR(op2);
          ptrace (PTRACE_POKETEXT, _pid, ip1, op1);
          ptrace (PTRACE_POKETEXT, _pid, ip1 + 8, op2);

          // Get the new IP
          ptrace (PTRACE_GETREGS, _pid, 0, &regs);
          ip1 = regs.rip;

          // If code is outside .secure section we stop debugging
          // (secure_ptr + secure_len is already past the end of the section)
          if ((void*)ip1 < secure_ptr || (void*)ip1 >= secure_ptr + secure_len)
            {
              printf ("Leaving .secure section... %d instructions executed\n", cnt);
              break;
            }
        }
      ptrace (PTRACE_CONT, _pid, 0, 0);
      wait (&status);
    }
  printf ("DONE\n");
  exit (1);
}
```

The function is a bit verbose but conceptually very simple. The XOR macro just applies the XOR encoding to a long (8 bytes) with a predefined key. You can check the details in the full source code (check at the end). At this point, it is not relevant.
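The actual XOR macro and key live in the repository, so what follows is only an assumption about how such a macro can be built. One detail matters: the stub XORs 8-byte words starting at arbitrary instruction addresses, while the off-line crypter XORs the section from its first byte. Both only agree if every byte gets the same key, for instance a one-byte key replicated across the word. KEY_BYTE, xor_word, xor_bytes and load_word are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_BYTE 0xAAu   /* hypothetical one-byte key */

/* The key replicated into all 8 bytes of a word */
#define KEY_WORD (UINT64_C (0x0101010101010101) * KEY_BYTE)

/* Word-level XOR, playing the role of the stub's XOR(op) macro */
uint64_t xor_word (uint64_t w)
{
  return w ^ KEY_WORD;
}

/* Byte-level XOR, as an off-line crypter would apply it to the section */
void xor_bytes (unsigned char *p, size_t n)
{
  for (size_t i = 0; i < n; i++)
    p[i] ^= KEY_BYTE;
}

/* Read 8 bytes the way PTRACE_PEEKTEXT returns them on x86-64
 * (native little-endian order, any alignment) */
uint64_t load_word (const unsigned char *p)
{
  uint64_t w;
  memcpy (&w, p, sizeof w);
  return w;
}
```

Since XOR is its own inverse, the same operation decrypts and re-encrypts, which is why the stub can simply apply XOR a second time to the words it decoded before writing them back.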

If you check the Intel instruction format, you will find that an instruction on a 64-bit architecture can be up to 15 bytes long. As we do not know (and do not really want to know) anything about the next instruction to run (in other words, we do not want to decode the opcodes ourselves), we have to decrypt 16 bytes to cover the longest possible instruction. Yes, this means that, in general, at any given time there is more than one instruction decrypted in memory.

This is why we do two PTRACE_PEEKTEXT calls, reading the current address and that address plus 8 bytes (a long is 8 bytes on x86-64 Linux). Once we have read the 16 bytes that contain the next instruction to run, we decrypt them with our XOR macro and write them back with PTRACE_POKETEXT, so the next program instruction in memory is now valid code.
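The two PEEKTEXT calls generalise to any byte range: ptrace transfers one word (8 bytes on x86-64) per request, so reading n bytes takes ceil(n/8) requests, which gives 2 for a 15-byte instruction. A hypothetical helper, sketched under that assumption:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Number of word-sized PEEKTEXT requests needed to cover n bytes */
size_t words_needed (size_t n)
{
  return (n + sizeof (long) - 1) / sizeof (long);
}

/* Hypothetical helper: copy n bytes at addr in the tracee into out,
 * one word at a time. Returns 0 on success, -1 on error. */
int peek_bytes (pid_t pid, uintptr_t addr, unsigned char *out, size_t n)
{
  for (size_t i = 0; i < words_needed (n); i++) {
    errno = 0;
    long word = ptrace (PTRACE_PEEKTEXT, pid,
                        (void *) (addr + i * sizeof (long)), NULL);
    if (word == -1 && errno != 0)
      return -1;

    /* The last word may be only partially needed */
    size_t chunk = n - i * sizeof (long);
    if (chunk > sizeof (long))
      chunk = sizeof (long);
    memcpy (out + i * sizeof (long), &word, chunk);
  }
  return 0;
}
```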

At this point we can just run the next instruction using PTRACE_SINGLESTEP and wait until the instruction is executed and control gets back to us.

Then we just need to encrypt the 16 bytes we decrypted again. This is not only to keep the program encrypted most of the time, but also to avoid some tricky logic: the next 16-byte window usually overlaps the previous one, and XORing bytes that are already decrypted would encrypt them again.

The final check in the while loop verifies whether the current IP is still in the .secure section or has moved into another executable section (for instance, back into main)… more on this at the end of the text.
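The bound check is easiest to get right as a half-open interval: an address equal to secure_ptr + secure_len is already outside the section. A small hypothetical helper expressing that, written as ip - start < len so that start + len can never overflow:

```c
#include <stdint.h>

/* Nonzero if ip lies in [start, start + len), i.e. inside the section */
int in_section (uintptr_t ip, uintptr_t start, uintptr_t len)
{
  return ip >= start && ip - start < len;
}
```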

Testing

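Testing the program is not that straightforward. For the time being we have to do some manual tasks to make it work. It is not hard to fully automate the process, so I leave it as an exercise for you.

First we have to compile the target program (the section address we will get from readelf below assumes a non-PIE binary; if your gcc produces position-independent executables by default, add -no-pie to this command):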

$ gcc -o target target.c stub.c

Then we have to crypt the .secure section, using the crypter_rt tool (provided with the code).

$ ./crypter_rt target

Finally, we need to manually update the section information in stub.c and repeat the process. To get the information the stub needs about the section, just run this command:

```
$ readelf -S target | grep secure
  [14] .secure           PROGBITS         0000000000400e22  00000e22
```

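The Address column (0000000000400e22 above) has to go into the secure_ptr variable in stub.c, and the section size into secure_len. Be careful: the last number in that line is the section's file offset, not its size. Without -W, readelf prints the Size column on the following line, so use readelf -S -W target | grep secure (or grep -A1) to see it. This information is used to figure out when the execution leaves the secure section and reaches unencrypted code.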

Recompile, crypt the binary again, and you should be able to run the program like this:

```
$ ./target 1425
0x00pf B3Crypt Stub (Byte By Byte)
+ Waiting for process...
Leaving .secure section... 74 instructions executed
Code is CORRECT
DONE
$ ./target 1426
0x00pf B3Crypt Stub (Byte By Byte)
+ Waiting for process...
Leaving .secure section... 75 instructions executed
Code is INCORRECT
DONE
```

The Tricky part

This example is very simple on purpose. Making a usable version will require some effort and I do not really have a need for this tool, so I do not think I will go further with this implementation. However, these are a few things to do in order to extend this PoC into a usable tool:

The first thing to do is to extend the stub to read the ELF header and get the information associated with the .secure section, so you do not need to update the source code and recompile.

The second one is trickier. You have to detect jumps/calls to functions outside the .secure section, for instance into the standard C library (have you seen a single printf in the check_key function?). In those cases we have to set a breakpoint just after the call, in order to run the function without decoding it (those functions are not encoded) and to restart the decoding when the function returns… The last check in the _stub function may give you some hints on how to proceed.

I haven’t extensively tested the program. I just made it work on my machine; it is not optimised and it may have some timing issues.

This only works on 64-bit platforms… it should be easy to make it work on 32 bits… but in any case it is x86 specific. For ARM or MIPS you need to figure out how to set breakpoints.

Well, this is it. I think this proves the concept is feasible and it is up to you to make it work. I see this more as a SW protection mechanism than as a malware development technique… WTF… they are the same thing

As usual, you can get the complete source code from my GitHub repo:

0x00pf/0x00sec_code

Any comments are welcome.