A simple PIN tool unpacker for the Linux version of Skype

Some time ago I wanted to take a look at Skype to see how it works and to get a class diagram of the program but, surprise: it’s packed. The Windows version is protected with a crypter of their own (UPDATE: I originally wrote here that, the last time I checked, it was protected with Themida. That statement was wrong: it was Spotify that was protected with Themida). However, as I expected, the Linux version was simply packed (not protected), and with something easy to unpack. To unpack Skype and be able to analyse it in IDA and, also, to learn a bit about how Intel PIN works, I have written a PIN tool to “automatically” unpack Skype.

Skype packer for Linux

The packer used in Skype is pretty straightforward to unpack and we don’t really need an unpacker for it: if we just want to analyse it in IDA Pro we can simply do the following:

1. Open it in IDA and let it finish the auto analysis.
2. Put an “execute” hardware breakpoint at the entry point.
3. Execute it until the breakpoint is hit the 2nd time.
4. Take a memory snapshot of the loader segments in IDA.

This is what it looks like before unpacking, right after the initial auto-analysis performed by IDA Pro:

And this is what it looks like after the hardware breakpoint is hit the 2nd time:

But, as previously stated, to learn a bit about how Intel PIN works I decided to write a simple “write and exec” unpacker for Skype and to connect IDA Pro with PIN via the GDB server to take a memory snapshot when done. It will also be useful for unpacking other simple packers, not just Skype’s Linux binary.

Intel PIN

PIN is a binary instrumentation framework created by Intel for x86 and x86_64 that lets us instrument the code of any application written for those processors (in the past there was support for ARM and Itanium too, IIRC). Basically, it works by rewriting the real code the application executes, inserting our instrumentation code at different granularities (instruction level, basic block level, etc.). A simple PIN tool looks like the following (extracted from the PIN example tools):

    // Instruction count example
    #include "pin.H"

    // Running count of executed instructions
    static UINT64 icount = 0;

    // Actual instrumentation code
    VOID docount() { icount++; }

    // Code to check if we need to instrument an instruction
    VOID Instruction(INS ins, VOID *v)
    {
        // Insert a call to docount before every instruction, no arguments are passed
        INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
    }

    // Returns INT32 so that main can "return Usage()"
    INT32 Usage(void) { … }

    // PIN stuff and instrumentation initialization
    int main(int argc, char *argv[])
    {
        // Initialize pin
        if (PIN_Init(argc, argv)) return Usage();

        // Register Instruction to be called to instrument instructions
        INS_AddInstrumentFunction(Instruction, 0);

        // Start the program, never returns
        PIN_StartProgram();

        return 0;
    }

In main we initialize PIN, set up instruction-level instrumentation and execute the program (PIN_StartProgram). Then, for every new instruction discovered by PIN, the callback “Instruction” is called. In this callback we decide which instructions we actually want to instrument by calling INS_InsertCall. Then, before each instrumented instruction is executed, the callback “docount” is executed. And that’s it: we have a working example that counts the number of instructions a program executes.

GDB Server

In my opinion, one of the best features of Intel PIN is the “-appdebug” command line switch. This switch tells PIN to start a GDB server to debug the application. We can use this feature to debug any application running under PIN from IDA Pro, using the remote GDB debugger. The only “problem” (not really a problem, just annoying) is that we cannot specify the port PIN will listen on, as it is randomly selected, so we need to change it in Debugger -> Process Options every time we execute PIN. For example, to debug Skype running the inscount0 example from IDA with the GDB server, we would execute a command like the following:

    $ pin -appdebug -t source/tools/ManualExamples/obj-ia32/inscount0.so -- `which skype`
    Application stopped until continued from debugger.
    Start GDB, then issue this command at the (gdb) prompt:
    target remote :12587

Then set up the remote GDB connection from IDA Pro, using the port shown in the output of the command (Debugger -> Process Options):

After setting it up, click OK and select Debugger -> Attach to process from IDA. In the next dialog, just press OK when asked to which process we want to attach and that’s all, we are debugging the process with PIN from IDA.
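Since the port changes on every run, it can be handy to pull it out of PIN’s banner automatically instead of reading it by eye. The following is only a sketch: the function name parse_gdb_port is made up for this example, and the only thing it relies on is the “target remote :<port>” line shown in the output above.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the port announced in PIN's -appdebug banner,
// or -1 when no "target remote :<port>" line is present in the given text.
int parse_gdb_port(const std::string &output)
{
    const std::string marker = "target remote :";
    std::string::size_type pos = output.find(marker);
    if (pos == std::string::npos)
        return -1;
    pos += marker.size();

    // Collect the run of decimal digits that follows the marker.
    std::string::size_type end = output.find_first_not_of("0123456789", pos);
    std::string digits = (end == std::string::npos)
                             ? output.substr(pos)
                             : output.substr(pos, end - pos);
    if (digits.empty())
        return -1;
    return std::atoi(digits.c_str());
}
```

The returned value is what goes into the port field of Debugger -> Process Options.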

A simple “write and exec” unpacker

Let’s go back to the main purpose of this post: writing an unpacker for Skype as a PIN tool. What I will do is check whether any instruction in the main binary (skype) modifies any of the application’s segments (for example, if it writes to the .text section), record those writes and, if the application then jumps to execute code in any of the modified segments, raise an application breakpoint to inform the debugger that the process seems to be unpacked. It’s a pretty simple idea that works for simple packers, like the one used in Skype.
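Before looking at the PIN-specific code, the idea itself can be modelled in a few lines of plain C++. This is a sketch only and not the code of the tool: the class WriteExecTracker, its method names and the addr_t type are all invented for illustration, and it tracks single addresses rather than real instruction ranges.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Stand-in for an address type, so the sketch builds without pin.H.
typedef std::uintptr_t addr_t;

// Hypothetical model of a "write and exec" detector.
class WriteExecTracker
{
public:
    // Register a segment [start, start + size) of the main binary.
    void add_segment(addr_t start, std::size_t size)
    {
        segments_.push_back(std::make_pair(start, start + size));
    }

    // Called for every memory write: remember it only if it lands
    // inside one of the registered segments.
    void on_write(addr_t ea)
    {
        for (std::size_t i = 0; i < segments_.size(); ++i)
        {
            if (ea >= segments_[i].first && ea < segments_[i].second)
            {
                written_.insert(ea);
                return;
            }
        }
    }

    // Called for every executed address: true means "this code was
    // written at run time", i.e. a layer seems to be unpacked.
    bool on_exec(addr_t ea) const { return written_.count(ea) != 0; }

private:
    std::vector<std::pair<addr_t, addr_t> > segments_;
    std::set<addr_t> written_;
};
```

In the real tool, on_write and on_exec correspond to analysis callbacks that PIN inserts before memory writes and before instructions.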

What I do in the PIN tool is, in the function main, set up the instrumentation granularity at trace level (a trace is a sequence of basic blocks) and install another callback that will be called right before the application starts:

    //--------------------------------------------------------------------------
    int main(int argc, char *argv[])
    {
        // Initialize PIN library. Print help message if -h(elp) is specified
        // in the command line or the command line is invalid
        if (PIN_Init(argc, argv)) return Usage();

        // Register function to be called to instrument traces
        TRACE_AddInstrumentFunction(trace_cb, 0);

        // Register function to be called at application start time
        PIN_AddApplicationStartFunction(app_start_cb, 0);

        // Register function to be called when the application exits
        PIN_AddFiniFunction(fini_cb, 0);

        // Start the program, never returns
        PIN_StartProgram();

        return 0;
    }

In the “app_start_cb” function callback we will save the application’s segments in a std::map:

    (…)
    struct segdata_t
    {
        size_t size;
        ADDRINT check;
        bool written;
    };
    // key: segment start address
    typedef std::map<ADDRINT, segdata_t> segmap_t;
    segmap_t seg_bytes;
    (…)
    //--------------------------------------------------------------------------
    static VOID app_start_cb(VOID *v)
    {
        IMG img = APP_ImgHead();
        for (SEC sec = IMG_SecHead(img); SEC_Valid(sec); sec = SEC_Next(sec))
        {
            ADDRINT sec_ea = SEC_Address(sec);
            // is the segment loaded in the process memory?
            if (sec_ea != 0)
            {
                ADDRINT check;
                // copy the first DWORD/QWORD to check if it was really changed
                size_t bytes = PIN_SafeCopy(&check, (void *)sec_ea, sizeof(ADDRINT));
                if (bytes == sizeof(ADDRINT))
                {
                    if (min_ea > sec_ea || min_ea == 0)
                        min_ea = sec_ea;
                    if (max_ea < sec_ea || max_ea == (unsigned)-1)
                        max_ea = sec_ea;

                    segdata_t seg;
                    seg.size = SEC_Size(sec);
                    seg.check = check;
                    seg.written = false;
                    // save the segment information
                    seg_bytes[sec_ea] = seg;
                }
            }
        }
    }

We iterate over all the segments of the application that are loaded in the process memory and save information about them. Now, in the “trace_cb” callback, we check, for every instruction of every basic block that is going to be executed, whether the code modifies memory within the limits of the previously recorded segments, or whether the process is going to execute an instruction in a previously written application segment:

    //--------------------------------------------------------------------------
    static VOID trace_cb(TRACE trace, VOID *v)
    {
        // Visit every basic block in the trace
        for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
        {
            // Visit every instruction in the basic block
            for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins))
            {
                // check if the address is in the limits of the application's segments
                ADDRINT ea = INS_Address(ins);
                if (!valid_ea(ea))
                    continue;

                // if that address was already written and is going to be executed,
                // we consider it's unpacked
                if (was_writen(ea))
                {
                    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)check_unpacked_cb,
                                   IARG_INST_PTR, IARG_CONST_CONTEXT, IARG_THREAD_ID,
                                   IARG_END);
                }

                // Instruments memory accesses using a predicated call, i.e.
                // the instrumentation is called iff the instruction will actually be executed.
                //
                // The IA-64 architecture has explicitly predicated instructions.
                // On the IA-32 and Intel(R) 64 architectures conditional moves and REP
                // prefixed instructions appear as predicated instructions in Pin.
                UINT32 mem_operands = INS_MemoryOperandCount(ins);

                // Iterate over each memory operand of the instruction.
                for (UINT32 mem_op = 0; mem_op < mem_operands; mem_op++)
                {
                    // Note that in some architectures a single memory operand can be
                    // both read and written (for instance incl (%eax) on IA-32)
                    // In that case we instrument it once for read and once for write.
                    if (INS_MemoryOperandIsWritten(ins, mem_op))
                    {
                        // is the memory address to be modified in the limits of the
                        // application's segments?
                        INS_InsertIfPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)valid_ea,
                                                   IARG_MEMORYOP_EA, mem_op,
                                                   IARG_END);
                        // if so, add our instrumentation code
                        INS_InsertThenPredicatedCall(ins, IPOINT_BEFORE,
                                                     (AFUNPTR)record_mem_write_cb,
                                                     IARG_INST_PTR,
                                                     IARG_MEMORYOP_EA, mem_op,
                                                     IARG_END);
                    }
                }
            }
        }
    }
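The helpers “valid_ea” and “was_writen” used in “trace_cb” are not listed in this post. A minimal sketch of what they might look like follows; it is a guess, not the code of the tool. It assumes that min_ea and max_ea are the global bounds filled in at application start, that write_address is a std::deque of written addresses (suggested by the addrdeq_t name), and it declares ADDRINT itself only so the snippet builds without pin.H.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>

typedef std::uintptr_t ADDRINT;        // stand-in for the type from pin.H
typedef std::deque<ADDRINT> addrdeq_t; // assumed definition

static ADDRINT min_ea = 0;             // lowest recorded address
static ADDRINT max_ea = 0;             // highest recorded address
static addrdeq_t write_address;        // addresses written so far

// Non-zero when ea lies within the recorded limits. It returns ADDRINT
// because it is also used as the "if" part of an If/Then analysis call.
ADDRINT valid_ea(ADDRINT ea)
{
    return ea >= min_ea && ea <= max_ea;
}

// True when a write to exactly this address was recorded earlier.
bool was_writen(ADDRINT ea)
{
    return std::find(write_address.begin(), write_address.end(), ea)
           != write_address.end();
}
```

A linear std::find is fine for a packer that produces a handful of writes into the segments; a std::set would be the obvious swap if the list grew large.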

In the “record_mem_write_cb” callback the PIN tool checks if the actual memory write affects any of the application’s segments. If so, the “written” flag of the corresponding segment element is set to true:

    //--------------------------------------------------------------------------
    // Handle memory write records
    VOID record_mem_write_cb(VOID *ip, VOID *addr)
    {
        ADDRINT ea = (ADDRINT)addr;
        segmap_t::iterator p;
        for (p = seg_bytes.begin(); p != seg_bytes.end() && !p->second.written; ++p)
        {
            ADDRINT start_ea = p->first;
            if (ea >= start_ea)
            {
                segdata_t *seg = &p->second;
                // is the write inside [start_ea, start_ea + size)?
                if (ea < start_ea + seg->size)
                {
                    fprintf(stderr, "%p: W %p\n", ip, addr);
                    write_address.push_back((ADDRINT)addr);
                    seg->written = true;
                    break;
                }
            }
        }
    }

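Because the segments are stored in a std::map keyed by start address, the linear scan could also be replaced by an ordered lookup. The sketch below shows that alternative; find_segment is a hypothetical helper that is not part of the tool, and segdata_t, segmap_t and ADDRINT are redeclared here only to keep the snippet self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

typedef std::uintptr_t ADDRINT; // stand-in for the type from pin.H

struct segdata_t
{
    std::size_t size;
    ADDRINT check;
    bool written;
};
typedef std::map<ADDRINT, segdata_t> segmap_t;

// Hypothetical helper: returns the segment containing ea, or NULL when
// ea is outside every recorded segment.
segdata_t *find_segment(segmap_t &segs, ADDRINT ea)
{
    // upper_bound gives the first segment that starts after ea, so the
    // only possible candidate is the one right before it.
    segmap_t::iterator it = segs.upper_bound(ea);
    if (it == segs.begin())
        return NULL;
    --it;
    if (ea - it->first < it->second.size)
        return &it->second;
    return NULL;
}
```

With such a helper the write callback would reduce to one lookup followed by setting the “written” flag on the returned segment.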
And, finally, in the callback “check_unpacked_cb” that we installed in the “trace_cb” callback, we remove the executed address from the list of recorded writes and raise an application breakpoint that will be caught in IDA Pro:

    //--------------------------------------------------------------------------
    VOID check_unpacked_cb(VOID *ip, const CONTEXT *ctxt, THREADID tid)
    {
        ADDRINT ea = (ADDRINT)ip;
        addrdeq_t::iterator it = std::find(write_address.begin(), write_address.end(), ea);
        if (it != write_address.end())
            write_address.erase(it);

        fprintf(stderr, "Layer unpacked: %p\n", ip);
        PIN_ApplicationBreakpoint(ctxt, tid, false, "Layer unpacked!");
    }

OK, we have our simple unpacker. It’s time to compile it, execute this PIN tool with the -appdebug command line switch, connect from IDA to PIN and let the application run. When the breakpoint is hit, the application (Skype in this case) is unpacked and we can take a memory snapshot. In the terminal where we execute the command we will see something like this:

    $ ./pin -appdebug -t source/tools/MyPinTool/obj-ia32/pinpack.so -- /path/to/skype
    Application stopped until continued from debugger.
    Start GDB, then issue this command at the (gdb) prompt:
    target remote :47643
    0x83d95b9: W 0x840c35f
    0x840bc5e: W 0x805c050
    0x840bd3f: W 0x8058ed0
    Layer unpacked: 0x805c050

And in IDA we will receive an application breakpoint at the entry point with the message “Layer unpacked” displayed in the output window:

And that’s all! We have a working “write and exec” unpacker in the form of a PIN tool. You can download the source code of the unpacker here.

Extra

What I really wanted to do before writing the PIN tool was to get a class diagram of the Skype application. Now that the application is unpacked in IDA we can easily do it (after taking a memory snapshot and re-analysing the whole database). I’ll use the scripts written by Igor Skochinsky, released after his RECON talk “Compiler Internals: Exceptions and RTTI”. I modified the script gnu_rtti.py a little to display a class diagram in a GraphViewer component in IDA (instead of a chooser) that also lets you save the diagram in dot format. You can download my modified version of the script here.
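For readers who have not seen the dot format, the sketch below shows the kind of text such an export contains. It is not taken from the modified script: the function to_dot, the graph name “classes” and the class names used with it are invented, and the edge direction (derived class pointing to base class) is just one possible convention.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Each pair is (derived class, base class); purely illustrative input.
typedef std::vector<std::pair<std::string, std::string> > edges_t;

// Hypothetical helper: renders an inheritance edge list as a dot digraph.
std::string to_dot(const edges_t &edges)
{
    std::ostringstream out;
    out << "digraph classes {\n";
    for (edges_t::const_iterator it = edges.begin(); it != edges.end(); ++it)
        out << "  \"" << it->first << "\" -> \"" << it->second << "\";\n";
    out << "}\n";
    return out.str();
}
```

Text in this form can be rendered by GraphViz, for example with the dot command line tool.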

After running this script (go grab a coffee if you do it yourself, as it will take a while) the class diagram will be displayed in the GraphViewer component and we can right click in the graph and select “Export to dot”. The following is the generated class diagram of Skype rendered with GraphViz:

That’s all! I hope you liked this blog post!