Tomasz Grysztar







Chapter 1

PE (Portable Executable)



The road we are going to take is to learn the inner workings of file formats by constructing some files from scratch. This approach is focused on experimentation, so we will use samples designed in a way that encourages playing with them and learning through direct experience.



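The first file we construct is going to be an executable for the Windows operating system, in the format called Portable Executable. PE was designed in 1993 for Windows NT (the first 32-bit system in the family), and has been used from then on by the 32-bit and 64-bit implementations of Windows. Subsequently it has been adopted for some other uses, like EFI, but at this time we are going to focus on its original environment.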







1.1 Building a simple program



Before we go on, a few preparations. We should take the ALIGN macro we discussed earlier, since it is going to become useful quite soon. We also need to create some machine code for the actual program inside our executable, and for this we need to include an instruction set for the processor architecture we want to work with. Our first choice is going to be very traditional, the 32-bit x86 architecture, so we include an instruction set for processors compatible with the 80386:

Code:
include '80386.inc'
use32

USE32 is a command provided by the '80386.inc' package; it chooses to assemble instructions for 32-bit mode (if we did not specify it, the default mode would be 16-bit, for historical reasons).



A term that often pops up when discussing PE files is the program image. This refers to the layout of the program after it is loaded into memory to be executed, which is not necessarily the same as the structure of the program in the file on disk. The executable needs to define a mapping of sections of the file onto the corresponding areas in memory.



Nevertheless, both the file on disk and the image in memory start the same way - with the headers. These structures from the beginning of the file become the initial portion of the loaded program, at the address called the base of the image. All the other sections created in memory have to be placed after that.



Any PE executable is constructed with an assumed value for the base of the image; for 32-bit programs this is usually 0x400000. We are going to define a constant with this value and use it as the base for our labels:

Code:
IMAGE_BASE := 0x400000
org IMAGE_BASE

Therefore all the labels that we define are going to correspond to addresses in the program image.



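The next two constants choose the alignment settings for the disk and for the memory. This is one of the sources of discrepancy between the layouts of the file and of the image.

Code:
FILE_ALIGNMENT := 512
SECTION_ALIGNMENT := 4096

The standard choice of file alignment makes sure that every section in the file starts on a new sector of the disk (traditionally hard drives have a sector size of 512 bytes), to optimize the performance of reading and mapping a single section into memory.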



In memory, the sections are aligned to the size of a page (which is 4096 bytes in the basic setup of an x86 CPU). This is partly because memory can be allocated only in such increments, but also because different sections may require distinct attributes for the memory (like write-protection) and the CPU can set them up only for entire pages at once.
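Both alignments come down to rounding a size or position up to a multiple of a power of two. As an illustration outside of fasmg, here is a minimal Python sketch of that arithmetic (the helper name align_up is our own, not something defined by fasmg or by the PE format), applied to 392 bytes of headers, which is what the stub, the PE headers and two section entries built in this chapter add up to (64 + 24 + 224 + 80 bytes):

```python
FILE_ALIGNMENT = 512
SECTION_ALIGNMENT = 4096


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0, "alignment must be a power of two"
    return (value + alignment - 1) & ~(alignment - 1)


headers_size = 392  # stub + PE signature + file header + optional header + 2 section entries
print(hex(align_up(headers_size, FILE_ALIGNMENT)))     # 0x200: space taken in the file
print(hex(align_up(headers_size, SECTION_ALIGNMENT)))  # 0x1000: space taken in the image
```

The same headers occupy a single 512-byte sector on disk but a whole page in memory, which is why the first section ends up at a different position in the file than in the image.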



These constants are better left with their standard values. While it is possible to tweak them in such a way that the operating system should still be able to construct the image, the loader may distrust an executable with a non-standard layout and refuse to load it. There are also some additional constraints if the chosen alignment in memory is smaller than the size of a page (we may get back to this later).



It is time to start writing the headers. The very first bytes of a file are usually a unique signature of the format, but in the case of PE the matter is a bit more complicated. At the time when the PE format was designed, DOS was still a popular operating system, and many of the new formats - like NE (the 16-bit format used by the earliest versions of Windows), LE (used by OS/2, but also by drivers in Windows 9x) and finally PE - were based on the old MZ format used for the .EXE files in DOS. All these formats were made in such a way that the initial portion of the file was a valid MZ program that could be executed by DOS; usually it was a tiny program that just displayed a message like "This program cannot be run in DOS mode". This small program was called a stub, and its MZ header was extended with a special field, ignored by older software, containing the offset of the actual new executable header later in the file.



This way it was even possible to have an executable that contained two versions of the same software - one for DOS and one for Windows. This was not a common thing to do, though. Mostly, the stub programs just informed the user, in one way or another, that the file was not intended to be run from DOS.



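Nowadays we do not need to worry much about someone mistakenly trying to execute our PE file in DOS, so we are going to make a minimal stub - not a real program, just something that resembles one enough for our PE executable to be valid:

Code:
Stub:
        .Signature                      dw "MZ"
        .BytesInLastSector              dw SIZE_OF_STUB mod 512
        .NumberOfSectors                dw (SIZE_OF_STUB-1)/512 + 1
        .NumberOfRelocations            dw 0
        .NumberOfHeaderParagraphs       dw SIZE_OF_STUB_HEADER / 16
                                        db 0x3C-($-Stub) dup 0
        .NewHeaderOffset                dd Header - IMAGE_BASE

align 16

SIZE_OF_STUB_HEADER := $ - Stub

        ; The code of a DOS program would go here.

SIZE_OF_STUB := $ - Stub

What is important here is that at position 0x3C from the beginning of the MZ header there should be a 32-bit field containing the offset of the actual PE header. We fill most of the MZ header with zeros up to that point; normally there are some fields important for the MZ format, but we do not intend to make a functional DOS program.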



We compute the offset of the main PE header by subtracting IMAGE_BASE from its address (available through a label that we are going to define below). For all the headers there is such a clear correspondence between addresses in the image and positions in the file.



We also fill a couple of fields in the MZ header that are crucial for its integrity, namely the size of the header and of the entire program. The header is measured in 16-byte units (in DOS they were called paragraphs) and the "align 16" is there to make sure that its size is a multiple of 16 (though in this case nothing needs to be done, as the position immediately after NewHeaderOffset is already 64). The size of the DOS program is given as a count of 512-byte sectors, but the last of them may be only partially filled, and BytesInLastSector gives the number of bytes in it.
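To make the arithmetic of these fields concrete, the following Python sketch packs the same 64-byte stub with the struct module. The function name mz_stub is our own, and the sizes are hard-coded for the case without any DOS code:

```python
import struct

SIZE_OF_STUB = 64         # header only, no DOS code after it
SIZE_OF_STUB_HEADER = 64  # already a multiple of 16, so "align 16" adds nothing


def mz_stub(new_header_offset: int) -> bytes:
    """Build a code-less MZ stub whose field at 0x3C points to the PE header."""
    fields = struct.pack(
        "<2sHHHH",
        b"MZ",                          # Signature
        SIZE_OF_STUB % 512,             # BytesInLastSector
        (SIZE_OF_STUB - 1) // 512 + 1,  # NumberOfSectors
        0,                              # NumberOfRelocations
        SIZE_OF_STUB_HEADER // 16,      # NumberOfHeaderParagraphs
    )
    stub = fields + bytes(0x3C - len(fields))     # zero-fill up to offset 0x3C
    stub += struct.pack("<I", new_header_offset)  # NewHeaderOffset
    assert len(stub) == SIZE_OF_STUB_HEADER
    return stub


stub = mz_stub(0x40)  # the PE header follows right after the 64-byte stub
print(struct.unpack_from("<HHHH", stub, 2))  # (64, 1, 0, 4)
```

A 64-byte program fits in a single sector, so NumberOfSectors is 1 and BytesInLastSector is the whole size of the stub.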



On a side note, when a label starts with a dot, it belongs to the namespace of a regular label that preceded it. The labels defined here could be accessed from elsewhere with identifiers like "Stub.Signature" or "Stub.NewHeaderOffset".



With the stub ready, we can move on to the main header; this is where the actual PE signature is going to be. This header must be aligned on an 8-byte boundary, hence we put an "align 8" here, though it again does nothing (but if we had put a real DOS program above, the position in the file might have been misaligned).

Code:
align 8

Header:
        .Signature              dw "PE",0
        .Machine                dw 0x14C        ; IMAGE_FILE_MACHINE_I386
        .NumberOfSections       dw NUMBER_OF_SECTIONS
        .TimeDateStamp          dd %t
        .PointerToSymbolTable   dd 0
        .NumberOfSymbols        dd 0
        .SizeOfOptionalHeader   dw SectionTable - OptionalHeader
        .Characteristics        dw 0x102        ; IMAGE_FILE_32BIT_MACHINE + IMAGE_FILE_EXECUTABLE_IMAGE



According to the plan, the first example is going to be for the 32-bit mode of an x86 CPU, and we state this in the Machine field, but also by including the IMAGE_FILE_32BIT_MACHINE value in Characteristics. The latter field is a set of flags, and there is another one that we unquestionably need there - IMAGE_FILE_EXECUTABLE_IMAGE, which indicates that the file is an executable image.



PE is closely related to COFF, which is a format of object files that are created by compilers as an intermediate stage before they are finally linked to create code that can be executed. These two formats have mostly identical headers (except for the PE signature, which is missing in COFF) and they share the values of various constants. The IMAGE_FILE_EXECUTABLE_IMAGE flag has been used by COFF to distinguish object files from executable ones (when we later talk about the ELF format, which superseded COFF on Unix systems, we are going to see that it has similar variants).



In NumberOfSections we need to state how many sections we plan to create. We do not know that yet, but we can use the name of a constant that we define later with the right value.



TimeDateStamp needs to tell when the file was created, in the "seconds since the Unix epoch" format. The special symbol %t provided by fasmg has exactly this value.



PointerToSymbolTable and NumberOfSymbols are another relic of the COFF format. They are not used in PE and we just fill them with zeros.
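Putting these fields together, the PE signature and the COFF file header occupy exactly 24 bytes. The following Python sketch (the function name pe_file_header is hypothetical) packs them with struct; TimeDateStamp is filled with whole seconds from time.time(), matching the seconds-since-epoch format of this field, and 0xE0 is passed as the size of the optional header that the PE32 layout ends up with:

```python
import struct
import time

IMAGE_FILE_MACHINE_I386 = 0x14C
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_32BIT_MACHINE = 0x0100


def pe_file_header(number_of_sections: int, size_of_optional_header: int,
                   timestamp: int) -> bytes:
    """PE signature followed by the 20-byte COFF file header."""
    return b"PE\0\0" + struct.pack(
        "<HHIIIHH",
        IMAGE_FILE_MACHINE_I386,                                # Machine
        number_of_sections,                                     # NumberOfSections
        timestamp,                                              # TimeDateStamp
        0,                                                      # PointerToSymbolTable
        0,                                                      # NumberOfSymbols
        size_of_optional_header,                                # SizeOfOptionalHeader
        IMAGE_FILE_32BIT_MACHINE | IMAGE_FILE_EXECUTABLE_IMAGE,  # Characteristics = 0x102
    )


header = pe_file_header(2, 0xE0, int(time.time()))
print(len(header), hex(struct.unpack_from("<H", header, 22)[0]))  # 24 0x102
```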



After the main header comes the so-called "optional header". This name is also a legacy of COFF: the structure contains crucial information, such as the entry point of the executable code, and is definitely required for any PE image. It was optional only in COFF, where the file could be an intermediate object, not yet made into an executable.



The optional header follows immediately after the main one and is in turn followed by the section table. Thus to obtain the size that we need to put in SizeOfOptionalHeader we just compute the difference between the OptionalHeader and SectionTable addresses.



Code:
OptionalHeader:
        .Magic                          dw 0x10B
        .MajorLinkerVersion             db 0
        .MinorLinkerVersion             db 0
        .SizeOfCode                     dd 0
        .SizeOfInitializedData          dd 0
        .SizeOfUninitializedData        dd 0
        .AddressOfEntryPoint            dd EntryPoint - IMAGE_BASE
        .BaseOfCode                     dd 0
        .BaseOfData                     dd 0



The value of Magic identifies a variant of the PE format. For classic 32-bit PE it is always 0x10B (a ZMAGIC value which COFF inherited from the old a.out format), while 0x20B is used to mark PE+ files, a variety intended mainly for 64-bit architectures. They differ slightly in the format of the structures that follow; we are going to look at these differences later, when we create a 64-bit executable.



Of the other fields in this initial portion of the "optional" header, the only important one is AddressOfEntryPoint, which should contain the address of the entry point relative to the base of the image. The specification calls this kind of value an RVA (Relative Virtual Address), while a VA (Virtual Address) is just a direct address in memory. To compute an RVA we simply subtract IMAGE_BASE from the address (VA). The EntryPoint label is going to be defined later, in the code of our program.



MajorLinkerVersion and MinorLinkerVersion are filled by a linker when it creates the executable; this allows the linker to put some mark of authorship on the executable. We are not a linker, so we can decide for ourselves what kind of mark to leave there. A simple choice is just zeros.



The other fields, like SizeOfCode and BaseOfCode, are remnants of the original COFF model (which in turn inherited them from the old a.out) and they do not really matter to the PE loader. Various kinds of code and data sections may be intermixed within the image, and the true authority on their sizes and placement is the section table. The fields here are only supplementary information; for instance, if there were several sections of data with some code in between, the sum of their sizes would serve only a statistical role.



If we wanted to be pedantic about it, we could fill these fields with values copied from our section table, but for now we just leave them zeroed. An additional sign of the irrelevancy of these numbers is that in PE+ the entire BaseOfData field is readily sacrificed to allow the subsequent ImageBase field to be enlarged to 64-bit without moving the later ones.



Code:
        .ImageBase                      dd IMAGE_BASE
        .SectionAlignment               dd SECTION_ALIGNMENT
        .FileAlignment                  dd FILE_ALIGNMENT
        .MajorOperatingSystemVersion    dw 3
        .MinorOperatingSystemVersion    dw 10
        .MajorImageVersion              dw 0
        .MinorImageVersion              dw 0
        .MajorSubsystemVersion          dw 3
        .MinorSubsystemVersion          dw 10
        .Win32VersionValue              dd 0
        .SizeOfImage                    dd SIZE_OF_IMAGE
        .SizeOfHeaders                  dd SIZE_OF_HEADERS
        .CheckSum                       dd 0
        .Subsystem                      dw 2            ; IMAGE_SUBSYSTEM_WINDOWS_GUI
        .DllCharacteristics             dw 0
        .SizeOfStackReserve             dd 4096
        .SizeOfStackCommit              dd 4096
        .SizeOfHeapReserve              dd 65536
        .SizeOfHeapCommit               dd 0
        .LoaderFlags                    dd 0
        .NumberOfRvaAndSizes            dd NUMBER_OF_RVA_AND_SIZES

In contrast, this part of the headers holds many important values. All the constants we defined earlier - the base of the image and the alignment values - are stored here exactly as they are. We also use two constants we have not yet defined to fill SizeOfImage and SizeOfHeaders; we are going to calculate these values later.



MajorOperatingSystemVersion together with MinorOperatingSystemVersion, as well as MajorSubsystemVersion with MinorSubsystemVersion, declare what version of the operating system is needed to execute the image. Programs created for older versions are allowed to run on newer ones, and this example is not going to use any features that have not been in Windows since the beginning, so, to avoid needlessly limiting where the program can run, we put 3.10 there (this is the version number of the first Windows NT that supported the PE format).



MajorImageVersion and MinorImageVersion could indicate the version of our program, but they are usually unused. Win32VersionValue is just a reserved field with no currently known purpose; it needs to be zero. The same goes for LoaderFlags further below.



CheckSum is a value computed over all the bytes of the executable that can be used to check whether the file has been modified in any way since the time when it was calculated. Normal programs are not required to have a valid checksum, so in this example we are going to skip this step. But even when we plan to compute the checksum, the value of this field should not partake in the summation so it is better to have it initially zeroed.
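We skip the checksum in this example, but the computation itself is short. The Python sketch below follows the procedure commonly reimplemented from Microsoft's CheckSumMappedFile; treat the exact algorithm as our assumption rather than something stated by the specification: the file is summed as 16-bit little-endian words with carries folded back into the low 16 bits, the CheckSum field itself counts as zero, and the file length is added at the end. In the layout built in this chapter the field would sit at file offset 0x98 (the optional header starts at 0x58 and CheckSum is 64 bytes into it):

```python
import struct


def pe_checksum(data: bytes, checksum_offset: int) -> int:
    """Sketch of the PE image checksum (assumed algorithm, see above)."""
    if not 0 <= checksum_offset <= len(data) - 4:
        raise ValueError("CheckSum field lies outside the data")
    buf = bytearray(data)
    buf[checksum_offset:checksum_offset + 4] = bytes(4)  # the field itself counts as zero
    if len(buf) % 2:
        buf.append(0)  # pad an odd length to a whole word
    total = 0
    for (word,) in struct.iter_unpack("<H", buf):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return (total + len(data)) & 0xFFFFFFFF  # finally add the file length


print(pe_checksum(bytes(1024), 0x98))  # 1024: an all-zero file sums to its length
```

Because the field is zeroed before summing, the same function gives the same result whether or not a checksum has already been written into the file.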



Subsystem identifies the environment where the program wants to be run. For normal applications this is either GUI or console.



DllCharacteristics is an additional set of flags supplementary to Characteristics in the main header. This is another misnomer: the flags here are not necessarily related to whether the file is a DLL. Nevertheless, at the moment we do not need to set any of them.



SizeOfStackReserve and SizeOfStackCommit set up the size of the stack for our program: the former states how large the stack is allowed to become, while the latter determines its initial size. We go with a single page for both. SizeOfHeapReserve and SizeOfHeapCommit provide similar settings for the local heap, which is a pool from which the program may allocate small blocks of memory whenever needed. We set up some usual values, though we are not going to use the heap in our simple program.



Finally, NumberOfRvaAndSizes specifies how many pairs consisting of a relative address and a size follow immediately after. This forms a sort of catalogue of specialized data structures present in the image. They come in a fixed order, as follows:

Code:
RvaAndSizes:
        .Export.Rva             dd 0
        .Export.Size            dd 0
        .Import.Rva             dd ImportTable - IMAGE_BASE
        .Import.Size            dd ImportTable.End - ImportTable
        .Resource.Rva           dd 0
        .Resource.Size          dd 0
        .Exception.Rva          dd 0
        .Exception.Size         dd 0
        .Certificate.Rva        dd 0
        .Certificate.Size       dd 0
        .BaseRelocation.Rva     dd 0
        .BaseRelocation.Size    dd 0
        .Debug.Rva              dd 0
        .Debug.Size             dd 0
        .Architecture.Rva       dd 0
        .Architecture.Size      dd 0
        .GlobalPtr.Rva          dd 0
        .GlobalPtr.Size         dd 0
        .TLS.Rva                dd 0
        .TLS.Size               dd 0
        .LoadConfig.Rva         dd 0
        .LoadConfig.Size        dd 0
        .BoundImport.Rva        dd 0
        .BoundImport.Size       dd 0
        .IAT.Rva                dd 0
        .IAT.Size               dd 0
        .DelayImport.Rva        dd 0
        .DelayImport.Size       dd 0
        .COMPlus.Rva            dd 0
        .COMPlus.Size           dd 0
        .Reserved.Rva           dd 0
        .Reserved.Size          dd 0
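With all 16 directory entries in place, the optional header of a PE32 image has a fixed total size, and this is the value that SizeOfOptionalHeader ends up holding. A quick Python check with struct.calcsize confirms the arithmetic; the format strings are our own transcription of the fields listed above, not something taken from a library:

```python
import struct

STANDARD_FIELDS = "<HBBIIIIII"            # Magic .. BaseOfData
WINDOWS_FIELDS = "<IIIHHHHHHIIIIHHIIIIII"  # ImageBase .. NumberOfRvaAndSizes
DIRECTORY_ENTRY = "<II"                   # one Rva + Size pair
NUMBER_OF_DIRECTORIES = 16                # Export .. Reserved

standard = struct.calcsize(STANDARD_FIELDS)
windows = struct.calcsize(WINDOWS_FIELDS)
directories = NUMBER_OF_DIRECTORIES * struct.calcsize(DIRECTORY_ENTRY)
size_of_optional_header = standard + windows + directories
print(standard, windows, directories, hex(size_of_optional_header))  # 28 68 128 0xe0
```

The "<" prefix matters here: it disables native alignment padding, so the sizes match the packed on-disk layout.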



Here the optional header ends, immediately followed by the section table - a crucial component of the headers.

Code:
SectionTable:

        .1.Name                         dq +'.text'
        .1.VirtualSize                  dd Section.1.End - Section.1
        .1.VirtualAddress               dd Section.1 - IMAGE_BASE
        .1.SizeOfRawData                dd Section.1.SIZE_IN_FILE
        .1.PointerToRawData             dd Section.1.OFFSET_IN_FILE
        .1.PointerToRelocations         dd 0
        .1.PointerToLineNumbers         dd 0
        .1.NumberOfRelocations          dw 0
        .1.NumberOfLineNumbers          dw 0
        .1.Characteristics              dd 0x60000000   ; IMAGE_SCN_MEM_EXECUTE + IMAGE_SCN_MEM_READ

        .2.Name                         dq +'.rdata'
        .2.VirtualSize                  dd Section.2.End - Section.2
        .2.VirtualAddress               dd Section.2 - IMAGE_BASE
        .2.SizeOfRawData                dd Section.2.SIZE_IN_FILE
        .2.PointerToRawData             dd Section.2.OFFSET_IN_FILE
        .2.PointerToRelocations         dd 0
        .2.PointerToLineNumbers         dd 0
        .2.NumberOfRelocations          dw 0
        .2.NumberOfLineNumbers          dw 0
        .2.Characteristics              dd 0x40000000   ; IMAGE_SCN_MEM_READ

SectionTable.End:



The name of the section is stored in an 8-byte field, padded with zeros. We use DQ to define it as a 64-bit value and convert the string to a number with the + operator in order to enable a range check. A DQ with a string argument would accept text of any length and simply pad it so that the size was a multiple of 8 bytes. By converting the text to a number we ensure that it has to fit in a single 64-bit cell, so the field is always exactly 8 bytes long.
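Outside of fasmg the same rule has to be enforced by hand. A small Python sketch (section_name is our own helper name) rejects names longer than 8 bytes instead of silently producing a longer field. It also shows the little-endian number that '.text' becomes when it fills a 64-bit cell; we assume here, as in fasmg, that the first character lands in the lowest byte:

```python
def section_name(name: str) -> bytes:
    """Encode a section name into the fixed 8-byte, zero-padded field."""
    raw = name.encode("ascii")
    if len(raw) > 8:
        # plays the role of the range check that dq +'...' gets in fasmg
        raise ValueError(f"section name {name!r} does not fit in 8 bytes")
    return raw.ljust(8, b"\0")


print(section_name(".text"))                                 # b'.text\x00\x00\x00'
print(hex(int.from_bytes(section_name(".text"), "little")))  # 0x747865742e
```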



VirtualAddress and VirtualSize define the boundaries of a section within the image in memory. The starting address needs to be consistent with SectionAlignment, and we need to keep this in mind later when we define the labels used here.



PointerToRawData and SizeOfRawData define the placement of the contents of a section within the file. Both values have to be aligned according to FileAlignment, so a section's data in the file may be larger than the size of that section in memory. It can also be the other way around, since a section may reserve more memory than it has actual data. In an extreme case the size in the file might be 0, when a section contains nothing but reserved memory. We are going to compute the constants used there with the help of the $% symbol, after ensuring the proper alignment within the file.
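Because the file and the image are laid out differently, a tool that reads a PE file from disk has to translate RVAs into file offsets using exactly these fields. A minimal Python sketch of such a translation follows; the Section class, the function name and the section sizes in the example are illustrative assumptions, not values taken from our executable:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Section:
    virtual_address: int      # RVA where the section starts in the image
    virtual_size: int         # size of the section in memory
    pointer_to_raw_data: int  # offset of its data in the file
    size_of_raw_data: int     # size of its data in the file


def rva_to_offset(rva: int, sections: List[Section], size_of_headers: int) -> Optional[int]:
    """Map an RVA to a file offset, or None if no file data backs that address."""
    if rva < size_of_headers:
        return rva  # headers: image and file layouts coincide
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            delta = rva - s.virtual_address
            if delta < s.size_of_raw_data:
                return s.pointer_to_raw_data + delta
            return None  # reserved memory only, nothing in the file
    return None


sections = [Section(0x1000, 0x20, 0x200, 0x200),   # a small code section
            Section(0x2000, 0x100, 0x400, 0x200)]  # a small data section
print(hex(rva_to_offset(0x2010, sections, 0x200)))  # 0x410
```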



The fields that refer to relocations and line numbers are in these structures because COFF objects use them, but for PE images they should be zeroed. Although PE could contain some relocations, they would be very different from the ones used by COFF and defined elsewhere (we are going to discuss them a bit later; the first example can work without them).



Characteristics contains various flags; here we just mark both sections as readable memory and the code section as executable. These settings translate directly into the attributes of the allocated memory pages, so they are quite important. We could also use values like IMAGE_SCN_CNT_CODE and IMAGE_SCN_CNT_INITIALIZED_DATA (connected to fields like SizeOfCode and SizeOfInitializedData in the optional header), but this would be mostly decorative.
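The values written into our section table are just sums of individual flag bits. The Python sketch below spells out the constants involved and decodes the memory attributes into an "rwx" string; the describe function and the rwx notation are our own, chosen only to make the pages' permissions easy to read:

```python
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_CNT_INITIALIZED_DATA = 0x00000040
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def describe(characteristics: int) -> str:
    """Render the memory-attribute bits of a section's flags as 'rwx'."""
    return "".join(
        letter if characteristics & flag else "-"
        for letter, flag in (("r", IMAGE_SCN_MEM_READ),
                             ("w", IMAGE_SCN_MEM_WRITE),
                             ("x", IMAGE_SCN_MEM_EXECUTE))
    )


print(hex(IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ), describe(0x60000000))  # 0x60000000 r-x
print(describe(0x40000000))                                                   # r--
print(hex(IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ))   # 0x60000020
```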



The end of the section table is also the end of the contents of the headers. Before we go further, we are going to fill in a few of the related constants. They are a bit redundant: the effect would be the same if we plugged the corresponding expressions directly into the places where we used their names earlier. But the use of intermediary constants makes it easy to alter the way they are computed when this comes up in the future.

Code:
NUMBER_OF_RVA_AND_SIZES := (SectionTable - RvaAndSizes) / 8
NUMBER_OF_SECTIONS := (SectionTable.End - SectionTable) / 40
SIZE_OF_HEADERS := Section.1.OFFSET_IN_FILE



As for the total size of headers, it has to be rounded up to the nearest multiple of FILE_ALIGNMENT, and this is at the same time the position where the contents of the initial section is going to begin. Therefore we can cheat a little and shift the responsibility to another constant, the one defining the offset in file for the first section.



However, to correctly position our initial section we need to do some actual work.

Code:
align SECTION_ALIGNMENT
Section.1:

Code:
section $%%
align FILE_ALIGNMENT, 0
Section.1.OFFSET_IN_FILE:



With the use of $%% as an argument to SECTION we temporarily switch from in-memory addressing to one tracing the actual position in file. This makes the address $ equal to the offset $% until we change this with another SECTION or ORG.



After that we use the alignment macro once more, this time to align the offset in the file to the nearest multiple of FILE_ALIGNMENT. While the previous alignment just moved our address in memory without adding anything to the file, this time we provide the second argument to the macro to make it write the necessary amount of zeroed bytes to the output.



Then Section.1.OFFSET_IN_FILE can be defined simply as a label, thanks to the address being the same as the position in file.



Finally we switch back to in-memory addressing, at the address of the Section.1 label. A simple ORG would suffice, but we use SECTION for the visual appeal:

Code:
section Section.1

EntryPoint:

        push    0
        push    CaptionString
        push    MessageString
        push    0
        call    [MessageBoxA]

        push    0
        call    [ExitProcess]

Section.1.End:



Now we need to perform the full alignment ritual again, this time to set up the position of the second section. We also calculate the size of the first one in the file, simply by computing the difference between the aligned offsets.

Code:
align SECTION_ALIGNMENT
Section.2:

section $%%
align FILE_ALIGNMENT, 0
Section.1.SIZE_IN_FILE := $ - Section.1.OFFSET_IN_FILE
Section.2.OFFSET_IN_FILE:
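The whole ritual can be simulated outside the assembler. The Python sketch below walks the sections in order and computes the values that the fasmg source derives with ALIGN and $%%: VirtualAddress, PointerToRawData, SizeOfRawData for each section, and finally SizeOfImage. The function layout and the section sizes used in the example are our own illustrative assumptions:

```python
FILE_ALIGNMENT = 512
SECTION_ALIGNMENT = 4096


def align_up(value: int, alignment: int) -> int:
    return (value + alignment - 1) // alignment * alignment


def layout(headers_size: int, section_sizes: list) -> tuple:
    """Return ([(VirtualAddress, VirtualSize, PointerToRawData, SizeOfRawData)], SizeOfImage)."""
    rva = align_up(headers_size, SECTION_ALIGNMENT)  # first section starts on the page after the headers
    offset = align_up(headers_size, FILE_ALIGNMENT)  # this is also SizeOfHeaders
    entries = []
    for size in section_sizes:
        raw = align_up(size, FILE_ALIGNMENT)  # data is padded with zeros in the file
        entries.append((rva, size, offset, raw))
        rva = align_up(rva + size, SECTION_ALIGNMENT)  # next section starts on a fresh page
        offset += raw
    return entries, rva  # the final aligned address is SizeOfImage


entries, size_of_image = layout(392, [0x1E, 0x9A])
for entry in entries:
    print([hex(v) for v in entry])
print(hex(size_of_image))  # 0x3000
```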

Code:
section Section.2

We start with the import table, which allows us to direct the loader to fill our pointers with the addresses of functions from system DLL files. This is actually a complex structure that consists of several smaller tables. First, there is the Import Directory Table.

Code:
ImportTable:

        .1.ImportLookupTableRva         dd KernelLookupTable - IMAGE_BASE
        .1.TimeDateStamp                dd 0
        .1.ForwarderChain               dd 0
        .1.NameRva                      dd KernelDLLName - IMAGE_BASE
        .1.ImportAddressTableRva        dd KernelAddressTable - IMAGE_BASE

        .2.ImportLookupTableRva         dd UserLookupTable - IMAGE_BASE
        .2.TimeDateStamp                dd 0
        .2.ForwarderChain               dd 0
        .2.NameRva                      dd UserDLLName - IMAGE_BASE
        .2.ImportAddressTableRva        dd UserAddressTable - IMAGE_BASE

                                        dd 0,0,0,0,0



NameRva is a relative address of the name of DLL file. We are going to put these names near the end of the import-related data.



ImportLookupTableRva and ImportAddressTableRva point to two parallel tables. The former contains relative addresses of structures declaring functions to be imported, while the latter is going to contain actual addresses of imported functions. The functions can be in any order, as long as the same one is used for both tables. When our image is loaded into memory, the operating system is going to look for all the functions defined by the first table and fill the second one with corresponding addresses.
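What the loader does with these two tables can be sketched in a few lines of Python. This is a simplified model under several assumptions: the loaded image is a bytearray indexed by RVA, the DLL is replaced by a plain dictionary of made-up addresses, and imports by ordinal are not handled. The function name fill_address_table is hypothetical:

```python
import struct


def fill_address_table(image: bytearray, lookup_rva: int, address_rva: int,
                       exports: dict) -> None:
    """Walk a lookup table and write resolved addresses into the parallel address table."""
    index = 0
    while True:
        (entry,) = struct.unpack_from("<I", image, lookup_rva + 4 * index)
        if entry == 0:  # a zeroed entry ends the table
            break
        if entry & 0x80000000:
            raise NotImplementedError("imports by ordinal are not handled in this sketch")
        # entry is the RVA of a hint/name structure: a 16-bit hint, then the name
        end = image.index(b"\0", entry + 2)
        name = image[entry + 2:end].decode("ascii")
        struct.pack_into("<I", image, address_rva + 4 * index, exports[name])
        index += 1


image = bytearray(0x100)
struct.pack_into("<I", image, 0x10, 0x40)  # lookup table: one entry, then a zero
struct.pack_into("<I", image, 0x20, 0x40)  # address table: initially identical
struct.pack_into("<H12s", image, 0x40, 0, b"ExitProcess\0")  # hint/name entry
fill_address_table(image, 0x10, 0x20, {"ExitProcess": 0x7C81CAFA})  # made-up address
print(hex(struct.unpack_from("<I", image, 0x20)[0]))  # 0x7c81cafa
```

Only the address table changes; the lookup table keeps pointing at the hint/name entries.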



TimeDateStamp and ForwarderChain fields are used when the imports are bound - that is, when the second table is pre-filled with addresses of imported functions to save time when loading the image. This obviously can work correctly only when all the addresses in the imported library are exactly as they were upon binding, and TimeDateStamp keeps the value of the timestamp of the DLL to provide a way to verify that it is exactly the same file. If the timestamps match, the loader can skip looking up all the functions; otherwise it does so as usual. Our imports are not bound, we need the loader to fill the addresses for us, therefore we keep TimeDateStamp zeroed in every case.



If the imports were bound, ForwarderChain would be interpreted as an index of a function that could not be bound because it was a forwarded import from another DLL. The value of the corresponding entry in the import address table would be an index of another such function, and so on. If we wanted to indicate that there were no such functions, we should put -1 in this field, but since we do not use binding (as indicated by the zeroed TimeDateStamp) this value is irrelevant.



Now we need to create lookup tables and address tables for every DLL. The initial contents of the parallel tables should be the same: they both should contain relative addresses of the lookup entries defining the functions. When the image is loaded, the import address table (IAT) is rewritten with the matching addresses. We can then use these values directly, therefore we label them with the names of the functions, and this is exactly what is needed for the CALL instructions in our code to work.

Code:
KernelLookupTable:
        dd ExitProcessLookup - IMAGE_BASE
        dd 0
KernelAddressTable:
        ExitProcess dd ExitProcessLookup - IMAGE_BASE   ; this is going to be replaced with the address of the function
        dd 0

UserLookupTable:
        dd MessageBoxALookup - IMAGE_BASE
        dd 0
UserAddressTable:
        MessageBoxA dd MessageBoxALookup - IMAGE_BASE   ; this is going to be replaced with the address of the function
        dd 0

We import only one function from each DLL, so the tables are short. The end of a table is marked by a zeroed entry.



Next come the lookup definitions for individual functions. Each such structure contains a 16-bit hint followed by the name of the function as a null-terminated string. The hint is an index into the export table of the DLL, where the loader may look for the function with such a name. If the hint fails, the loader continues to search for the function as usual, thus we do not have to know the right values to put there.
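Each hint/name entry is simply a 16-bit hint, the name and a terminating zero, padded with one more zero when needed so that the next entry starts at an even address (this is what the "align 2" in the source takes care of). A short Python sketch, with hint_name as our own helper name and GetTickCount used only as an example of a name of odd-sized length:

```python
import struct


def hint_name(name: str, hint: int = 0) -> bytes:
    """Hint/name entry: 16-bit hint, NUL-terminated name, padded to an even size."""
    entry = struct.pack("<H", hint) + name.encode("ascii") + b"\0"
    if len(entry) % 2:
        entry += b"\0"  # keep the next entry 2-byte aligned
    return entry


print(len(hint_name("ExitProcess")), len(hint_name("GetTickCount")))  # 14 16
```

With an 11-character name, like both names imported by our program, the entry is already 14 bytes long and no padding is needed.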

Code:
ExitProcessLookup:
        .Hint   dw 0
        .Name   db 'ExitProcess',0

align 2
MessageBoxALookup:
        .Hint   dw 0
        .Name   db 'MessageBoxA',0



Finally, we conclude the import table with the names of the DLL files that we import. They are plain null-terminated strings.

Code:
KernelDLLName db 'KERNEL32.DLL',0
UserDLLName db 'USER32.DLL',0

ImportTable.End:

Code:
CaptionString db "PE tutorial",0
MessageString db "I am alive and well!",0

Section.2.End:

Code:
align SECTION_ALIGNMENT
SIZE_OF_IMAGE := $ - IMAGE_BASE

section $%%
align FILE_ALIGNMENT, 0
Section.2.SIZE_IN_FILE := $ - Section.2.OFFSET_IN_FILE



This is it, the source for our first PE image is ready (a copy is in the attached "basic.asm" file). We can now assemble it into a file with the "exe" extension and let it run.
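We may also want to double-check a few header fields of the assembled file. The Python sketch below (inspect_pe is a hypothetical helper that only understands the PE32 variant with Magic 0x10B) follows the chain described in this chapter: the MZ signature, the offset stored at 0x3C, the PE signature, the file header and the beginning of the optional header:

```python
import struct


def inspect_pe(data: bytes) -> dict:
    """Read back a few key fields from the headers of a PE32 file."""
    if data[:2] != b"MZ":
        raise ValueError("no MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # NewHeaderOffset
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("no PE signature")
    machine, sections, _, _, _, opt_size, flags = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    opt = pe_offset + 24  # the optional header follows the 20-byte file header
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)   # AddressOfEntryPoint
    (image_base,) = struct.unpack_from("<I", data, opt + 28)  # ImageBase (PE32 layout)
    return {"machine": machine, "sections": sections, "characteristics": flags,
            "optional_header_size": opt_size, "magic": magic,
            "entry_rva": entry_rva, "image_base": image_base}
```

Passing it the bytes of the assembled executable (whatever name it was given when assembling) should report machine 0x14C, two sections, an entry point RVA of 0x1000 and an image base of 0x400000.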



We can also combine it with the "listing.inc" script to contemplate the binary data juxtaposed with the commands that generated it. You may notice that numerous lines from "80386.inc" show up in the listing. To get rid of them, we can hide the included file inside a simple macro:

Code:
macro use? file*
        include file
end macro

use '80386.inc'
use32

Code:
use 'ntimage.inc'

Code:
        .Characteristics        dw IMAGE_FILE_32BIT_MACHINE + IMAGE_FILE_EXECUTABLE_IMAGE

Code:
        .DllCharacteristics     dw IMAGE_DLLCHARACTERISTICS_NX_COMPAT



It was a first step towards making our source more maintainable. Another one could be to automate some of the tasks. For example, we can generate all the entries in the section table with a simple repetition:

Code:
SectionTable:

        repeat NUMBER_OF_SECTIONS, n:1

                .n.Name                         dq Section.n.NAME
                .n.VirtualSize                  dd Section.n.End - Section.n
                .n.VirtualAddress               dd Section.n - IMAGE_BASE
                .n.SizeOfRawData                dd Section.n.SIZE_IN_FILE
                .n.PointerToRawData             dd Section.n.OFFSET_IN_FILE
                .n.PointerToRelocations         dd 0
                .n.PointerToLineNumbers         dd 0
                .n.NumberOfRelocations          dw 0
                .n.NumberOfLineNumbers          dw 0
                .n.Characteristics              dd Section.n.CHARACTERISTICS

        end repeat

SectionTable.End:



This approach requires that we define several more constants. We also have to change how NUMBER_OF_SECTIONS is defined: we can no longer compute it from the size of the section table, as this would create a circular dependence.

Code:
NUMBER_OF_SECTIONS := 2

Section.1.NAME := +'.text'
Section.1.CHARACTERISTICS := IMAGE_SCN_MEM_EXECUTE + IMAGE_SCN_MEM_READ

Section.2.NAME := +'.rdata'
Section.2.CHARACTERISTICS := IMAGE_SCN_MEM_READ

Code:
CURRENT_SECTION = 0

macro section? name*, characteristics:0

        CURRENT_SECTION = CURRENT_SECTION + 1

        repeat 1, new:CURRENT_SECTION, previous:CURRENT_SECTION-1

                Section.previous.End:

                align SECTION_ALIGNMENT

                Section.new.NAME := +name
                Section.new.CHARACTERISTICS := characteristics
                Section.new:

                section $%%
                align FILE_ALIGNMENT, 0
                if previous > 0
                        Section.previous.SIZE_IN_FILE := $ - Section.previous.OFFSET_IN_FILE
                end if
                Section.new.OFFSET_IN_FILE:

                org Section.new

        end repeat

end macro



To define labels and constants that correspond to enumerated section entries, we need to extract the number from the CURRENT_SECTION variable and somehow place it into names. The trick in fasmg is to use REPEAT with just a single repetition, solely for the purpose of defining counters that get replaced with numbers before the repeated text is assembled.



The macro does everything that we have previously done manually when starting a new section. The ending address and the size in the file get defined only when the next section is started, so we need to define an additional dummy section at the end (not counted in the total number), together with the definitions of NUMBER_OF_SECTIONS and SIZE_OF_IMAGE:

Code:
postpone
        NUMBER_OF_SECTIONS := CURRENT_SECTION
        section ''
        SIZE_OF_IMAGE := $ - IMAGE_BASE
end postpone



This macro required us to learn a bit more of the assembler's trickery, but it makes the section definitions much more pleasant to the eye:

Code:
section '.text', IMAGE_SCN_MEM_EXECUTE + IMAGE_SCN_MEM_READ

EntryPoint:

        push    0
        push    CaptionString
        push    MessageString
        push    0
        call    [MessageBoxA]

        push    0
        call    [ExitProcess]

section '.rdata', IMAGE_SCN_MEM_READ

ImportTable:

        .1.ImportLookupTableRva         dd KernelLookupTable - IMAGE_BASE
        .1.TimeDateStamp                dd 0
        .1.ForwarderChain               dd 0
        .1.NameRva                      dd KernelDLLName - IMAGE_BASE
        .1.ImportAddressTableRva        dd KernelAddressTable - IMAGE_BASE

        .2.ImportLookupTableRva         dd UserLookupTable - IMAGE_BASE
        .2.TimeDateStamp                dd 0
        .2.ForwarderChain               dd 0
        .2.NameRva                      dd UserDLLName - IMAGE_BASE
        .2.ImportAddressTableRva        dd UserAddressTable - IMAGE_BASE

                                        dd 0,0,0,0,0

KernelLookupTable:
        dd ExitProcessLookup - IMAGE_BASE
        dd 0
KernelAddressTable:
        ExitProcess dd ExitProcessLookup - IMAGE_BASE   ; this is going to be replaced with the address of the function
        dd 0

UserLookupTable:
        dd MessageBoxALookup - IMAGE_BASE
        dd 0
UserAddressTable:
        MessageBoxA dd MessageBoxALookup - IMAGE_BASE   ; this is going to be replaced with the address of the function
        dd 0

align 2
ExitProcessLookup:
        .Hint   dw 0
        .Name   db 'ExitProcess',0
align 2
MessageBoxALookup:
        .Hint   dw 0
        .Name   db 'MessageBoxA',0

KernelDLLName db 'KERNEL32.DLL',0
UserDLLName db 'USER32.DLL',0

ImportTable.End:

CaptionString db "PE tutorial",0
MessageString db "I am alive and well!",0

Code:
iterate name, Export,Import,Resource,Exception,Certificate,BaseRelocation,Debug,Architecture,GlobalPtr,TLS,LoadConfig,BoundImport,IAT,DelayImport,COMPlus,Reserved

        if defined name#Table
                .name.Rva       dd name#Table - IMAGE_BASE
                .name.Size      dd name#Table.End - name#Table
        else
                .name.Rva       dd 0
                .name.Size      dd 0
        end if

end iterate
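This ITERATE generates the whole RvaAndSizes catalogue: for each of the 16 directory names it checks whether a label like ImportTable is defined and, if so, emits its RVA and size, otherwise a pair of zeros. The same logic in Python, as a sketch where a dictionary of (start RVA, end RVA) pairs plays the role of the NameTable and NameTable.End labels (data_directories is our own name):

```python
DIRECTORY_NAMES = ["Export", "Import", "Resource", "Exception", "Certificate",
                   "BaseRelocation", "Debug", "Architecture", "GlobalPtr", "TLS",
                   "LoadConfig", "BoundImport", "IAT", "DelayImport", "COMPlus", "Reserved"]


def data_directories(tables: dict) -> list:
    """Return 16 (Rva, Size) pairs; names missing from `tables` become (0, 0)."""
    return [(tables[name][0], tables[name][1] - tables[name][0]) if name in tables else (0, 0)
            for name in DIRECTORY_NAMES]


dirs = data_directories({"Import": (0x2000, 0x2086)})
print(len(dirs), dirs[1])  # 16 (8192, 134)
```

Adding a new table to the image then only requires defining the corresponding pair of labels; the catalogue picks it up automatically.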



A variant of the first source that has all these improvements is in the attached "basic_template.asm" file. We are going to use it as a base for the continued experiments. The road we are going to take is to learn inner workings of file formats by constructing some files from scratch. This approach is focused on experimentation, so we will use samples designed in a way that encourages playing with them and learning through direct experience.The first file we construct is going to be an executable for Windows operating system, in the format called Portable Executable. PE was designed in 1993 for Windows NT (the first 32-bit system in the family), and has been used from then on by the 32-bit and 64-bit implementations of Windows. Subsequently it has been adopted for some other uses, like EFI, but at this time we are going to focus on its original environment.Before we go on, a few preparations. We should take the ALIGN macro we discussed earlier, it is going to become useful quite soon. We may also need to create some machine code for the actual program inside our executable and for this we need to include an instruction set for a processor architecture we need to work with. Our first choice is going to be very traditional, the 32-bit x86 architecture, so we include an instruction set for processors compatible with 80386:USE32 is a command provided by the '80386.inc' package, it chooses to assemble instructions for 32-bit mode (if we did not specify it, the default mode would be 16-bit, for historical reasons).A term that often pops up when discussing PE files is the program image. This refers to the layout of the program after it is loaded into memory to be executed, which is not necessarily the same as the structure of the program in the file on disk. The executable needs to define a mapping of sections of the file onto the corresponding areas in memory.Nevertheless, both the file on disk and image in memory start the same way - with the headers. 
These structures at the beginning of the file become the initial portion of the loaded program, at the address called the base of the image. All the other sections created in memory have to be placed after that.

Any PE executable is constructed with an assumed value for the base of the image; for 32-bit programs this is usually 0x400000. We are going to define a constant with this value and use it as the base for our labels:

Code:
IMAGE_BASE := 0x400000
org IMAGE_BASE

Therefore all the labels that we define are going to correspond to addresses in the program image.

The next two constants choose the alignment settings for the disk and for the memory. This is one of the sources of discrepancy between the layouts of the file and of the image.

The standard choice of file alignment makes sure that every section in the file starts on a new sector of the disk (traditionally, hard drives have a sector size of 512 bytes), to optimize the performance of reading and mapping a single section into memory.

In memory, the sections are aligned to the size of a page (which is 4096 bytes in the basic setup of an x86 CPU). This is partly because memory can be allocated only in such increments, but also because different sections may require distinct attributes for the memory (like write-protection), and the CPU can set these up only for entire pages at once.

These constants are better left with the standard values. While it is possible to tweak them in such a way that the operating system should still be able to construct the image, the loader may distrust an executable with a non-standard layout and refuse to load it. There are also some additional constraints if the chosen alignment in memory is smaller than the size of a page (we may get back to it later).

It is time to start writing the headers. The very first bytes of a file are usually a unique signature of the format, but in the case of PE the matter is a bit more complicated.
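Before the headers themselves, it may help to see the two alignments side by side. The following Python sketch assumes the standard values discussed above (0x200 in the file, 0x1000 in memory). It also assumes a header area ending at 0x188, which is our own estimate for this example: a 64-byte stub, the 4-byte PE signature, the 20-byte main header, a 224-byte optional header with all sixteen RVA/size pairs, and two 40-byte section records. align_up is a hypothetical helper, not something from the fasmg sources.

```python
FILE_ALIGNMENT = 0x200      # standard file alignment: one 512-byte disk sector
SECTION_ALIGNMENT = 0x1000  # standard section alignment: one 4096-byte x86 page

def align_up(value, alignment):
    """Round value up to the nearest multiple of a power-of-two alignment."""
    if alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (value + alignment - 1) & ~(alignment - 1)

# Assumed end of the headers: stub + signature + main header
# + optional header + two section records.
headers_end = 64 + 4 + 20 + 224 + 2 * 40

first_section_in_file = align_up(headers_end, FILE_ALIGNMENT)       # next sector
first_section_in_memory = align_up(headers_end, SECTION_ALIGNMENT)  # next page
print(hex(headers_end), hex(first_section_in_file), hex(first_section_in_memory))
# prints: 0x188 0x200 0x1000
```

The gap between 0x188 and 0x200 becomes padding in the file. In memory, the headers effectively occupy a whole page, which is the discrepancy between the two layouts in a nutshell.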
At the time when the PE format was designed, DOS was still a popular operating system, and many of the new formats - like NE (the 16-bit format used by the earliest versions of Windows), LE (used by OS/2, but also by drivers in Windows 9x) and finally PE - were based on the old MZ format used for the .EXE files in DOS. All these formats were made in such a way that the initial portion of the file was a valid MZ program that could be executed by DOS; usually it was a tiny program that just displayed a message like "This program cannot be run in DOS mode". This small program was called a stub, and its MZ header was extended to contain a special field, ignored by older software, holding the offset of the actual new executable header later in the file.

This way it was even possible to have an executable that contained two versions of the same software - one for DOS and one for Windows. This was not common practice, though. Mostly, the stub programs just informed the user, in one way or another, that the file was not intended to be run from DOS.

Nowadays we do not need to worry much about someone mistakenly trying to execute our PE file in DOS, therefore we are going to make a minimal stub - not a real program, just something that resembles one closely enough for our PE executable to be valid:

What is important here is that at offset 0x3C from the beginning of the MZ header there should be a 32-bit field containing the offset of the actual PE header. We fill most of the MZ header with zeros up to that point; normally there would be some fields important for the MZ format there, but we do not intend to make a functional DOS program.

We compute the offset of the main PE header by subtracting IMAGE_BASE from its address (available through a label that we are going to define below).
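The arrangement can be sketched in Python from both sides. The first function builds a 64-byte MZ header that is mostly zeros, with the size fields and the 32-bit offset at 0x3C filled in. The second does the lookup a loader would perform to reach the "PE\0\0" signature. The field offsets follow the classic MZ header layout, while the function names are hypothetical. Like the stub in the text, the result is not a working DOS program.

```python
import struct

def build_minimal_stub(pe_header_offset):
    """A 64-byte MZ header that is only valid enough to carry the offset of
    the PE header; it is not a functional DOS program."""
    stub = bytearray(64)                         # unused fields stay zeroed
    stub[0:2] = b"MZ"                            # MZ signature
    program_size = len(stub)                     # the stub is nothing but its header
    bytes_in_last_sector = program_size % 512    # 0 would mean a full last sector
    number_of_sectors = (program_size + 511) // 512
    struct.pack_into("<HH", stub, 2, bytes_in_last_sector, number_of_sectors)
    struct.pack_into("<H", stub, 8, len(stub) // 16)      # header size in 16-byte paragraphs
    struct.pack_into("<I", stub, 0x3C, pe_header_offset)  # offset of the PE header
    return bytes(stub)

def find_pe_header(image):
    """What a loader does first: follow the field at 0x3C and check the signature."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    offset = struct.unpack_from("<I", image, 0x3C)[0]
    if image[offset:offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    return offset

# The PE header can follow right after the stub, at the 8-byte-aligned offset 64.
image = build_minimal_stub(64) + b"PE\0\0"
print(find_pe_header(image))  # prints: 64
```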
This works because for all the headers there is such a clear correspondence between addresses in the image and positions in the file.

We also fill a couple of fields in the MZ header that are crucial for its integrity, namely the size of the header and the size of the entire program. The header is measured in 16-byte units (in DOS they were called paragraphs), and the "align 16" is there to make sure that its size is a multiple of 16 (though in this case nothing needs to be done, since the position immediately after NewHeaderOffset is 64). The size of the DOS program is given as a count of 512-byte sectors, but the last one of them is allowed to be only partially filled, and BytesInLastSector gives the number of bytes in it.

On a side note, when a label starts with a dot, it belongs to the namespace of the regular label that preceded it. The labels defined here could be accessed from elsewhere with identifiers like "Stub.Signature" or "Stub.NewHeaderOffset".

With the stub ready, we can move on to the main header; this is where the actual PE signature is going to be. This header must be aligned on an 8-byte boundary, hence we put an "align 8" here, though it again does nothing (but if we had put a real DOS program above, the position in the file might have been misaligned).

There are some constants used here that are given names in the official specification of the PE format. To make the generated data more tangible in the first demonstration, we use their values directly and leave the names in the comments. But as we continue to work with these examples, we may later prefer to include an additional header in our script with the definitions of all these constants and just use the names.

According to the plan, the first example is going to be for the 32-bit mode of an x86 CPU, and we state this in the Machine field, but also by including the IMAGE_FILE_32BIT_MACHINE value in Characteristics.
The latter field is a set of flags, and there is another one that we unquestionably need there: IMAGE_FILE_EXECUTABLE_IMAGE, which tells that the file contains executable code.

PE is closely related to COFF, which is a format of object files that are created by compilers as an intermediate stage before they are finally linked to create code that can be executed. These two formats have mostly identical headers (except for the PE signature, which is missing in COFF) and they share the values of various constants. The IMAGE_FILE_EXECUTABLE_IMAGE flag is used in COFF to distinguish object files from executable ones (when we later talk about the ELF format, which superseded COFF on Unix systems, we are going to see that it has similar variants).

In NumberOfSections we need to state how many sections we plan to create. We do not know that yet, but we can use the name of a constant that we define later with the right value.

TimeDateStamp needs to tell when the file was created, as the number of seconds since the Unix epoch. A special symbol %t is provided by fasmg with such a value.

PointerToSymbolTable and NumberOfSymbols are another relic of the COFF format. They are not used in PE and we just fill them with zeros.

After the main header comes the so-called "optional header". This name is also a legacy of COFF, as this structure contains crucial information about the entry point of the executable code and is definitely required for any PE image. It was only optional in COFF, when the file could be an intermediate object, not yet made into an executable.

The optional header follows immediately after the main one and is in turn followed by the section table. Thus to obtain the size that we need to put in SizeOfOptionalHeader we just compute the difference between the OptionalHeader and SectionTable addresses.

The value of Magic identifies a variant of the PE format.
For classic 32-bit PE it is always 0x10B (a ZMAGIC value which COFF inherited from the old a.out format), while 0x20B is used to mark PE+ files, a variety intended mainly for 64-bit architectures. The two variants differ slightly in the format of the structures that follow; we are going to look at these differences later, when we create a 64-bit executable.

Of the other fields in this initial portion of the "optional" header the only important one is AddressOfEntryPoint, which should contain the address of the entry point relative to the base of the image. The specification calls this kind of value an RVA (Relative Virtual Address), while a VA (Virtual Address) is just a direct address in memory. To compute an RVA we simply subtract IMAGE_BASE from the address (VA). The EntryPoint label is going to be defined later, in the code of our program.

MajorLinkerVersion and MinorLinkerVersion are filled by a linker when it creates the executable, which allows the linker to put some mark of authorship on it. We are not a linker, so we can decide for ourselves what kind of mark to leave there. A simple choice is just zeros.

The other fields, like SizeOfCode and BaseOfCode, are remnants of the original COFF model (which in turn inherited them from the old a.out) and they do not really matter to the PE loader. Various kinds of code and data sections may be intermixed within the image, and the true authority on their sizes and placement is the section table. The fields here are just supplementary information; for instance, if there were several sections of data with some code in-between, the sum of their sizes would serve only a statistical role.

If we wanted to be pedantic about it, we could fill these fields with values copied from our section table, but for now we just leave them zeroed.
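As a rough cross-check of the fields discussed so far, the following Python sketch packs three pieces: the PE signature, the 20-byte main (COFF) header and the first 28 bytes of a 32-bit optional header. The numeric values are the standard ones from the PE specification: 0x14C for i386, 0x0002 for IMAGE_FILE_EXECUTABLE_IMAGE, 0x0100 for IMAGE_FILE_32BIT_MACHINE and 0x10B for Magic. The helper names and the entry point address 0x401000 used in the example are our own assumptions.

```python
import struct
import time

IMAGE_BASE = 0x400000
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_32BIT_MACHINE = 0x0100
PE32_MAGIC = 0x10B  # 0x20B would mark PE+

def rva(va, image_base=IMAGE_BASE):
    """Relative Virtual Address: an address measured from the base of the image."""
    return va - image_base

def build_main_header(number_of_sections, size_of_optional_header, timestamp=None):
    """PE signature followed by the 20-byte COFF-style main header."""
    if timestamp is None:
        timestamp = int(time.time())          # seconds since the Unix epoch
    return b"PE\0\0" + struct.pack(
        "<HHIIIHH",
        IMAGE_FILE_MACHINE_I386,              # Machine
        number_of_sections,                   # NumberOfSections
        timestamp & 0xFFFFFFFF,               # TimeDateStamp
        0,                                    # PointerToSymbolTable (unused in PE)
        0,                                    # NumberOfSymbols (unused in PE)
        size_of_optional_header,              # SizeOfOptionalHeader
        IMAGE_FILE_EXECUTABLE_IMAGE | IMAGE_FILE_32BIT_MACHINE,  # Characteristics
    )

def build_standard_fields(entry_point_va):
    """First 28 bytes of a PE32 optional header; the size fields stay zeroed."""
    return struct.pack(
        "<HBBIIIIII",
        PE32_MAGIC,                           # Magic
        0, 0,                                 # MajorLinkerVersion, MinorLinkerVersion
        0, 0, 0,                              # SizeOfCode, SizeOfInitializedData, SizeOfUninitializedData
        rva(entry_point_va),                  # AddressOfEntryPoint
        0, 0,                                 # BaseOfCode, BaseOfData
    )

header = build_main_header(2, 224, timestamp=0)
fields = build_standard_fields(0x401000)      # assumed entry point at the start of .text
print(len(header), len(fields), hex(rva(0x401000)))  # prints: 24 28 0x1000
```

The value 224 for SizeOfOptionalHeader assumes a PE32 optional header with all sixteen RVA/size pairs.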
Another sign that SizeOfCode and the related fields carry little weight is that in PE+ the entire BaseOfData field is readily sacrificed to allow the subsequent ImageBase field to be enlarged to 64 bits without moving the later ones.

In contrast, the next part of the optional header holds many important values. All the constants we defined earlier - the base of the image and the alignment values - are stored here exactly as they are. We also use two constants we have not yet defined to fill SizeOfImage and SizeOfHeaders; we are going to calculate these values later.

MajorOperatingSystemVersion together with MinorOperatingSystemVersion, as well as MajorSubsystemVersion with MinorSubsystemVersion, declare what version of the operating system is needed to execute the image. Programs created for older versions are allowed to run on the newer ones, and this example is not going to use any features that have not been in Windows since the beginning, so to avoid limiting the execution of the program unnecessarily we put 3.10 there (this is the version number of the first Windows NT that supported the PE format).

MajorImageVersion and MinorImageVersion could indicate the version of our program, but they are usually unused. Win32VersionValue is just a reserved field, with currently unknown purpose; it needs to be zero. The same goes for LoaderFlags further below.

CheckSum is a value computed over all the bytes of the executable that can be used to check whether the file has been modified in any way since the time when it was calculated. Normal programs are not required to have a valid checksum, so in this example we are going to skip this step. But even when we plan to compute the checksum, the value of this field must not take part in the summation, so it is better to have it initially zeroed.

Subsystem identifies the environment where the program wants to be run. For normal applications this is either GUI or console.

DllCharacteristics is an additional set of flags supplementary to Characteristics in the main header.
This is another case of a misnomer; the flags here are not necessarily related to whether the file is a DLL. Nevertheless, at the moment we do not need to set any of them.

SizeOfStackReserve and SizeOfStackCommit set up the size of the stack for our program: the former states how large the stack is allowed to become, while the latter determines its initial size. We go with a single page for both. SizeOfHeapReserve and SizeOfHeapCommit provide similar settings for the local heap, which is a pool from which the program may allocate small blocks of memory whenever needed. We set up some usual values, though we are not going to use the heap in our simple program.

Finally, NumberOfRvaAndSizes specifies how many pairs consisting of a relative address and a size follow immediately after. This forms a sort of catalogue of specialized data structures present in the image. They come in a fixed order: Export, Import, Resource, Exception, Certificate, BaseRelocation, Debug, Architecture, GlobalPtr, TLS, LoadConfig, BoundImport, IAT, DelayImport, COMPlus and Reserved.

Out of the many possible tables that a PE image may declare this way, we provide just one. The import table is necessary for us to gain access to the functions of the Windows API. When we define it below, we need to mark its boundaries with the ImportTable and ImportTable.End labels.

Here the optional header ends, immediately followed by the section table - a crucial component of the headers.

Our table contains two records, defining two sections with different attributes. The '.text' is a usual name for a section containing executable code, in other words: the text of the program. The '.rdata' is going to contain all kinds of read-only data we need; this should be enough for the first sample.

The name of the section is stored in an 8-byte field, padded with zeros. We use DQ to define this as a 64-bit value and convert the string to a number with the + operator, in order to enable a range check. A DQ with a string argument would accept text of any length and simply pad it so that the size was a multiple of 8 bytes.
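The difference between the two ways of using DQ can be mimicked in Python. This sketch rests on an assumption: that, as in fasmg, a string used as a number is read as little-endian bytes. pack_section_name and pad_to_multiple_of_8 are hypothetical helpers.

```python
import struct

def pack_section_name(name):
    """Like `dq +'.text'`: treat the name as one little-endian number,
    so a name longer than 8 bytes fails the range check."""
    value = int.from_bytes(name.encode("ascii"), "little")
    if value >= 1 << 64:
        raise ValueError(f"section name {name!r} does not fit in 8 bytes")
    return struct.pack("<Q", value)       # zero-padded to exactly 8 bytes

def pad_to_multiple_of_8(name):
    """Like `dq '.text'`: any length is accepted and padded to a multiple of 8."""
    raw = name.encode("ascii")
    return raw + bytes(-len(raw) % 8)

print(pack_section_name(".rdata"))                 # prints: b'.rdata\x00\x00'
print(len(pad_to_multiple_of_8(".toolongname")))   # prints: 16
```

With the second approach, a 12-character name would silently grow the field to 16 bytes and shift every following field of the section record. The first approach raises an error instead.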
Converting the text to a number ensures that it has to fit in a single 64-bit cell, so the field is always exactly 8 bytes long.

VirtualAddress and VirtualSize define the boundaries of a section within the image in memory. The starting address needs to be set up consistently with SectionAlignment; we need to keep this in mind later when we define the labels used here.

PointerToRawData and SizeOfRawData define the placement of the contents of a section within the file. Both values have to be aligned according to FileAlignment, so it is possible for a section's data in the file to be larger than the size of that section in memory. It can also be the other way around, since a section may reserve more memory than it contains actual data. In an extreme case the size in the file might be 0 when a section contains nothing but reserved memory. We are going to compute the constants used there with the help of the $% symbol, after ensuring the proper alignment within the file.

The fields that refer to relocations and line numbers are in these structures because COFF objects use them, but for PE images they should be zeroed. Although PE could contain some relocations, they would be very different from the ones used by COFF and would be defined elsewhere (we are going to discuss them a bit later; the first example can work without them).

Characteristics contains various flags; here we just mark both sections as readable memory and the code section as executable. These settings translate directly into the attributes of allocated memory pages, so they are quite important. We could also use values like IMAGE_SCN_CNT_CODE and IMAGE_SCN_CNT_INITIALIZED_DATA (connected to fields like SizeOfCode and SizeOfInitializedData in the optional header), but this would mostly be just decorative.

The end of the section table is also the end of the contents of the headers. Before we go further, we are going to define a few of the related constants.
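This Python sketch shows how the section table ties into those constants. It lays out the two sections of our example and derives NumberOfSections, SizeOfHeaders and SizeOfImage from the result. Several things are assumptions:

- The data sizes (0x24 bytes of code, 0xA0 bytes of read-only data) are made-up placeholders.
- The header size assumes a 224-byte optional header with all sixteen RVA/size pairs.
- lay_out_image is a hypothetical helper.

Note that struct's "8s" format silently truncates long names, unlike the range-checked DQ in the text. The flag values are the standard IMAGE_SCN_MEM_READ and IMAGE_SCN_MEM_EXECUTE from the PE specification.

```python
import struct

FILE_ALIGNMENT = 0x200
SECTION_ALIGNMENT = 0x1000
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
SECTION_RECORD_SIZE = 40

def align_up(value, alignment):
    """Round value up to the nearest multiple of a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)

def lay_out_image(headers_size, sections):
    """Build section records and derive the related header constants.

    sections is a list of (name, data_size, characteristics). Returns
    (section_table, number_of_sections, size_of_headers, size_of_image).
    """
    size_of_headers = align_up(headers_size, FILE_ALIGNMENT)
    file_offset = size_of_headers                     # first section starts right after the headers
    rva = align_up(headers_size, SECTION_ALIGNMENT)   # ...but on a fresh page in memory
    table = bytearray()
    for name, data_size, characteristics in sections:
        raw_size = align_up(data_size, FILE_ALIGNMENT)
        table += struct.pack(
            "<8sIIIIIIHHI",
            name.encode("ascii"),   # Name ("8s" pads with zeros but silently truncates)
            data_size,              # VirtualSize
            rva,                    # VirtualAddress
            raw_size,               # SizeOfRawData
            file_offset,            # PointerToRawData
            0, 0, 0, 0,             # relocation and line-number fields: zero in PE images
            characteristics,        # Characteristics
        )
        file_offset += raw_size
        rva += align_up(data_size, SECTION_ALIGNMENT)
    number_of_sections = len(table) // SECTION_RECORD_SIZE
    size_of_image = rva                               # already a multiple of SECTION_ALIGNMENT
    return bytes(table), number_of_sections, size_of_headers, size_of_image

# Placeholder sizes: 0x24 bytes of code and 0xA0 bytes of read-only data.
table, count, size_of_headers, size_of_image = lay_out_image(
    64 + 4 + 20 + 224 + 2 * SECTION_RECORD_SIZE,
    [(".text", 0x24, IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_EXECUTE),
     (".rdata", 0xA0, IMAGE_SCN_MEM_READ)],
)
print(count, hex(size_of_headers), hex(size_of_image))  # prints: 2 0x200 0x3000
```

Each section takes a single 0x200-byte slot in the file but a whole 0x1000-byte page in memory. This is why the image ends up at 0x3000 bytes while the file is only 0x600 bytes long.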
These constants are a bit redundant: the effect would be the same if we plugged the corresponding expressions directly into the places where we used their names earlier. But such intermediate constants make it easy to alter the way the values are computed when the need comes up in the future.

To count the number of records in a table, we divide its total size by the length of a single entry as defined in the specification. As long as we define the tables correctly, everything should add up.

As for the total size of the headers, it has to be rounded up to the nearest multiple of FILE_ALIGNMENT, and this is at the same time the position where the contents of the initial section are going to begin. Therefore we can cheat a little and shift the responsibility to another constant, the one defining the offset in the file of the first section.

However, to correctly position our initial section we need to do some actual work.

First, we move in the image to the nearest multiple of SECTION_ALIGNMENT by adding the right amount of reserved data (this is the default behavior of our ALIGN macro). This allows us to define the label corresponding to the start of the first section in memory.

Then we use the SECTION instruction of the assembler to cut off all the reserved bytes so they do not get included in the file. In this particular case the only reserved bytes to discard are the ones made by the previous alignment.

With $%% as an argument to SECTION we temporarily switch from in-memory addressing to addressing that tracks the actual position in the file. This makes the address $ equal to the offset $% until we change it with another SECTION or ORG.

After that we use the alignment macro once more, this time to align the offset in the file to the nearest multiple of FILE_ALIGNMENT.
While the previous alignment just moved our address in memory without adding anything to the file, this time we provide the second argument to the macro to make it write the necessary amount of zeroed bytes to the output.

Then Section.1.OFFSET_IN_FILE can be defined simply as a label, thanks to the address being the same as the position in the file.

Finally we switch back to in-memory addressing, at the address of the Section.1 label. A simple ORG would suffice, but we use SECTION for the visual appeal:

This is the entirety of our executable code, with the entry point defined at the start of the section. There are just two types of x86 instructions used in this example: PUSH to store the arguments for the API functions on the stack, and CALL to execute the functions. In the next section we are going to set up the pointers to the functions and the character strings for MessageBoxA.

Now we need to perform the full alignment ritual again, this time to set up the position of the second section. We also calculate the size of the first one in the file simply by computing the difference between the aligned offsets.

With the proper alignments done, we move on to the second section, the one where we are going to put all the data.

We start with the import table, which allows us to direct the loader to fill up our pointers with the addresses of the functions from system DLL files. This is actually a complex structure that consists of several smaller tables. First, there is an Import Directory Table.

Every record in this main table declares a single DLL file from which we want to import functions. The table ends with a record that has all five fields zeroed.

NameRva is the relative address of the name of the DLL file. We are going to put these names near the end of the import-related data.

ImportLookupTableRva and ImportAddressTableRva point to two parallel tables.
The former contains relative addresses of structures declaring the functions to be imported, while the latter is going to contain the actual addresses of the imported functions. The functions can be in any order, as long as the same one is used for both tables. When our image is loaded into memory, the operating system is going to look for all the functions defined by the first table and fill the second one with the corresponding addresses.

The TimeDateStamp and ForwarderChain fields are used when the imports are bound - that is, when the second table is pre-filled with the addresses of imported functions to save time when loading the image. This obviously can work correctly only when all the addresses in the imported library are exactly as they were upon binding, and TimeDateStamp keeps the timestamp of the DLL to provide a way to verify that it is exactly the same file. If the timestamps match, the loader can skip looking up all the functions; otherwise it does so as usual. Our imports are not bound, as we need the loader to fill in the addresses for us, therefore we keep TimeDateStamp zeroed in every record.

If the imports were bound, ForwarderChain would be interpreted as an index of a function that could not be bound because it was a forwarded import from another DLL. The value of the corresponding entry in the import address table would be an index of another such function, and so on. If we wanted to indicate that there were no such functions, we would put -1 in this field, but since we do not use binding (as indicated by the zeroed TimeDateStamp) this value is irrelevant.

Now we need to create lookup tables and address tables for every DLL. The initial contents of the parallel tables should be the same: they both should contain relative addresses of the lookup entries defining the functions. When the image is loaded, the IAT (import address table) is rewritten with the matching addresses.
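The whole import structure can be assembled by a short Python sketch, which may help to check how the pieces point at each other. The sketch makes three assumptions:

- The example imports ExitProcess from KERNEL32.DLL and MessageBoxA from USER32.DLL. Only MessageBoxA is named in the text, so the other pair is an illustrative choice.
- The table is placed at an arbitrary RVA of 0x2000.
- Every hint is written as 0, relying on the loader's fallback search by name.

build_import_table is a hypothetical helper.

```python
import struct

def build_import_table(base_rva, imports):
    """Lay out an import table at base_rva.

    imports is a list of (dll_name, [function_name, ...]). Returns the bytes
    and a dict mapping each function to the RVA of its slot in the IAT.
    """
    blob = bytearray(20 * (len(imports) + 1))   # directory records + zeroed terminator
    lookup_at, address_at = [], []
    for _, functions in imports:                # import lookup tables
        lookup_at.append(len(blob))
        blob += bytes(4 * (len(functions) + 1))  # one dword per function + zeroed end
    for _, functions in imports:                # import address tables (IAT)
        address_at.append(len(blob))
        blob += bytes(4 * (len(functions) + 1))
    iat = {}
    for i, (_, functions) in enumerate(imports):
        for j, name in enumerate(functions):
            blob += bytes(len(blob) % 2)        # align the 16-bit hint to 2 bytes
            entry_rva = base_rva + len(blob)
            blob += struct.pack("<H", 0) + name.encode("ascii") + b"\0"  # hint 0, then name
            # Before loading, both parallel tables point at the same entry.
            struct.pack_into("<I", blob, lookup_at[i] + 4 * j, entry_rva)
            struct.pack_into("<I", blob, address_at[i] + 4 * j, entry_rva)
            iat[name] = base_rva + address_at[i] + 4 * j
    for i, (dll, _) in enumerate(imports):
        name_rva = base_rva + len(blob)
        blob += dll.encode("ascii") + b"\0"     # plain null-terminated DLL name
        struct.pack_into("<5I", blob, 20 * i,
                         base_rva + lookup_at[i],   # ImportLookupTableRva
                         0,                         # TimeDateStamp: imports not bound
                         0,                         # ForwarderChain: irrelevant when not bound
                         name_rva,                  # NameRva
                         base_rva + address_at[i])  # ImportAddressTableRva
    return bytes(blob), iat

blob, iat = build_import_table(0x2000, [("KERNEL32.DLL", ["ExitProcess"]),
                                        ("USER32.DLL", ["MessageBoxA"])])
print({name: hex(slot) for name, slot in iat.items()})
# prints: {'ExitProcess': '0x204c', 'MessageBoxA': '0x2054'}
```

The returned IAT slots play the role of the function labels in the text. Adding IMAGE_BASE to them gives the addresses that the CALL instructions go through.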
Our code can then use the values in the IAT directly, so we label its entries with the names of the functions; this is exactly what is needed to get the CALL instructions in our code to work.

We import only one function from each DLL, so the tables are short. The end of a table is marked by a zeroed entry.

Next come the lookup definitions for the individual functions. Each such structure contains a 16-bit hint followed by the name of the function as a null-terminated string. The hint is an index into the export table of the DLL, where the loader may first look for a function with that name. If the hint fails, the loader continues to search for the function as usual, thus we do not have to know the right values to put there.

Even though this most often does not matter, the 16-bit values should be aligned to their "natural boundary" (that is, their addresses should be multiples of 2), while a string could end on an odd address. For this reason we put an ALIGN between the records.

Finally, we conclude the import table with the names of the DLL files that we import. They are plain null-terminated strings.

While we are at it, we can define a couple more strings here. The import table has ended, but we can keep placing more data into the '.rdata' section, and we still need to define the caption and the content for the message box that this program wants to show.

This marks the end of our second section and, in fact, of the entire image. All that is left is another sequence of memory and file alignments.

This time there is no next section, so we do not define labels and constants that would refer to it. Instead we define SIZE_OF_IMAGE, which needs to be a multiple of SECTION_ALIGNMENT too.

This is it, the source for our first PE image is ready (a copy is in the attached "basic.asm" file). We can now assemble it into a file with the "exe" extension and let it run.

We can also combine it with the "listing.inc" script to contemplate the binary data juxtaposed with the commands that generated it.
You may notice that numerous lines from "80386.inc" show up in the listing. To get rid of them, we can hide the included file inside a simple macro:

While we are at it, we may also incorporate the "ntimage.inc" file that defines some of the constants associated with the PE format:

This allows easier experimentation with some of the values that we hard-coded earlier:

For example, we can add IMAGE_DLLCHARACTERISTICS_NX_COMPAT to DllCharacteristics, which allows DEP (Data Execution Prevention) to be enabled. This can make the IMAGE_SCN_MEM_EXECUTE bit in our section definitions really mean something.

This was a first step towards making our source more maintainable. Another one could be to automate some of the tasks. For example, we can generate all the entries in the section table with a simple repetition:

REPEAT assembles the same piece of source in multiple copies, and in every copy it replaces the name of the counter with the corresponding number. We defined a counter named "n" that starts from 1, and this generates the same labels as we had there previously.

This approach requires that we define several more constants. We also have to change how NUMBER_OF_SECTIONS is defined; we can no longer compute it from the size of the section table, as this would create a circular dependence:

We can, however, take it a step further and automate everything by making a macro to define sections:

We named the macro the same as the fasmg instruction - this is allowed, and inside our macro the name still refers to the original instruction.

To define labels and constants that correspond to enumerated section entries, we need to extract the number from the CURRENT_SECTION variable and somehow place it into names. The trick in fasmg is to use REPEAT with just a single repetition, solely for the purpose of defining counters that get replaced with numbers before the repeated text is assembled.

The macro does everything that we have previously done manually when starting a new section.
The ending address and the size in the file get defined only when the next section is started, so we need to define an additional dummy section (not counted in the total number) at the end, together with the definitions of NUMBER_OF_SECTIONS and SIZE_OF_IMAGE.

With the help of POSTPONE we can place this next to the definition of the macro, for a better organization of the source. Whatever is inside such a block gets assembled at the end of the source text.

This macro required us to learn a bit more of the assembler's trickery, but it makes the section definitions much more pleasant to the eye:

Before we move on to learn about some other tables defined in the optional header, we may automate this set of definitions as well. Everything between the RvaAndSizes and SectionTable labels can be replaced with the following construction:

ITERATE assembles a block as many times as there are items on the list, substituting the text of each consecutive item for the name given in the first argument (the items start from the second one). Inside, it tests whether a symbol with a name like ExportTable or ImportTable is defined anywhere, and only in such case are the fields filled. At the moment we only have ImportTable present, but as soon as we add another one of the listed tables, the optional header is going to contain the right values to make it work.

A variant of the first source that has all these improvements is in the attached "basic_template.asm" file. We are going to use it as a base for the continued experiments.