The Lazy Beginning Reverser's Guide to Windows Assembly

by Vortex168

A beginner-level tutorial, which will teach you the basics of ASM and how they work under Windoze. This is almost making me want to crack something again... ;)

----------------
| Introduction |
----------------

I'm vortex168 and this is a guide to Windows Assembly for beginning reversers. It's for lazy people because it's a barebones crash course in Windows Assembly that covers what you will definitely need to know in order to properly 'modify' a disassembled program (not understand it). And it's for beginners, so those of you who get bored have only yourselves to blame =).

You should not have trouble understanding this guide as long as you have a logical brain. The more programming knowledge you have, the easier it will be to pick up Windows Assembly. After this guide, I highly recommend reading some of the other, more complete Windows Assembly guides/books, because the more you know, the easier it is to understand a disassembled program. Without further ado, let the guide begin!

Before you begin =), try to understand what hexadecimal (hex) is. If you're THAT lazy and don't want to until later, just think of hex values as numbers, but don't try to add/subtract/multiply/divide them as if they were decimal, because your answer will probably be wrong. This type of number is distinguished from your everyday number (decimal) by an 'h' at the end of it (eg. 1230h). Disassemblies often leave the 'h' off addresses and values, so numbers like 00493938 later in this guide are hex too.

*Extra Note: Don't read the Extra Notes if you don't feel like it. They aren't essential... not really =).

The semicolon is the symbol that tells the assembler (the thing that puts together a program written in Assembly) to ignore everything after it. So, on the same line, everything after the ';' is ignored; programmers call this a 'comment'.

---------------------------------------
| Part I: Registers, Flags, The Stack |
---------------------------------------

--REGISTERS--

Think of registers as a fancy term for 32-bit variables (32-bit is just a size; calm down if you don't understand it, it doesn't matter all that much at this point).
You use them exactly as you would use any other variables: to store values that you will need later on.

For those who aren't quite as lazy as some others, here is a quick run-through of the registers that you will see while reversing a 32-bit Windows program (as opposed to a DOS one). (This can be skipped if you are that short on time.)

General Purpose Registers:
EAX - Commonly used in mathematical operations
EBX - Commonly used as a pointer (if you don't know what this is, don't worry, it doesn't matter that much right now)
ECX - Commonly used as a loop counter (eg. it stores the value '5' if a loop needs to run 5 times)
EDX - Similar to EBX

Registers that you should NOT touch if you don't know what you're doing:
CS - 'Code segment'. Basically tells you where you are in memory (think of it as part of the address that tells you where in your computer the program is stored)
DS - 'Data segment'. Same as above, but tells you where the data is stored (eg. strings)
ES - 'Extra segment'. Ask someone else what the hell this does, I've never messed with it =).
SS - 'Stack segment'. (See Above)
ESI - 'Source Index'. (See Above)
EDI - 'Destination Index'. (See Above)
EBP - 'Base Pointer'. (See Above)
ESP - 'Stack Pointer'. (See Above)
EIP - 'Instruction Pointer'. I know this one =). It holds the address of the next instruction.

By now you should realize that I don't know everything about Assembly. Why? Because I understand your laziness much better than you might think =).

*Extra Note: The 'E' in front of some of the register names just indicates that it is the 32-bit version of the register. This distinguishes it from its 16-bit counterpart, which was used in DOS and on 16-bit processors.

Each of the General Purpose Registers can be broken down into parts. For example, EAX's lower half is AX. AX can in turn be divided into AH and AL: AH is the 'high' half of AX, and AL is the 'low' half of AX.
--FLAGS--

Flags don't deserve a whole Part, since you don't mess with them much anyway. The only thing I want to point out is about debuggers. Aside from the program code and the values of the registers above, a debugger also shows a set of single letters that are either highlighted or not highlighted (how they are shown depends on the debugger you use). These are 'flags', and each one basically stores TRUE or FALSE. The most important one to know is the Zero flag, which is (obviously) the 'Z'. Its use will be seen when jumps are explained later.

--THE STACK--

The stack is basically your computer's warehouse. It can be used to store anything. Why do people bother with registers, then? Because they're a lot faster and easier to deal with.

In order to properly introduce the stack, I need to tell you about your first two assembly instructions: Push and Pop. Learn to love them, or at least tolerate them, because you'll be seeing a LOT of them. The stack, as its name implies, operates like a stack of plates. Information gets push'ed onto the stack of plates, and if you want to get information off of the stack, you pop it off the top. As you have probably realized, this is a FILO (first in, last out) scheme. Information that gets pushed onto the stack first will be the last to get popped off.
(If you get the point, skip the example.)

Example:

push 1h   ; put the value '1' on the stack
push 2h   ; put the value '2' on the stack
push 3h   ; put the value '3' on the stack
pop eax   ; take off the top value (3) and put it into eax
pop ebx   ; take off the top value (now it is 2, because 3
          ; got taken already) and put it into ebx
pop ecx   ; take off the top value (now 1) and put it into ecx

----------------------------------------
| Part II: Basic Assembly Instructions |
----------------------------------------

This is a list of the most common Assembly instructions that a reverser at your current level (which should be far below NEWB) will need to understand. Tougher programs might use some weirder instructions, which is why you should grab yourself a handy reference guide when you get around to it. Find it yourself, you lazy bum.

We have already discussed two instructions, Push and Pop, so we begin from there:

MOV - um... it moves stuff around =). Strictly speaking it copies: after something like mov eax, ebx, EBX still holds its old value.

Example:

mov eax, 203h

This line moves the value 203 hex into the EAX register. Mathematically it looks like this: X = 203h, where X is just the EAX register. If you still don't understand, go back to school.

CALL - it calls the function located at the given address.

Example:

call 03828549   ; This is what it will look like in a disassembly.
                ; When actually programming in Assembly, labels
                ; can be used instead (eg. call my_function).

This line calls the function at the address 03828549.

*Extra Note: If this line of code were located at address 02955555, the address of the instruction right after it would be stored on top of the stack to let the program know where to go back to. This kind of call is 5 bytes long, so that address would be 0295555A.

JMP - unconditional jump. It jumps the program to whatever address it is told to, skipping everything in between.

Example:

jmp 00300000   ; assume this line is located at 00200000
push eax       ; thus this line is at 002XXXXX (yes, I could
...            ; tell you exactly where, but you don't need
               ; to know). It is not executed because of the
               ; jump above.
push ebp       ; If this instruction is at address 00300000,
               ; it gets executed after the jump.

CMP - compare two values. It essentially subtracts the 2nd value from the 1st and throws away the result. BUT! It sets a TON of flags, which are almost always used in conjunction with a conditional jump of some type. If the two values are equal, then the subtraction equals zero and the Zero flag will be set (will be TRUE). The carry, overflow, sign, and a couple of other flags may get set/unset by the cmp instruction as well, so many different types of conditional jumps can be used after a compare.

JNZ/JZ - conditional jumps. These are two of many possible types (eg. JG (greater than), JL (less than), JGE, JNG, etc.), and they are among the most common: Jump if Not Zero and Jump if Zero. Some disassemblers show them as JNE and JE, which are the same instructions. They check the Zero flag, which was mentioned earlier. If the Zero flag is set, then the JZ jump gets taken and the JNZ does not. They are very important for any reverser/cracker because they dictate the flow of the program. They often come after a cmp of two values (say, maybe the serial number you enter and the one that it's supposed to be? =P). If the jump is not taken, the program simply continues with the next instruction after the jump.

RET - return. It tells the program that it has completed a routine (function) and needs to return to where it came from. It does this by popping the return address that CALL left on the stack and continuing from there.

*Extra Note: For more complete information about the above topics, search around for Cruehead's Assembly tutorials on the internet. They're good... they're just a lot longer than mine =).

---------------------------------------
| Part III: The Windows API Functions |
---------------------------------------

Every Windows program (um, I think) makes use of the Windows API functions at some level. These are a bunch of functions that do all the useful stuff you would need to do in a Windows program (ie. get a string, make a message box, etc.). In a disassembly of a program, you will see many calls to API functions.
As for any call, the parameters (fancy word for 'inputs') for the function are first push'ed onto the stack. So it would look something like this:

push input2
push input1
call FunctionThatTakes2Inputs

Notice that the 2nd input is pushed and THEN the 1st input is pushed. This is because things pushed onto the stack get put on top of one another, and we want the top item on the stack to be the first input when we call the function. So while trying to understand a portion of code, pay attention to what is being push'ed onto the stack, and why. Chances are you will need to know what values are getting thrown around by the program. Say a function takes two inputs, one being your name and the other your company: what do you think will happen?

I'd highly suggest you get yourself an API reference so that you can look up any unfamiliar API functions. It will help greatly when you are trying to actually understand a program and you forget what the 2nd input of the MessageBoxA function is supposed to be.

The following is a list of some of the more cracker-oriented API functions. They are not reverser-oriented, because a reverser needs to know them all if he/she wants to understand a program completely, which is the difference between the two.

GetWindowText:
~~~~~~~~~~~~~

int GetWindowText(
    HWND hWnd,        // handle of window or control with text
    LPTSTR lpString,  // address of buffer for text
    int nMaxCount     // maximum number of characters to copy
);

Parameters:

hWnd - Identifies the window or control containing the text.
lpString - Points to the buffer that will receive the text.
nMaxCount - Specifies the maximum number of characters to copy to the buffer. If the text exceeds this limit, it is truncated.

*Extra Note: For those unfamiliar with the C programming language, the above bit is a function declaration (prototype) for the GetWindowText function. The 'int' right before GetWindowText indicates that the function will return an integer value after it is completed.
This translates into plain English as 'GetWindowText hands whoever called it some number'. The three lines between the parentheses are the inputs that the function will take. Each line is one input: HWND is the TYPE of the input, while hWnd is just the name that the function has decided to use for it. The same goes for the rest of them.

This function is used to, duh, get text from a window. The 'handle' of a window is kind of like the name that Windows uses to recognize the window ('window' is, in this case, roughly equal to 'program'). In your disassemblies, this will be some value that Windows assigns to the window when it is created, usually at the very beginning of the program. When simply cracking, you don't really need to worry about it. Just know that it is the last parameter that will be pushed right before the call to this function. The most important of these parameters is the second one: the address where the retrieved string will be stored. Here is an example:

push 00000016   ; max length of 16h characters (22 characters)
push 00493938   ; retrieved string will be stored at address 00493938
push 00482638   ; handle of window
call GetWindowText

So now you know that the string it is retrieving will be at address 00493938. Any future reference to this address (unless it gets replaced with something else) will be referring to the text that was retrieved by the function.

GetDlgItemText:
~~~~~~~~~~~~~~

UINT GetDlgItemText(
    HWND hDlg,        // handle of dialog box
    int nIDDlgItem,   // identifier of control
    LPTSTR lpString,  // address of buffer for text
    int nMaxCount     // maximum size of string
);

Parameters:

hDlg - Identifies the dialog box that contains the control.
nIDDlgItem - Specifies the identifier of the control whose title or text is to be retrieved.
lpString - Points to the buffer to receive the title or text.
nMaxCount - Specifies the maximum length, in characters, of the string to be copied to the buffer pointed to by lpString.
If the length of the string exceeds the limit, the string is truncated.

This function does the same as the one above, but under different circumstances. Go find out the difference between the two yourself if you are that hardworking; it has to do with the difference between a 'window' and a 'dialog'.

MessageBox:
~~~~~~~~~~~

int MessageBox(
    HWND hWnd,          // handle of owner window
    LPCTSTR lpText,     // address of text in message box
    LPCTSTR lpCaption,  // address of title of message box
    UINT uType          // style of message box
);

Parameters:

hWnd - Identifies the owner window of the message box to be created. If this parameter is NULL, the message box has no owner window.
lpText - Points to a null-terminated string containing the message to be displayed.
lpCaption - Points to a null-terminated string used for the dialog box title. If this parameter is NULL, the default title Error is used.
uType - Specifies a set of bit flags that determine the contents and behavior of the dialog box. This parameter can be a combination of flags from the following groups of flags.

I have cut off the 'groups of flags' because it was a long list.

This function creates a message box. You see these all the time, and they are often used to make the annoying 'you have 30 days left in your trial' messages that pop up in shareware. The last parameter (uType) basically tells what buttons are in the message box (and also things like which icon is shown). So sometimes you see a message box with Abort, Retry, and Ignore; other times you see a box with OK and Cancel, etc. That's what this parameter does. If you look up the MessageBox function in a Win32 Programmer's Reference, it will give you a list of the valid values.

--------------
| Conclusion |
--------------

Well, that's all I have. If you have any complaints, keep them to yourself.
If you have any useful suggestions about what I should add, parts that are unclear (please be as specific as you can about this), things that you see as unneeded, or any errors you might find (don't bother telling me about spelling or grammar errors, though), email me at vortex168 at asia dot com. Or post something in the Feedback Forum at Mala's if you're shy. Maybe I'll write something that makes use of what this tutorial teaches. Maybe. =)

July 2002
vortex168