The basics of programming in assembly: the design of the processor, registers, memory, instructions, and the use of assembly language within C++ and Delphi.

1. Introduction to assembly

Assembly language, a low-level programming language which allows you to use all the features of a computer processor, is nowadays somewhat forgotten by “modern” developers. The main reason for this is that writing in assembly is not the simplest of tasks, and is very time-consuming (testing code, finding bugs etc.). However, in some situations assembly may be an ideal solution. An example is any kind of algorithm where speed is essential, such as cryptographic (i.e. encryption) algorithms. Despite incredible advancements in compilers in recent years, algorithms such as Blowfish, Rijndael and IDEA written in assembly and “manually” optimised show significant speed advantages over their counterparts written e.g. in C++ and compiled at the maximum optimisation level. In addition to cryptography, assembly is also often used by game developers. The best example may be the game QUAKE 2: after the publication of its source code, it turned out that all the algorithms that require speed were written in assembly.

So let's get started. To be clear, I should add that in this article I will focus on assembly for x86 processors, and its use in a Windows environment.

2. Fundamentals of assembly

If you have never written in assembly, then before you can create even the simplest program you must first learn several fundamentals, such as the CPU registers, instructions, and the stack. From the programmer's perspective, a standard processor (I will use the Intel Pentium MMX as an example, as it is all I've got :-) has a large set of instructions: x86 instructions operating on 8-, 16- and 32-bit data, as well as floating point and MMX instructions.

2.1. CPU registers

The processor has eight 32-bit general purpose registers and a flags register, as well as eight 80-bit coprocessor registers (st0 - st7) and an equal number of 64-bit MMX registers (mm0 - mm7). The processor also has several control registers that we generally don't use.

What is a register? A register is like a memory cell which can temporarily store data; we can exchange data between the registers, and perform logical and arithmetic operations on them. The Pentium processor is 32-bit, which means that each of the general purpose registers is 32 bits wide (corresponding to unsigned int in C). The lower 16 bits of every 32-bit register can be accessed as a 16-bit register of their own (a remnant from 16-bit processors such as the 286), while the 16-bit halves of registers EAX, EBX, ECX and EDX are each divided further into two 8-bit halves:

Register  16-bit half  8-bit halves  Description
EAX       AX           AH and AL     Accumulator
EBX       BX           BH and BL     Base
ECX       CX           CH and CL     Counter for string operations and loops
EDX       DX           DH and DL     Data
ESI       SI           n/a           Source register for string instructions
EDI       DI           n/a           Destination register for string instructions
EBP       BP           n/a           Pointer to data within the stack, used by functions to locate parameters saved on the stack
ESP       SP           n/a           Stack pointer
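To make the relationship between a 32-bit register and its 16-bit and 8-bit parts concrete, here is a minimal C++ sketch that models one register using shifts and masks. The type Reg32, its member names and the function demo_subregisters are hypothetical, invented for this illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one general purpose register such as EAX.
struct Reg32 {
    std::uint32_t value = 0;  // the full 32-bit register (EAX)

    // Read the named parts of the register.
    std::uint16_t x() const { return static_cast<std::uint16_t>(value & 0xFFFFu); }      // AX
    std::uint8_t  h() const { return static_cast<std::uint8_t>((value >> 8) & 0xFFu); }  // AH
    std::uint8_t  l() const { return static_cast<std::uint8_t>(value & 0xFFu); }         // AL

    // Write one part; all other bits of the register are left untouched.
    void set_x(std::uint16_t v) { value = (value & 0xFFFF0000u) | v; }
    void set_h(std::uint8_t v)  { value = (value & 0xFFFF00FFu) | (static_cast<std::uint32_t>(v) << 8); }
    void set_l(std::uint8_t v)  { value = (value & 0xFFFFFF00u) | v; }
};

// Mirrors a sequence such as: mov eax,5 / mov ax,0FFFFh / mov al,11 / mov ah,11h
inline std::uint32_t demo_subregisters() {
    Reg32 eax;
    eax.value = 5;       // EAX = 0x00000005
    eax.set_x(0xFFFF);   // EAX = 0x0000FFFF
    eax.set_l(11);       // EAX = 0x0000FF0B
    eax.set_h(0x11);     // EAX = 0x0000110B
    assert(eax.l() == 11 && eax.h() == 0x11);
    return eax.value;
}
```

Writing to the AL or AH part changes only those 8 bits. The upper 16 bits of a register have no name of their own, so they can only be changed through the full 32-bit register.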

2.2. General purpose registers

When writing a program, or inline assembly code under Windows, you can use all the general purpose registers, but using the special registers ESP and EBP can interfere with the operation of the program. For example, if you reset the ESP register to zero within a function, the program will most likely crash later (e.g. if the program tries to return from the function).

2.3. The stack

The stack is an area of memory reserved for the needs of the program. These include passing parameters to functions (as 32-bit values), temporary data storage, and all local variables. When the program starts, the ESP register (stack pointer) points to the end of the stack. When data is stored on the stack, the ESP register is decremented, and the data is then stored in the memory location which ESP points to. To store data on the stack, the push instruction is used, for instance:

    __asm {
        push 5                  // store the number 5 (32 bit) on the stack
        push eax                // save the contents of register EAX on the stack
        push dword ptr[edx]     // save the contents of memory referenced by
                                // the EDX register

        sub esp,4               // equivalent to 'push 5'
        mov dword ptr[esp],5

        sub esp,4               // equivalent to 'push eax'
        mov dword ptr[esp],eax
    }

To retrieve and remove a value from the stack, the pop instruction is used, which works in the opposite way to push. First the value is read from the address indicated by the ESP register, then the ESP register is incremented:

    __asm {
        push 5                  // store 4 32-bit values on the stack
        push eax
        push dword ptr[edx]
        push 13B0C032h

        pop eax                 // remove the most recent value from the stack,
                                // which in this case is the number 13B0C032h
        pop dword ptr[edx]      // this operation does not change anything, since
                                // the value stored on the stack came from the
                                // location referenced by EDX and is simply being
                                // returned there
        pop edx                 // put the value originally held by EAX into EDX
        pop ecx                 // put the value 5 into register ECX

        push 5                  // store the value 5 on the stack

                                // the following instructions simulate 'pop eax'
        mov eax,dword ptr[esp]
        add esp,4
    }
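The order of operations in push and pop (decrement then store, load then increment) can be sketched in portable C++ with a toy stack. ToyStack and its members are hypothetical names for this illustration; here esp is an offset into a byte buffer rather than a real address.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical toy machine: a block of bytes for the stack plus an ESP value.
struct ToyStack {
    std::vector<std::uint8_t> memory;
    std::uint32_t esp;  // offset into `memory`; starts at the END of the stack

    explicit ToyStack(std::uint32_t size_in_bytes)
        : memory(size_in_bytes), esp(size_in_bytes) {}

    // push: first decrement ESP by 4, then store the value at [ESP].
    void push(std::uint32_t value) {
        assert(esp >= 4 && "stack overflow");
        esp -= 4;
        std::memcpy(&memory[esp], &value, sizeof value);
    }

    // pop: first read the value at [ESP], then increment ESP by 4.
    std::uint32_t pop() {
        assert(esp + 4 <= memory.size() && "stack underflow");
        std::uint32_t value;
        std::memcpy(&value, &memory[esp], sizeof value);
        esp += 4;
        return value;
    }
};
```

Pushing 5 and then 13B0C032h onto a 64-byte ToyStack moves esp from 64 to 56. The first pop returns 13B0C032h and the second returns 5, which is the last-in, first-out behaviour described above.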

2.4. Limitations in Windows

If you have written assembly programs under MS-DOS, where there were no limitations, you will need to be aware that there are some differences under Windows. As I said earlier, in assembly we can use all the instructions that the CPU supports; however, some instructions are not permitted by the operating system, in our case Windows. For instance, if we use I/O port instructions, the compiler will not give an error, but the program will most likely crash if these instructions are executed under Windows. Instructions which can cause the program to be terminated include the above-mentioned I/O port instructions, as well as instructions that refer to interrupts, segment registers and control registers.

Regarding the segment registers, Windows uses the flat memory model, which means that all code and data exist in the same memory space ranging from 0 up to 0xFFFFFFFF. So, when accessing memory there is no need to bother with segment registers: unlike in MS-DOS, there is no need to use segment prefixes like DS:.

3. Using assembly language

To take advantage of the benefits of assembly, you must first check whether your development tools allow its use. Products such as Borland Delphi, Builder, Watcom C++ or Microsoft Visual C++ allow you to use (compile) assembly code; Visual Basic is the only popular RAD package which does not allow writing code in assembly. These products support the use of assembly code in two ways. The first is called inline assembly, where the assembly code is inserted into the regular code written in e.g. C++. The second method is linking modules (i.e. separate files) written in assembly with modules written e.g. in Delphi or C++.

3.1. Inline assembly

Before you start writing assembly code, you must check how to write it, because there are two types of syntax for assembly code. The first type is called “Intel syntax”, and is used in products such as Delphi, Builder and MSVC, and by assemblers such as Borland TASM and Microsoft MASM. This syntax is now the standard and is used in the vast majority of sources. The second type is called “AT&T syntax”, and is used e.g. in C compilers such as GCC (Linux platform), DJGPP and LCC.

Inline assembly is the easiest way to write asm code. When writing assembly code in Delphi or Builder, it must be enclosed between the asm keyword marking the beginning of the assembly code, and the end; keyword after the code. For example:

    // our first 'hello world' in assembly, Delphi version
    asm                     // start of assembly code

      mov eax,1             // move the value 0x00000001 into register EAX
                            // the C++ equivalent of this instruction is the
                            // assignment operator '=', e.g.
                            // x = 1;
                            // the Delphi equivalent is the assignment
                            // operator ':=', e.g.
                            // y := 1;

      mov ecx,eax           // move the contents of register EAX into
                            // register ECX, that is, the value 0x00000001
                            // will end up in ECX

      shl ecx,2             // this 'Shift Left' instruction will shift the
                            // contents of register ECX to the left by 2 bits
                            // As you may know, left shifting serves to
                            // multiply values by successive powers of 2
                            // Shifting 0x00000001 to the left by two bits
                            // will result in the value 0x00000001 * 4 = 0x00000004
                            // saved to ECX
                            // in C++, bit shifts are achieved with the '<<'
                            // operator, e.g.
                            // x = y << 2;
                            // in Delphi, bit shifts use the same keywords as
                            // assembly code, namely 'shl' or 'shr', e.g.
                            // x := y shl 2;

      shr eax,1             // this 'Shift Right' instruction will shift the
                            // EAX register to the right by 1 bit

      and eax,0             // 'And' is a logical multiplication of bits
                            // according to the following table:
                            // 0 * 0 = 0
                            // 1 * 0 = 0
                            // 0 * 1 = 0
                            // 1 * 1 = 1
                            // Any value multiplied by 0 will give 0; in this
                            // case, the EAX register will be zeroed out
                            // The C++ equivalent of this instruction is
                            // the '&' operator, e.g.
                            // x = y & 0;
                            // in Delphi:
                            // x := y and 0;

      or eax,0FFFFFFFFh     // 'Or' is a logical sum of bits according
                            // to the following table:
                            // 0 + 0 = 0
                            // 1 + 0 = 1
                            // 0 + 1 = 1
                            // 1 + 1 = 1
                            // in this case EAX will be ORed with the value
                            // 0xFFFFFFFF, which will result in the value
                            // 0xFFFFFFFF no matter what EAX contains
                            // The C++ equivalent of this operation is the
                            // '|' operator, e.g.
                            // x = y | 0xFFFFFFFF;
                            // in Delphi:
                            // x := y or $FFFFFFFF;

      sub edx,edx           // 'Subtract' subtracts the value of one register
                            // from another. In this case, EDX will become zero
                            // The C++ equivalent is '-', e.g.
                            // x = x - x;

      xor eax,eax           // 'eXclusive Or' follows this table:
                            // 0 ^ 0 = 0
                            // 1 ^ 0 = 1
                            // 0 ^ 1 = 1
                            // 1 ^ 1 = 0
                            // This function yields 1 when its two inputs are
                            // different; if they are the same it will give 0
                            // Hence the instruction 'xor eax,eax' will zero
                            // out the EAX register
                            // The C++ equivalent is the '^' operator, e.g.
                            // x = x ^ y
                            // in Delphi:
                            // x := x xor y;

    end;                    // end of assembly code

Writing inline assembly in MSVC only really differs in how the assembly code is introduced to the compiler:

    // our second 'hello world' in assembly
    __asm {                 // start of assembly code

        push 5              // save the value 0x00000005 on the stack
        pop eax             // remove 0x00000005 from the stack and write
                            // it to register EAX

        push eax            // save the contents of register EAX on the stack
                            // (in this case the value 5)
        pop edx             // remove the value 5 from the stack and write it
                            // to register EDX

        mov ax,0FFFFh       // write the value 0FFFFh to the 16-bit lower
                            // half of register EAX
        mov dx,ax           // write the value from register AX to the 16-bit
                            // lower half of register EDX

        mov al,11           // write the value 11 (decimal) to the 8-bit
                            // lower half of register AX
        mov ah,11h          // write the value 11 (hex) to the 8-bit upper
                            // half of register AX, which is 17 in decimal

    }                       // end of assembly code
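For readers who want to check the arithmetic, the register operations of the first listing can be replayed with ordinary C++ operators. This is only a sketch: the Regs struct and the run_sequence function are hypothetical names, and edx_in stands for whatever EDX happened to hold beforehand, which the listing does not specify.

```cpp
#include <cassert>
#include <cstdint>

// Register values after running the sequence; the struct is illustrative only.
struct Regs {
    std::uint32_t eax, ecx, edx;
};

// Step-by-step C++ rendering of the first 'hello world' listing.
inline Regs run_sequence(std::uint32_t edx_in) {
    Regs r{0, 0, edx_in};
    r.eax = 1;                      // mov eax,1
    r.ecx = r.eax;                  // mov ecx,eax
    r.ecx <<= 2;                    // shl ecx,2         -> 1 * 4 = 4
    r.eax >>= 1;                    // shr eax,1         -> 1 / 2 = 0
    r.eax &= 0;                     // and eax,0         -> always 0
    r.eax |= 0xFFFFFFFFu;           // or eax,0FFFFFFFFh -> always 0xFFFFFFFF
    assert(r.eax == 0xFFFFFFFFu);
    r.edx -= r.edx;                 // sub edx,edx       -> always 0
    r.eax ^= r.eax;                 // xor eax,eax       -> always 0
    return r;
}
```

Whatever value is passed in for edx_in, the sequence ends with EAX = 0, ECX = 4 and EDX = 0, because sub and xor of a register with itself always produce zero.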

3.2. Using variables in assembly

Writing in assembly, you have access to all global variables, and if the code is in a procedure, it also has access to the local variables and parameters of the procedure/function, so its capabilities are practically the same as normal code. An example of the use of global and local variables:

    // global variables
    var
      ByteVar: Byte;        // byte - 8 bits
      WordVar: Word;        // word - 16 bits
      IntVar: Integer;      // double-word - 32 bits
    ...
    procedure noop;
    // local variables of function 'noop'
    var
      LocalByte: Byte;
      LocalWord: Word;
      LocalInt: Integer;
    begin
      // initialise global variables
      ByteVar := $FF;       // 8-bit value
      WordVar := $FFFF;     // 16-bit value
      IntVar := $FFFFFFFF;  // 32-bit value

      asm
        mov al,ByteVar      // write an 8-bit value to an 8-bit register
        mov LocalByte,al    // write an 8-bit value to a local variable

        mov ax,WordVar      // 16-bit value to 16-bit register
        mov LocalWord,ax

        mov eax,IntVar      // 32-bit value to 32-bit register
        mov LocalInt,eax
      end;
    end;

The example for MSVC is not much different from that of Delphi:

    // global variables
    char ByteVar;
    short WordVar;
    int IntVar;
    ...
    void noop()
    {
        // local variables
        char LocalByte;
        short LocalWord;
        int LocalInt;

        // initialise global variables
        ByteVar = 0xFF;         // 8-bit value
        WordVar = 0xFFFF;       // 16-bit value
        IntVar = 0xFFFFFFFF;    // 32-bit value

        __asm {
            mov al,ByteVar      // write an 8-bit value to an 8-bit register
            mov LocalByte,al    // write an 8-bit value to a local variable

            mov ax,WordVar      // 16-bit value to 16-bit register
            mov LocalWord,ax

            mov eax,IntVar      // 32-bit value to 32-bit register
            mov LocalInt,eax
        }
    }

You can write entire functions in assembly language. When doing this, there are a few things to keep in mind. If the function returns a value, we must ensure that the returned value is stored in the EAX register before leaving the function.
A simple example:

    // Delphi version
    function add(x, y:integer):integer;
    asm
      mov ecx,y             // copy the function's second parameter to ECX
                            // (done first: under Delphi's default 'register'
                            // convention y arrives in EDX, which the next
                            // instruction overwrites)
      mov edx,x             // copy the function's first parameter to EDX
      add edx,ecx           // add x and y together
      mov eax,edx           // write the result to register EAX
                            // this becomes the function's return value
    end;

    // C++ version
    int mult(int x,int y)
    {
        __asm {
            mov edx,x       // copy the function's first parameter to EDX
            mov ecx,y       // copy the function's second parameter to ECX
            imul edx,ecx    // multiply x by y
            mov eax,edx     // write the result to register EAX
                            // this becomes the function's return value
        }
    }

We already know that functions written in assembly must place the return value in the EAX register, but what about the other registers? In short, registers EAX, EDX, and ECX may contain any value when the function exits, but registers EDI, ESI, EBX, and EBP generally must not change (their value must be the same as it was before the call). You may wonder why this is the case. Well, the code produced by HLL (high-level language) compilers uses this second group of registers throughout the program to hold e.g. addresses of functions, constants, etc., and if they are changed by a function, code that runs later may use invalid values, which can cause anything from data corruption to a crash.
It is easy to prevent such errors:

    // Delphi version
    function count(w,x,y,z:integer):integer;
    asm
      push edi              // save the contents of registers EDI, ESI and EBX
      push esi              // on the stack
      push ebx

      mov edi,w             // copy each function parameter to a register
      mov esi,x
      mov edx,y
      mov ebx,z

      add edi,esi           // w + x
      add edx,ebx           // y + z
      imul edi,edx          // (w+x) * (y+z)

      xchg eax,edi          // 'eXCHanGe' swaps the contents of two registers
                            // in this case EAX and EDI, in other words,
                            // the old value of EAX is now in EDI, and the
                            // old value of EDI is now in EAX, which becomes
                            // the function's return value

      pop ebx               // Remove the saved values of the registers from
      pop esi               // the stack, and put them back in the registers
      pop edi               // We must remove the values in reverse order -
                            // looking at the code we can see that it is
                            // 'symmetrical'. If the values were saved in the
                            // order EDI, ESI, EBX, then they must be removed
                            // in the order EBX, ESI, EDI
    end;

In addition to the registers EDI, ESI, EBX, and EBP, the status flag DF (Direction Flag) is expected to be zero (cleared) before and after any call. Just use the CLD instruction if its status is changed within the function.

When writing code in assembly that uses the stack, special attention should be paid to ensuring that the stack pointer ESP is always restored. E.g. if the procedure or function stores something on the stack, then this item must be removed before exiting the function.
This time we'll look at an example in MSVC:

    // example of an encryption function
    void crypt(unsigned char *string)
    {
        __asm {
            push edx                // save the contents of register EDX on the stack

            mov edx,string          // grab the parameter from the stack; in this case
                                    // a pointer to the string we must encrypt
            cmp edx,0               // check whether the parameter is valid
            je _exit_encrypt        // if invalid, exit the function

        _encrypt_loop:
            mov al,byte ptr[edx]    // load the next byte of the string
            cmp al,0                // check for the end of the string
                                    // strings are represented as ASCII; byte 00h
                                    // means end-of-string
            je _exit_encrypt        // once we reach the end of the string, exit

            xor al,7                // encrypt the byte with a simple xor
            mov byte ptr[edx],al    // store the encrypted byte in the string
            inc edx                 // set the string pointer to point to the next byte
            jmp _encrypt_loop       // go to the start of the loop so that the
                                    // process repeats

        _exit_encrypt:
            pop edx                 // IMPORTANT: correct the stack, and restore the
                                    // register EDX to its original value
        }
    }
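For comparison, the same loop can be written in portable C++. This is a sketch of what the assembly routine does rather than a replacement for it; the names crypt_portable and demo_round_trip are hypothetical.

```cpp
#include <cassert>

// Portable counterpart of the inline-assembly routine: XOR every byte of a
// NUL-terminated string with the key 7, in place.
inline void crypt_portable(unsigned char *string) {
    if (string == nullptr) {    // cmp edx,0 / je _exit_encrypt
        return;
    }
    while (*string != 0) {      // cmp al,0 / je _exit_encrypt
        *string ^= 7;           // xor al,7
        ++string;               // inc edx
    }
}

// Usage sketch: encrypt, then decrypt again with the same function.
inline void demo_round_trip() {
    unsigned char text[] = "Hello";
    crypt_portable(text);       // text now holds "Obkkh"
    assert(text[0] == 'O');
    crypt_portable(text);       // XOR with the same key restores the text
    assert(text[0] == 'H');
}
```

Because XOR is its own inverse, calling the function a second time decrypts the string. One caveat of this scheme: a plaintext byte equal to 7 encrypts to 0, which a later call would treat as the end of the string.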

3.3. Calling functions from assembly

Sometimes in assembly code you will need to call a function written in another language. How is this done? Very simply, a function is called with the instruction call func_name. It is worth noting that there are several ways to call and “clean up” after a function:

cdecl
    In C code:          cdecl
    Parameters:         passed on the stack; the parameters are not removed by the function
    Return values:      eax; 8-byte values in edx:eax
    Modified registers: eax, ecx, edx, st(0)-st(7), mm0-mm7, xmm0-xmm7
    Info:               This is the method of calling C library functions, introduced by Microsoft. All system functions on the Linux platform also use this convention

fastcall
    In C code:          __fastcall
    Parameters:         ecx, edx; any remaining parameters are passed on the stack
    Return values:      eax; 8-byte values in edx:eax
    Modified registers: eax, ecx, edx, st(0)-st(7), mm0-mm7, xmm0-xmm7
    Info:               Microsoft introduced this standard, but later switched to the cdecl convention in its products

watcom
    In C code:          __declspec (wcall)
    Parameters:         eax, edx, ebx, ecx
    Return values:      eax; 8-byte values in edx:eax
    Modified registers: eax
    Info:               This function calling convention was introduced by Watcom in their C++ compiler

stdcall
    In C code:          __stdcall
    Parameters:         passed on the stack; parameters are removed by the function
    Return values:      eax; 8-byte values in edx:eax
    Modified registers: eax, ecx, edx, st(0)-st(7), mm0-mm7, xmm0-xmm7
    Info:               The default calling convention for Windows API functions in DLLs

register
    In C code:          n/a
    Parameters:         eax, edx, ecx; any remaining parameters are passed on the stack
    Return values:      eax
    Modified registers: eax, ecx, edx, st(0)-st(7), mm0-mm7, xmm0-xmm7
    Info:               This is the calling convention used in Borland's Delphi

The correct calling convention for functions in our own programs (as opposed to WinApi) often depends on the options with which the program was compiled. In Delphi the default convention is “register”, while for most programs written in C, the default is “cdecl”. WinApi functions (Windows system functions) use the stdcall mechanism, where function parameters are first stored on the stack, and then the function is called.
After the function returns, there is no need to adjust the stack (remove the previously saved parameters), since the called function does it for us. Interestingly, a few WinApi functions do not use the stdcall convention, but instead use cdecl, that is, the parameters are stored on the stack, then the function is called, but afterwards the stack must be cleaned up manually. An example of such a function is the wsprintfA function from the Windows system library user32.dll (whose counterpart in the C standard library is sprintf). cdecl was probably chosen because these functions do not have a fixed number of parameters:

    // global string
    unsigned char title[] = "The values of x and y";
    ...
    // this function changes the values x and y into ASCII form, after which
    // a message box is displayed showing x and y in their string form
    unsigned int int2str(unsigned char *buffer, unsigned int x, unsigned int y)
    {
        // local string, accessible only by the function int2str
        unsigned char format[] = "x = %lu\r\ny = 0x%X\r\n";

        __asm {
            // Note the way in which the parameters of the function are passed.
            // In C++, the function call would look like this:
            // wsprintf(buffer, "x = %lu\r\ny = 0x%X\r\n", x, y);
            // In assembly the parameters are pushed onto the stack in reverse
            // order, after which the function is called.

            push y                  // save y on the stack
            push x                  // save x on the stack
            lea eax,format          // load the address of the local string into EAX
            push eax                // save the address of this string on the stack
            push buffer             // save the pointer to the output buffer, where
                                    // the formatted text will end up
            call wsprintfA          // call this WinApi function
            add esp,4*4             // clean up the stack - 4*4 = 16 bytes. This is
                                    // how much space was taken by the parameters
                                    // saved on the stack before the function was called
                                    // When writing code e.g. in C++, the compiler
                                    // takes care of this for you, but in assembly you
                                    // must do this yourself

            push MB_ICONINFORMATION // specifies the icon that will appear
                                    // next to the text in the message box
            push offset title       // the window title (a global variable); we use
                                    // the keyword 'offset' because we want to write
                                    // the address of the string to the stack
            push buffer             // the text which will appear in the message box
            push 0                  // handle of the parent window
            call MessageBoxA        // show the message box
        }
    }

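The difference between cdecl and stdcall comes down to who removes the parameters from the stack. The following C++ sketch simulates this with a vector standing in for the stack. Everything here (SimStack, add_cdecl, add_stdcall and the two call_ helpers) is hypothetical, and the return address that a real call instruction would also push is left out for simplicity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simulated stack: the back of the vector is the top of the stack.
using SimStack = std::vector<std::uint32_t>;

// Callee in the style of cdecl: reads its two arguments but leaves them on
// the stack for the caller to remove.
inline std::uint32_t add_cdecl(SimStack &stack) {
    assert(stack.size() >= 2);
    std::uint32_t x = stack[stack.size() - 1];  // first argument (pushed last)
    std::uint32_t y = stack[stack.size() - 2];  // second argument
    return x + y;                               // the result "in EAX"
}

// Callee in the style of stdcall: same work, but it removes its own
// arguments before returning (like 'ret 8').
inline std::uint32_t add_stdcall(SimStack &stack) {
    std::uint32_t result = add_cdecl(stack);
    stack.resize(stack.size() - 2);
    return result;
}

// Caller side for cdecl: push right-to-left, call, then clean up.
inline std::uint32_t call_cdecl(SimStack &stack, std::uint32_t x, std::uint32_t y) {
    stack.push_back(y);                         // push y
    stack.push_back(x);                         // push x
    std::uint32_t eax = add_cdecl(stack);       // call add
    stack.resize(stack.size() - 2);             // add esp,2*4
    return eax;
}

// Caller side for stdcall: push right-to-left, call, nothing to clean up.
inline std::uint32_t call_stdcall(SimStack &stack, std::uint32_t x, std::uint32_t y) {
    stack.push_back(y);                         // push y
    stack.push_back(x);                         // push x
    return add_stdcall(stack);                  // call add
}
```

In both cases the simulated stack is back to its original size after the call; the only difference is which side performs the resize. A function with a variable number of parameters cannot know how many values to remove, which fits the suggestion above that this is why wsprintfA uses cdecl.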
4. MMX instructions

MMX is the name of an extension to the Pentium series of processors, introduced by Intel. The name is said to be an abbreviation of “MultiMedia eXtensions”, but Intel denies this, and has never explained the issue. The MMX extension to the Pentium line of processors includes a set of new instructions (57, to be exact), and 8 additional 64-bit registers. MMX registers are shared with the FPU registers. This means that you cannot mix FPU (Floating Point Unit) instructions with MMX unit instructions, otherwise the contents of the registers will be corrupted.

MMX instructions can operate on data in SIMD fashion (Single Instruction Multiple Data). This means that one operation can be performed simultaneously on many data items, which is not possible using standard x86 instructions. MMX instructions are ideal for processing multimedia data, e.g. video, graphics, sound. For example, programs such as DivX or Winamp make intensive use of MMX code. Currently, most processors produced by Intel, AMD and Cyrix possess MMX support. Although MMX has for quite a few years been practically standard, HLL compilers generally do not generate MMX code (except specialised compilers like VectorC). It seems that the natural solution is to program MMX in assembly.

Writing procedures using MMX can sometimes get a 100% speed increase compared to the original code. This is possible because of the aforementioned SIMD mode. Imagine a situation where we have two tables of 8 bytes, and we want to add corresponding bytes from both tables to each other. In C++ we would do it this way:

    unsigned char table1[] = { 0x0A,0x1A,0x2A,0x3A,0x4A,0x5A,0x6A,0x7A };
    unsigned char table2[] = { 0xA7,0xA6,0xA5,0xA4,0xA3,0xA2,0xA1,0xA0 };
    ...
    for (int i = 0; i < 8; i++)
    {
        table1[i] += table2[i];
    }

There's no problem with this, but the operation of adding bytes will be repeated 8 times.
Let's look at how this can be done much more efficiently by using MMX:

    __asm {
        movq mm0,qword ptr[table1]      // load 8 bytes from the first table
                                        // into register MM0
        movq mm1,qword ptr[table2]      // 8 bytes from the second table into MM1
        paddb mm0,mm1                   // add the bytes from MM1 to MM0
        movq qword ptr[table1],mm0      // write the result back to table1
    }

In total, a single paddb instruction performs the work of all 8 additions. Neat, isn't it? And more importantly, efficient. Here are a few examples of graphical functions:

    #define IMG_WIDTH 640
    #define IMG_HEIGHT 320
    ...
    //
    // this function initialises the MMX unit
    // it should be called:
    // - before using the MMX unit for the first time
    // - after using MMX when we intend to make use of the FPU
    // - after using the FPU when we intend to make use of MMX
    //
    void InitMMX()
    {
        __asm emms;                     // Empty MultiMedia State;
    }                                   // initialises the MMX unit

    //
    // a fadeout effect of the screen (fullscreen)
    //
    void fadeout(DWORD *lpScreen,DWORD iRounds)
    {
        __asm {
            mov edx,iRounds             // load the total number of repetitions
            mov eax,03030303h           // mask for each component of a pixel;
                                        // reducing the value of each RGB
                                        // component gives the impression of a
                                        // fading image
            movd mm0,eax                // transfer the mask to the lower half
                                        // of register MM0
            punpckldq mm0,mm0           // copy the mask to the upper half of MM0
                                        // such that its full value becomes
                                        // 0x0303030303030303
                                        // (recall that MM0 is a 64-bit register)
            pxor mm1,mm1                // zero out register MM1

        _fadeout_max:
            paddb mm1,mm0               // multiply the mask (which will be
            dec edx                     // subtracted from the components of the
            jne _fadeout_max            // pixels) by the number of rounds

            mov eax,lpScreen            // load the pointer to the image buffer
                                        // into register EAX

                                        // the number of pixels divided by 2
                                        // we divide by 2 because by using MMX we
                                        // can process 2 pixels simultaneously
                                        // (MM1 is an 8-byte register, but each
                                        // pixel is only 4 bytes)
            mov ecx,(IMG_WIDTH*IMG_HEIGHT) / 2

        _clear_screen_2_mmx:
                                        // load 2 pixels from the image buffer
                                        // into MM0
            movq mm0,qword ptr[eax]
            psubusb mm0,mm1             // subtract our mask from all components
                                        // (bytes) of those 2 pixels
                                        // Both the mask and the pixels are
                                        // treated as tables of 8 separate bytes
                                        // SIMD-style

                                        // write the 2 modified pixels back to
                                        // the image buffer
            movq qword ptr[eax],mm0
            add eax,8                   // update the pointer to the image buffer,
                                        // ready for the next 2 pixels
            dec ecx                     // reduce the loop counter (the loop will
                                        // repeat for the number of pixels / 2)
            jne _clear_screen_2_mmx
        }
    }

    //
    // image negative effect
    //
    void negative(DWORD *lpScreen)
    {
        __asm {
            mov eax,lpScreen            // load the pointer to the image buffer
                                        // into EAX

                                        // write the pixel count / 4 into ECX,
                                        // since we will process 4 pixels at once
            mov ecx,(IMG_WIDTH*IMG_HEIGHT) / 4
            pcmpeqb mm7,mm7             // set register MM7 to 0xFFFFFFFFFFFFFFFF

        _neg_mmx:
                                        // load 2 pixels from the image to MM0
            movq mm0,qword ptr[eax]
            pxor mm0,mm7                // XOR-ing with all 1s works like the
                                        // logical 'NOT' function
            movq qword ptr[eax],mm0

                                        // repeat with the next 2 pixels
            movq mm0,qword ptr[eax+8]
            pxor mm0,mm7
            movq qword ptr[eax+8],mm0

            add eax,16                  // update the pointer to the image
            dec ecx                     // and the loop counter
            jne _neg_mmx
        }
    }

    //
    // image blur effect
    //
    void blur(DWORD *lpScreen)
    {
        __asm {
            push esi                    // save registers ESI and EDI
            push edi

            mov esi,lpScreen            // load the pointer to the image buffer
                                        // into ESI
            mov ecx,( (IMG_WIDTH*IMG_HEIGHT) - (IMG_WIDTH*8) + 4 )
            mov eax,IMG_WIDTH*4         // the width of a line in the image
            mov edx,IMG_WIDTH*8         // the width of two lines
            lea esi,[esi+eax+4]         // set the pointer to the second pixel
                                        // of the second line of the image
            pxor mm7,mm7                // zero out MM7
            movd mm0,[esi-4]            // read pixel to the left into MM0

        _blur_more:
            movd mm1,[esi+4]            // read pixel to the right into MM1
            mov edx,esi
            sub edx,eax
            movd mm2,[edx]              // read pixel above into MM2
            movd mm3,[esi+eax]          // read pixel below into MM3

            punpcklbw mm0,mm7           // unpack the components of 4 successive
            punpcklbw mm1,mm7           // pixels into WORDs
            punpcklbw mm2,mm7
            punpcklbw mm3,mm7

            paddusw mm0,mm1             // add the components of the 4 pixels
            paddusw mm0,mm2
            paddusw mm0,mm3
            psrlw mm0,2                 // divide this sum by 4, in this way
                                        // we find the 'average' of the 4 pixels
            packuswb mm0,mm7            // pack the components (each of which is
                                        // a WORD) back into a single DWORD
            movd [esi],mm0              // write the pixel to the image buffer

            add esi,4
            dec ecx
            jne _blur_more

            pop edi
            pop esi
        }
    }
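To see exactly what paddb and psubusb compute, here is a scalar C++ model of both, treating a 64-bit value as 8 independent bytes. It is a sketch for checking results by hand, not a substitute for MMX: the function names are hypothetical and the code is deliberately slow and simple.

```cpp
#include <cassert>
#include <cstdint>

// Like 'paddb': per-byte addition; each byte wraps modulo 256 and no carry
// crosses into the neighbouring byte.
inline std::uint64_t paddb_model(std::uint64_t a, std::uint64_t b) {
    std::uint64_t result = 0;
    for (int i = 0; i < 8; ++i) {
        unsigned x = static_cast<unsigned>((a >> (8 * i)) & 0xFF);
        unsigned y = static_cast<unsigned>((b >> (8 * i)) & 0xFF);
        result |= static_cast<std::uint64_t>((x + y) & 0xFF) << (8 * i);
    }
    return result;
}

// Like 'psubusb': per-byte unsigned subtraction that saturates at 0.
inline std::uint64_t psubusb_model(std::uint64_t a, std::uint64_t b) {
    std::uint64_t result = 0;
    for (int i = 0; i < 8; ++i) {
        unsigned x = static_cast<unsigned>((a >> (8 * i)) & 0xFF);
        unsigned y = static_cast<unsigned>((b >> (8 * i)) & 0xFF);
        unsigned diff = (x > y) ? (x - y) : 0u;
        result |= static_cast<std::uint64_t>(diff) << (8 * i);
    }
    return result;
}

// The two 8-byte tables from the text, packed with byte 0 in the lowest bits.
inline void demo_tables() {
    const std::uint64_t table1 = 0x7A6A5A4A3A2A1A0AULL;
    const std::uint64_t table2 = 0xA0A1A2A3A4A5A6A7ULL;
    // 0x6A + 0xA1 and 0x7A + 0xA0 exceed 0xFF, so those two bytes wrap.
    assert(paddb_model(table1, table2) == 0x1A0BFCEDDECFC0B1ULL);
}
```

The saturating behaviour of psubusb is what makes the fadeout work: a colour component that is already darker than the mask stops at 0 instead of wrapping around to a bright value.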

5. When to use assembly

As I mentioned at the beginning of the article, assembly is used mainly where speed is important. When writing an algorithm, we should sometimes stop and ask ourselves whether our program could be enhanced if, at some critical points (for instance in loops), we were to employ, say, MMX. Imagine that you just wrote an mp3 encoder, and a competitor did the same, but you used hand-written MMX code which is three times faster than the competition. Which product will users choose, when they can complete a task in 10 minutes instead of 30? The answer is obvious.

Besides being ideal for writing algorithms that require speed, assembly is also used to write particular programs such as EXE-compressors. I'll bet that most people will think of programs like UPX or Aspack, which are used to compress executables. Put simply, if you write a program which occupies let's say 700 kB, when compressed by UPX its size will decrease to approx. 300 kB, but the program will still be in the form of an EXE file, and will be just as functional as before compression. This is achieved by using assembly to write a loader for the code. This is a fragment of code that is stored in the EXE file (almost like a virus), and when you start such a program, the loader decompresses the remainder of the EXE file and allows it to run. Writing a loader in an HLL, whether it be C++, Delphi or even Power Basic, is virtually impossible.

It can be said that assembly programming is only useful for speed and unusual applications, but this is not entirely true. Writing in assembly language can be more than just inline routines and a few procedures here and there. Entire programs can be written in assembly language! Sometimes I hear people say that it is impossible; that you can't write large applications in assembly from scratch. Often these are people who have only dabbled in assembly for a few hours.
If you are a competent programmer, there is nothing stopping you from building professional applications in assembly language. Writing programs in assembly gives us full control over them. Everything is up to us, the program is executed according to our will, and we are not at the mercy of the compiler. These days, writing in assembly is reasonably simple and convenient. A lot of people around the world are beginning to see the magic of this language. People are creating many projects; you can find a whole bunch of sample tutorials and source code, thanks to which many challenges have ceased to be problems.

Writing entire applications in assembly also has the advantage that a project with 5 MB of source code will be compiled to an executable of approximately 90 kB. Compare an application written in Delphi 6, containing 1 window, which takes approx. 300 kB compiled, to a program written in assembly language which does exactly the same thing, and works on every Windows release from 95 to XP, with just a 4 kB executable. Why the big difference? It's simple: the compiler adds a lot of unnecessary things, “just in case”. Why isn't this made more efficient? We should ask the companies who make compilers.

Despite the fact that assembly can be used for many useful things, it is also used to write malicious programs, such as viruses, ransomware, or exploits, but in the words of Winnie the Pooh, that is a story for another day...

6. Summary

These examples represent only a small range of what is possible with assembly. There is a lot to discover, just as much for me as there is for you, because contrary to what they say, assembly is not dead: it is constantly changing, evolving, giving us possibilities which do not exist in any high-level language. The terms we hear in the press, such as SSE, SSE2 and 3DNow!, are not fiction. Everything is out there. We just have to reach for it. For my part, writing assembly language gives me a feeling of freedom, which I never found when writing in any other language. I hope that your journey into assembly doesn't end with this article!

