Streaming SIMD Extensions (SSE)

SSE — An Overview

SSE is a newer SIMD extension found in the Intel Pentium III and AMD AthlonXP microprocessors. Unlike the MMX and 3DNow! extensions, which occupy the same register space as the normal FPU registers, SSE adds a separate register space to the microprocessor. Because the operating system has to save and restore these extra registers when it switches tasks, SSE can only be used on operating systems that support it. Fortunately, most recent operating systems have support built in. All versions of Windows since Windows 98 support SSE, as do Linux kernels since 2.2.

SSE was introduced in 1999 and was also known as "Katmai New Instructions" (KNI), after the Pentium III's core codename.

SSE adds eight new 128-bit registers, each holding four 32-bit (single-precision) floating-point values. These registers are called XMM0 through XMM7. An additional control register, MXCSR, is available to control and check the status of SSE instructions.

SSE gives us access to 70 new instructions that operate on these 128-bit registers, MMX registers, and sometimes even regular 32-bit registers.
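To make the register layout concrete, here is a minimal sketch in C. It assumes a compiler that ships the SSE intrinsics header xmmintrin.h (GCC, Clang and MSVC do); the helper name add4 is made up for this illustration. Each __m128 value stands for one XMM register holding four floats, and _mm_add_ps performs all four additions at once.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps */

/* Adds four pairs of floats with one packed SSE addition.
   add4 is a hypothetical helper name, not part of any library. */
void add4(const float a[4], const float b[4], float out[4])
{
    __m128 va  = _mm_loadu_ps(a);      /* load four floats into one register */
    __m128 vb  = _mm_loadu_ps(b);
    __m128 sum = _mm_add_ps(va, vb);   /* four additions in a single operation */
    _mm_storeu_ps(out, sum);           /* write all four results back */
}
```

Calling add4 with {1, 2, 3, 4} and {10, 20, 30, 40} fills out with the four element-wise sums. The unaligned load and store variants are used so the arrays do not need 16-byte alignment.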

SSE — MXCSR

The MXCSR register is a 32-bit register containing flags for control and status information regarding SSE instructions. As of SSE3, only bits 0-15 have been defined.



Mnemonic   Bit Location                Description
FZ         bit 15                      Flush To Zero
R+         bit 14                      Round Positive
R-         bit 13                      Round Negative
RZ         bits 13 and 14 (both set)   Round To Zero
RN         bits 13 and 14 are 0        Round To Nearest
PM         bit 12                      Precision Mask
UM         bit 11                      Underflow Mask
OM         bit 10                      Overflow Mask
ZM         bit 9                       Divide By Zero Mask
DM         bit 8                       Denormal Mask
IM         bit 7                       Invalid Operation Mask
DAZ        bit 6                       Denormals Are Zero
PE         bit 5                       Precision Flag
UE         bit 4                       Underflow Flag
OE         bit 3                       Overflow Flag
ZE         bit 2                       Divide By Zero Flag
DE         bit 1                       Denormal Flag
IE         bit 0                       Invalid Operation Flag
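The bit positions in the table translate directly into masks. The sketch below is plain C with no intrinsics; the macro and function names are invented for this example and do not come from any standard header.

```c
/* Bit groups of MXCSR, following the table above.
   All names here are illustrative, not from a standard header. */
#define MXCSR_FLAGS_ALL  0x003Fu  /* IE, DE, ZE, OE, UE, PE: bits 0-5  */
#define MXCSR_DAZ        0x0040u  /* Denormals Are Zero:     bit 6     */
#define MXCSR_MASKS_ALL  0x1F80u  /* IM, DM, ZM, OM, UM, PM: bits 7-12 */
#define MXCSR_ROUND      0x6000u  /* rounding control:       bits 13-14 */
#define MXCSR_FZ         0x8000u  /* Flush To Zero:          bit 15    */

/* Returns 0 for RN, 1 for R-, 2 for R+, 3 for RZ. */
unsigned mxcsr_rounding(unsigned csr)
{
    return (csr & MXCSR_ROUND) >> 13;
}

/* Nonzero when all six exception masks are set. */
int mxcsr_all_masked(unsigned csr)
{
    return (csr & MXCSR_MASKS_ALL) == MXCSR_MASKS_ALL;
}

/* Nonzero when at least one exception flag is set. */
int mxcsr_any_flag_set(unsigned csr)
{
    return (csr & MXCSR_FLAGS_ALL) != 0;
}
```

These helpers take the register value as an argument, so they can be tried on constants. For example, 0x1F80 (the value the processor starts with) decodes as round to nearest, all exceptions masked, no flags set. A live value can be read with _mm_getcsr() from xmmintrin.h.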



FZ mode causes all underflowing operations to simply go to zero. This saves some processing time, but loses precision.
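A small experiment shows the effect. This sketch assumes the xmmintrin.h intrinsics and that the underflow exception is masked (the usual default); the function name is made up. It halves the smallest normal float, which underflows into the denormal range, once with FZ on and once with FZ off.

```c
#include <float.h>
#include <xmmintrin.h>

/* Halves FLT_MIN (the smallest normal float) with FZ on or off and
   returns the result. Hypothetical helper for illustration only. */
float half_of_smallest_normal(int flush_to_zero)
{
    unsigned int saved = _mm_getcsr();          /* remember current MXCSR */
    _MM_SET_FLUSH_ZERO_MODE(flush_to_zero ? _MM_FLUSH_ZERO_ON
                                          : _MM_FLUSH_ZERO_OFF);

    /* volatile keeps the compiler from doing the math at compile time */
    volatile float x = FLT_MIN;
    volatile float result =
        _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(x), _mm_set_ss(0.5f)));

    _mm_setcsr(saved);                          /* restore previous state */
    return result;
}
```

With FZ on, the function returns exactly zero; with FZ off, it returns a tiny nonzero denormal. That difference is the precision FZ gives up.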



The R+, R-, RN, and RZ rounding modes determine how the lowest bit is generated. Normally, RN is used.
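The rounding mode is easiest to see when converting a float to an integer, because the conversion instruction honors the current mode. The following is a sketch under the assumption that xmmintrin.h is available; convert_with_rounding is a name invented here, while _MM_SET_ROUNDING_MODE and the _MM_ROUND_* constants come from that header.

```c
#include <xmmintrin.h>

/* Converts value to an int using the given MXCSR rounding mode
   (_MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP or
   _MM_ROUND_TOWARD_ZERO), then restores the previous mode.
   Hypothetical helper for illustration only. */
int convert_with_rounding(float value, unsigned int mode)
{
    unsigned int saved = _mm_getcsr();
    _MM_SET_ROUNDING_MODE(mode);       /* rewrites bits 13 and 14 */

    /* volatile keeps the compiler from folding the conversion */
    volatile float input = value;
    volatile int result = _mm_cvtss_si32(_mm_set_ss(input));

    _mm_setcsr(saved);
    return result;
}
```

Converting 2.5 gives 2 under RN (ties go to the even neighbor), 3 under R+, and 2 under R-. Converting -2.5 gives -3 under R- and -2 under RZ.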



PM, UM, OM, ZM, DM, and IM are masks that tell the processor to ignore the corresponding exceptions if they happen. This keeps the program from having to deal with problems, but might cause invalid results.
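As an illustration of what "ignoring" an exception looks like, the sketch below divides by zero with ZM set. It assumes the xmmintrin.h intrinsics; the function name is made up. Instead of interrupting the program, the processor quietly produces infinity, and the program carries on with that value.

```c
#include <xmmintrin.h>

/* Divides 1.0 by 0.0 with the divide-by-zero exception masked and
   returns the quotient. Hypothetical helper for illustration only. */
float masked_divide_by_zero(void)
{
    unsigned int saved = _mm_getcsr();
    _mm_setcsr(saved | _MM_MASK_DIV_ZERO);   /* make sure ZM (bit 9) is set */

    /* volatile keeps the compiler from folding the division */
    volatile float one = 1.0f, zero = 0.0f;
    volatile float q =
        _mm_cvtss_f32(_mm_div_ss(_mm_set_ss(one), _mm_set_ss(zero)));

    _mm_setcsr(saved);                       /* restore previous state */
    return q;
}
```

The returned value is positive infinity, which is larger than any finite float. Whether that counts as a usable result or a silent error depends on the program.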



DAZ tells the CPU to treat all denormals as zero. A denormal is a number so small that the FPU can't renormalize it because of the limited exponent range. Denormals behave just like normal numbers, but they take considerably longer to process. Note that not all processors support DAZ.
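The sketch below shows DAZ acting on an input, as opposed to FZ, which acts on a result. It assumes a processor that supports DAZ (setting the bit on one that does not will fault) and the xmmintrin.h intrinsics. The MXCSR_DAZ macro and the function name are invented for this example.

```c
#include <xmmintrin.h>

#define MXCSR_DAZ 0x0040u   /* bit 6 of MXCSR; illustrative name */

/* Multiplies a denormal input by 1e30 with DAZ on or off.
   Assumes the CPU supports DAZ. Hypothetical helper. */
float scale_denormal(int daz_on)
{
    unsigned int saved = _mm_getcsr();
    _mm_setcsr(daz_on ? (saved | MXCSR_DAZ) : (saved & ~MXCSR_DAZ));

    /* 2^-127 is below the smallest normal float (2^-126), so it is
       a denormal; volatile prevents compile-time evaluation. */
    volatile float tiny = 0x1p-127f;
    volatile float result =
        _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(tiny), _mm_set_ss(1.0e30f)));

    _mm_setcsr(saved);
    return result;
}
```

With DAZ off, the product is an ordinary small number (roughly 5.9e-9). With DAZ on, the denormal input is read as zero, so the product is zero as well.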



PE, UE, OE, ZE, DE, and IE are the exception flags. Each one is set when the corresponding exception happens while it is masked. Programs can check these to see if something interesting happened. These bits are "sticky", which means that once they're set, they stay set until the program clears them. This means that the indicated exception could have happened several operations ago, but nobody bothered to clear it.
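The stickiness can be observed directly. This sketch assumes the xmmintrin.h intrinsics; the function name is made up. It reads the ZE flag right after a division by zero, again after an unrelated addition, and once more after clearing the flags by hand.

```c
#include <xmmintrin.h>

/* Reports the ZE flag (bit 2) at three moments. Hypothetical helper. */
void ze_sticky_demo(int *after_divide, int *after_add, int *after_clear)
{
    unsigned int saved = _mm_getcsr();
    /* Mask every exception and clear all six flags to start clean. */
    _mm_setcsr((saved | _MM_MASK_MASK) & ~_MM_EXCEPT_MASK);

    volatile float one = 1.0f, zero = 0.0f;
    volatile float q =
        _mm_cvtss_f32(_mm_div_ss(_mm_set_ss(one), _mm_set_ss(zero)));
    *after_divide = (_mm_getcsr() & _MM_EXCEPT_DIV_ZERO) != 0;

    /* An unrelated, harmless operation does not reset the flag. */
    volatile float s =
        _mm_cvtss_f32(_mm_add_ss(_mm_set_ss(one), _mm_set_ss(one)));
    *after_add = (_mm_getcsr() & _MM_EXCEPT_DIV_ZERO) != 0;

    /* Only an explicit write to MXCSR clears it. */
    _mm_setcsr(_mm_getcsr() & ~_MM_EXCEPT_MASK);
    *after_clear = (_mm_getcsr() & _MM_EXCEPT_DIV_ZERO) != 0;

    (void)q;
    (void)s;
    _mm_setcsr(saved);
}
```

The first two values come back as 1 and the third as 0. A program that wants to know whether a particular stretch of code raised an exception should therefore clear the flags before that stretch and check them after it.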



DAZ wasn't available in the first version of SSE. Since setting a reserved bit in MXCSR causes a general protection fault, we need to be able to check the availability of this feature without causing problems. To do this, one needs to set up a 512-byte area of memory to save the SSE state to, using fxsave, and then one needs to inspect bytes 28 through 31 for the MXCSR_MASK value. If bit 6 is set, DAZ is supported; otherwise, it isn't.
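The check described above might look like the following sketch. It assumes a C11 compiler that provides the _fxsave intrinsic through immintrin.h (GCC and Clang do on x86-64; 32-bit builds may need the -mfxsr option). The function name is invented here. Two details go beyond the text above and come from Intel's documentation of the instruction: the save area must be 16-byte aligned, and an MXCSR_MASK of zero stands for the default mask 0xFFBF, in which bit 6 is clear.

```c
#include <immintrin.h>   /* _fxsave */
#include <stdint.h>
#include <string.h>

/* Returns 1 if the processor supports the DAZ bit, 0 otherwise.
   Hypothetical helper for illustration only. */
int daz_supported(void)
{
    /* fxsave needs a 512-byte, 16-byte-aligned save area. */
    _Alignas(16) unsigned char area[512];
    uint32_t mxcsr_mask;

    memset(area, 0, sizeof area);
    _fxsave(area);                         /* save FPU and SSE state */

    /* MXCSR_MASK lives in bytes 28 through 31 of the save area. */
    memcpy(&mxcsr_mask, area + 28, sizeof mxcsr_mask);

    /* A zero mask means the default mask, which has bit 6 clear. */
    if (mxcsr_mask == 0)
        mxcsr_mask = 0xFFBFu;

    return (mxcsr_mask & (1u << 6)) != 0;  /* bit 6 set: DAZ is available */
}
```

A program would call daz_supported() once at startup and only set bit 6 of MXCSR when it returns 1.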