Pascal's answer is OK - but lacks details which entails that some users do not get it ;-) . If you are interested in how it looks on lower level (assuming coprocessor and not software handles floating point operations) - read on.

In 32 bits of float (IEEE 754) you can store all of integers from within [-224...224] range. Integers outside the range may also have exact representation as float but not all of them have. The problem is that you can have only 24 significant bits to play with in float.

Here is how conversion from int->float typically looks like on low level:

fild dword ptr[your int] fstp dword ptr[your float]

It is just sequence of 2 coprocessor instructions. First loads 32bit int onto comprocessor's stack and converts it into 80 bit wide float.

Intel® 64 and IA-32 Architectures Software Developer’s Manual (PROGRAMMING WITH THE X87 FPU): When floating-point, integer, or packed BCD integer values are loaded from memory into any of the x87 FPU data registers, the values are automatically converted into double extended-precision floating-point format (if they are not already in that format).

Since FPU registers are 80bit wide floats - there is no issue with fild here as 32bit int perfectly fits in 64bit significand of floating point format.

So far so good.

The second part - fstp is bit tricky and may be surprising. It is supposed to store 80bit floating point in 32bit float. Although it is all about integer values (in the question) coprocessor may actually perform 'rounding'. Ke? How do you round integer value even if it is stored in floating point format? ;-).

I'll explain it shortly - let's first see what rounding modes x87 provides (they are IEE 754 rounding modes' incarnation). X87 fpu has 4 rounding modes controlled by bits #10 and #11 of fpu's control word:

00 - to nearest even - Rounded result is the closest to the infinitely precise result. If two values are equally close, the result is the even value (that is, the one with the least-significant bit of zero). Default

01 - toward -Inf

10 - toward +inf

11 - toward 0 (ie. truncate)

You can play with rounding modes using this simple code (although it may be done differently - showing low level here):

enum ROUNDING_MODE { RM_TO_NEAREST = 0x00, RM_TOWARD_MINF = 0x01, RM_TOWARD_PINF = 0x02, RM_TOWARD_ZERO = 0x03 // TRUNCATE }; void set_round_mode(enum ROUNDING_MODE rm) { short csw; short tmp = rm; _asm { push ax fstcw [csw] mov ax, [csw] and ax, ~(3<<10) shl [tmp], 10 or ax, tmp mov [csw], ax fldcw [csw] pop ax } }

Ok nice but still how is that related to integer values? Patience ... to understand why you might need rounding modes involved in int to float conversion check most obvious way of converting int to float - truncation (not default) - that may look like this:

record sign

negate your int if less than zero

find position of leftmost 1

shift int to the right/left so that 1 found above is positioned on bit #23

record number of shifts during the process so that you can calculate exponent

And the code simulating this bahavior may look like this:

float int2float(int value) { // handles all values from [-2^24...2^24] // outside this range only some integers may be represented exactly // this method will use truncation 'rounding mode' during conversion // we can safely reinterpret it as 0.0 if (value == 0) return 0.0; if (value == (1U<<31)) // ie -2^31 { // -(-2^31) = -2^31 so we'll not be able to handle it below - use const value = 0xCF000000; return *((float*)&value); } int sign = 0; // handle negative values if (value < 0) { sign = 1U << 31; value = -value; } // although right shift of signed is undefined - all compilers (that I know) do // arithmetic shift (copies sign into MSB) is what I prefer here // hence using unsigned abs_value_copy for shift unsigned int abs_value_copy = value; // find leading one int bit_num = 31; int shift_count = 0; for(; bit_num > 0; bit_num--) { if (abs_value_copy & (1U<<bit_num)) { if (bit_num >= 23) { // need to shift right shift_count = bit_num - 23; abs_value_copy >>= shift_count; } else { // need to shift left shift_count = 23 - bit_num; abs_value_copy <<= shift_count; } break; } } // exponent is biased by 127 int exp = bit_num + 127; // clear leading 1 (bit #23) (it will implicitly be there but not stored) int coeff = abs_value_copy & ~(1<<23); // move exp to the right place exp <<= 23; int ret = sign | exp | coeff; return *((float*)&ret); }

Now example - truncation mode converts 2147483583 to 2147483520 .

2147483583 = 01111111_11111111_11111111_10111111

During int->float conversion you must shift leftmost 1 to bit #23. Now leading 1 is in bit#30. In order to place it in bit #23 you must perform right shift by 7 positions. During that you loose (they will not fit in 32bit float format) 7 lsb bits from the right (you truncate/chop). They were:

01111111 = 63

And 63 is what original number lost:

2147483583 -> 2147483520 + 63

Truncating is easy but may not necessarily be what you want and/or is best for all cases. Consider below example:

67108871 = 00000100_00000000_00000000_00000111

Above value cannot be exactly represented by float but check what truncation does to it. As previously - we need to shift leftmost 1 to bit #23. This requires value to be shifted right exactly 3 positions loosing 3 LSB bits (as of now I'll write numbers differently showing where implicit 24th bit of float is and will bracket explicit 23bits of significand):

00000001.[0000000_00000000_00000000] 111 * 2^26 (3 bits shifted out)

Truncation chops 3 trailing bits leaving us with 67108864 (67108864+7(3 chopped bits)) = 67108871 (remember although we shift we compensate with exponent manipulation - omitted here).

Is that good enough? Hey 67108872 is perfectly representable by 32bit float and should be much better than 67108864 right? CORRECT and this is where you might want to talk about rounding when converting int to 32bit float.

Now let's see how default 'rounding to nearest even' mode works and what are its implications in OP's case. Consider the same example one more time.

67108871 = 00000100_00000000_00000000_00000111

As we know we need 3 right shifts to place leftmost 1 in bit #23:

00000000_1.[0000000_00000000_00000000] 111 * 2^26 (3 bits shifted out)

Procedure of 'rounding to nearest even' involves finding 2 numbers that bracket input value 67108871 from bottom and above as close as possible. Keep in mind that we still operate within FPU on 80bits so although I show some bits being shifted out they are still in FPU reg but will be removed during rounding operation when storing output value.

00000000_1.[0000000_00000000_00000000] 111 * 2^26 (3 bits shifted out)

2 values that closely bracket 00000000_1.[0000000_00000000_00000000] 111 * 2^26 are:

from top:

00000000_1.[0000000_00000000_00000000] 111 * 2^26 +1 = 00000000_1.[0000000_00000000_00000001] * 2^26 = 67108872

and from below:

00000000_1.[0000000_00000000_00000000] * 2^26 = 67108864

Obviously 67108872 is much closer to 67108871 than 67108864 hence conversion from 32bit int value 67108871 gives 67108872 (in rounding to nearest even mode).

Now OP's numbers (still rounding to nearest even):

2147483583 = 01111111_11111111_11111111_10111111 = 00000000_1.[1111111_11111111_11111111] 0111111 * 2^30

bracket values:

top:

00000000_1.[1111111_111111111_11111111] 0111111 * 2^30 +1 = 00000000_10.[0000000_00000000_00000000] * 2^30 = 00000000_1.[0000000_00000000_00000000] * 2^31 = 2147483648

bottom:

00000000_1.[1111111_111111111_11111111] * 2^30 = 2147483520

Keep in mind that even word in 'rounding to nearest even' matters only when input value is halfway between bracket values. Only then word even matters and 'decides' which bracket value should be selected. In the above case even does not matter and we must simply choose nearer value, which is 2147483520

Last OP's case shows the problem where even word matters. :

2147483584 = 01111111_11111111_11111111_11000000 = 00000000_1.[1111111_11111111_11111111] 1000000 * 2^30

bracket values are the same as previously:

top: 00000000_1.[0000000_00000000_00000000] * 2^31 = 2147483648

bottom: 00000000_1.[1111111_111111111_11111111] * 2^30 = 2147483520

There is no nearer value now (2147483648-2147483584=64=2147483584-2147483520) so we must rely on even and select top (even) value 2147483648 .

And here OP's problem is that Pascal had briefly described. FPU works only on signed values and 2147483648 cannot be stored as signed int as its max value is 2147483647 hence issues.

Simple proof (without documentation quotes) that FPU works only on signed values ie. treats every value as signed is by debugging this:

unsigned int test = (1u << 31); _asm { fild [test] }

Although it looks like test value should be treated as unsigned it will be loaded as -231 as there is no separate instructions for loading signed and unsigned values into FPU. Likewise you'll not find instructions that will allow you to store unsigned value from FPU to mem. Everything is just a bit pattern treated as signed regardless of how you might have declared it in your program.

Was long but hope someone will learn something out of it.