IEEE 754 Floating Point Add/Rounding

Developer https://www.devze.com 2023-04-10 05:53 Source: web

Related topics: ieee-754 math

I don't understand how I can add in IEEE 754 Floating Point (mainly the "re-alignment" of the exponent)

Also, for rounding, how do the guard, round and sticky bits come into play? How do you do rounding in general (for base-2 floats, that is)?

E.g., suppose the question is: add the IEEE 754 floats represented in hex as 0x383FFBAD and 0x3FD0ECCD, then give the answers for round toward 0, toward $\pm\infty$, and to nearest.

So I have

0x383FFBAD        0 | 0111 0000 | 0111 1111 1111 0111 0101 101
0x3FD0ECCD        0 | 0111 1111 | 1010 0001 1101 1001 1001 101

Then how should I continue? Feel free to use other examples if you wish

If I understood your "re-alignment" of the exponent correctly...

Here's an explanation of how the format relates to the actual value.

1.b(22)b(21)...b(0) * 2^(e-127) can be interpreted as the binary number 1.b(22)b(21)...b(0) shifted left by e-127 bit positions. Of course, the shift amount can be negative, which is how we get fractions (values between 0 and 1).
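
To make the field layout concrete, here is a minimal Python sketch (not part of the original answer) that splits a 32-bit pattern into its sign, exponent and fraction fields and rebuilds the value from them. The helper names `decode` and `value` are made up for illustration, and only normal numbers are handled.

```python
def decode(bits):
    """Split a 32-bit pattern into (sign, biased exponent, 23-bit fraction)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction

def value(bits):
    """Value of a normal binary32 pattern: (-1)^s * 1.f * 2^(e-127)."""
    s, e, f = decode(bits)
    # Put the implicit leading 1 back: this is the 24-bit integer 1b22...b0.
    significand = 0x800000 | f
    # 1.f * 2^(e-127) is the same as (1b22...b0) * 2^(e-127-23).
    return (-1) ** s * significand * 2.0 ** (e - 150)

for pattern in (0x383FFBAD, 0x3FD0ECCD):
    s, e, f = decode(pattern)
    print(hex(pattern), s, e, format(f, "023b"), value(pattern))
```

For the two operands in the question this gives biased exponents 112 and 127, so the first operand is 15 binary places "below" the second.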

To add two floating-point numbers of the same sign, you first need to make their exponents equal or, in other words, denormalize one of the addends (the one with the smaller exponent).

The reason is very simple. When you add, for example, 1 thousand and 1 you want to add tens with tens, hundreds with hundreds, etc. So, you have 1.000*10³ + 1.000*10⁰ = 1.000*10³ + 0.001*10³(<--denormalized) = 1.001*10³. This can, of course, result in truncation/rounding, if the format cannot represent the result exactly (e.g. if it could only have 2 significant digits, you'd end up with the same 1.0*10³ for the sum).
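
The decimal example above can be reproduced with Python's `decimal` module, purely as a rough illustration: a context with 4 significant digits keeps the 1, while a context with 2 significant digits rounds it away.

```python
from decimal import Context, Decimal

thousand, one = Decimal(1000), Decimal(1)

# 4 significant digits: 1.000*10^3 + 0.001*10^3 fits exactly.
exact = Context(prec=4).add(thousand, one)

# 2 significant digits: the aligned 1 falls off the end and is rounded away.
rounded = Context(prec=2).add(thousand, one)

print(exact, rounded)
```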

So, just like in the above example with 1000 and 1, you may need to shift one of the addends to the right before adding their mantissas. Remember that the format has an implicit leading 1 bit, which isn't stored in the float and which you have to account for when shifting and adding. After adding the mantissas, you may well run into a mantissa overflow (a carry out of the top bit), in which case you have to shift the sum right by one and increment the exponent to get rid of the overflow.
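
Putting these steps together for the two operands from the question, a minimal Python sketch might look like the following. It is an illustration under stated assumptions, not a full implementation: both inputs are assumed to be positive, normal, finite binary32 values, the function name `add_positive_normals` and the mode names are invented here, and the three extra low bits play the role of the guard, round and sticky bits the question asks about.

```python
def fields(bits):
    """Split a 32-bit pattern into (sign, biased exponent, 23-bit fraction)."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def add_positive_normals(x_bits, y_bits, mode="nearest"):
    """Add two positive, normal binary32 values given as bit patterns.

    mode: "zero", "up" (+inf), "down" (-inf) or "nearest" (ties to even).
    Zeros, denormals, infinities, NaNs, negative operands and exponent
    overflow are NOT handled in this sketch.
    """
    _, ex, fx = fields(x_bits)
    _, ey, fy = fields(y_bits)
    # Make x the operand with the larger exponent.
    if ex < ey:
        ex, fx, ey, fy = ey, fy, ex, fx
    # Restore the implicit leading 1 and append guard, round, sticky bits.
    mx = (fx | 0x800000) << 3
    my = (fy | 0x800000) << 3
    # Align: shift the smaller operand right; any 1 that falls off the
    # end is OR-ed into the sticky bit (bit 0).
    shift = ex - ey
    if shift:
        lost = my & ((1 << shift) - 1)
        my = (my >> shift) | (1 if lost else 0)
    total = mx + my
    exp = ex
    # Carry out of the 24-bit significand: shift right once, bump exponent.
    if total >> 27:
        total = (total >> 1) | (total & 1)
        exp += 1
    sig, grs = total >> 3, total & 0b111
    if mode == "nearest":
        # Above one half rounds up; exactly one half rounds to even.
        round_up = grs > 0b100 or (grs == 0b100 and sig & 1 == 1)
    elif mode == "up":
        round_up = grs != 0
    else:
        # "zero" and "down" both truncate when the result is positive.
        round_up = False
    if round_up:
        sig += 1
        if sig >> 24:  # rounding itself overflowed the significand
            sig >>= 1
            exp += 1
    return (exp << 23) | (sig & 0x7FFFFF)

for mode in ("zero", "up", "down", "nearest"):
    print(mode, hex(add_positive_normals(0x383FFBAD, 0x3FD0ECCD, mode)))
```

For the question's operands the exponents differ by 15, so the smaller significand 0xBFFBAD is shifted down to 0x17F and the discarded bits are more than one half. The sketch therefore gives 0x3FD0EE4C when truncating (round toward 0 or toward -infinity) and 0x3FD0EE4D for round to nearest or toward +infinity.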

Those are the basics. There are special cases to consider as well (zeros, denormals, infinities, NaNs, and addends of opposite sign).