
Trusting floating point arithmetic

I know there are problems with numbers like 0.3, which cannot be represented exactly as floating point numbers and therefore produce floating point errors.

What about the numbers that can be represented, like 0.5, 0.75, etc.? Can I trust floating point arithmetic to be error free if I'm dealing with numbers that are negative powers of two and numbers composed from them?


Assuming you have an IEEE 754 architecture, and if you are only performing addition, subtraction and multiplication, and if the result fits, then it will be exact. Division is only safe when the result, written as a fraction in lowest terms, has a power-of-two denominator. Any other built-in maths functions like exp and log cannot possibly be exact (by the Lindemann-Weierstrass theorem); ditto for non-natural powers (though there isn't even a built-in power function in most CPUs anyway).
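
A minimal sketch, assuming an IEEE 754 binary64 double: the values below are negative powers of two (or sums of them), so each operation lands exactly on a representable result, while the final line shows what happens once a non-dyadic value like 0.1 enters the picture.

```c
#include <stdio.h>

int main(void) {
    double a = 0.5, b = 0.75;              /* both exactly representable */

    printf("%d\n", a + b == 1.25);         /* 1: addition is exact */
    printf("%d\n", b - a == 0.25);         /* 1: subtraction is exact */
    printf("%d\n", a * b == 0.375);        /* 1: the product still fits in 53 bits */
    printf("%d\n", b / 2.0 == 0.375);      /* 1: dividing by a power of two is exact */
    printf("%d\n", 0.1 + 0.2 == 0.3);      /* 0: these are not dyadic rationals */
    return 0;
}
```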


There is another obvious restriction: a normal floating point number will have (for example) a 53-bit significand, so (after scaling) the binary representation of every number involved has to fit into 53 binary digits to avoid losing precision.
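
As a concrete illustration of that 53-bit limit (a sketch, again assuming binary64 doubles), 2^53 is the last point where consecutive integers are all representable:

```c
#include <stdio.h>

int main(void) {
    double big = 9007199254740992.0;     /* 2^53: the end of the contiguous integer range */

    printf("%d\n", big - 1.0 < big);     /* 1: 2^53 - 1 still fits in 53 bits */
    printf("%d\n", big + 1.0 == big);    /* 1: 2^53 + 1 does not, so the +1 is rounded away */
    return 0;
}
```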


Some additions and subtractions can be exact in floating-point arithmetic, but in general multiplication cannot be free of rounding, because the exact product can require twice as many significand bits as either factor.
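
For example (a sketch assuming binary64 doubles): squaring a value that already uses the full 53-bit significand produces a result that would need roughly twice the bits, so the low-order part is rounded away, whereas operands with short significands still multiply exactly.

```c
#include <float.h>
#include <stdio.h>

int main(void) {
    double x = 1.0 + DBL_EPSILON;        /* 1 + 2^-52: needs the full 53-bit significand */
    double p = x * x;                    /* exact product 1 + 2^-51 + 2^-104 needs ~105 bits */

    printf("%d\n", p == 1.0 + 2.0 * DBL_EPSILON);  /* 1: the 2^-104 term was rounded away */
    printf("%d\n", 0.5 * 0.25 == 0.125);           /* 1: short significands multiply exactly */
    return 0;
}
```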


When you have an IEEE-754-conforming floating point implementation, you have the guarantee that the basic operations (addition, subtraction, multiplication, division, remainder, square root) are correctly rounded, which means you get the exact result whenever it is representable. So you can do all of the following operations safely:

  • 1.0 + 1.0
  • 1.0 - 0.5
  • 0.0 - -0.0
  • 0.16845703125 * 0.16845703125
  • sqrt(4.0)
  • sqrt(20.25)
  • 15.5 / 0.5

In contrast to the other basic operations, the remainder operation is precise for any two operands.

You just have to make sure that you never need more precision than the precision provided by the floating point type.
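
A quick sanity check of the operations listed above (a sketch assuming a conforming binary64 implementation; each comparison should print 1, and you may need to link with -lm):

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    printf("%d\n", 1.0 + 1.0 == 2.0);
    printf("%d\n", 1.0 - 0.5 == 0.5);
    printf("%d\n", 0.0 - -0.0 == 0.0);
    printf("%d\n", 0.16845703125 * 0.16845703125 == 0.0283777713775634765625);
    printf("%d\n", sqrt(4.0) == 2.0);
    printf("%d\n", sqrt(20.25) == 4.5);
    printf("%d\n", 15.5 / 0.5 == 31.0);
    printf("%d\n", remainder(15.5, 4.0) == -0.5);   /* the IEEE remainder is always exact */
    return 0;
}
```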


You need to get a copy of the IEEE 754 floating point spec and study it. Pretty much all compilers and CPUs follow it to the letter these days, so if you stay within what the spec guarantees you can get "exact" results.

The one thing you don't always (depending on the language) have control over is whether the result of a computation stays in a register (possibly at extended precision) or is stored back to memory. This can affect the precision that is carried forward into the next computation.

But just about every common computing language implements (or has available as an add-on) some sort of "long decimal" or "long integer" support that can be used to produce exact results to an arbitrary length/precision, so long as you stick to values that are representable in those forms.
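
In C, for instance, one such add-on is the GNU GMP library (an assumption here, not something the question mentions); a rough sketch, assuming GMP is installed and the program is linked with -lgmp, using exact rational arithmetic to side-step binary rounding entirely:

```c
#include <gmp.h>
#include <stdio.h>

int main(void) {
    mpq_t a, b, sum, expected;
    mpq_inits(a, b, sum, expected, NULL);

    mpq_set_str(a, "1/10", 10);        /* 0.1, which a binary float cannot hold exactly */
    mpq_set_str(b, "2/10", 10);
    mpq_canonicalize(b);               /* reduce 2/10 to 1/5 */
    mpq_set_str(expected, "3/10", 10);

    mpq_add(sum, a, b);                /* exact: 1/10 + 1/5 = 3/10 */
    gmp_printf("sum = %Qd, equal to 3/10: %d\n", sum, mpq_equal(sum, expected));

    mpq_clears(a, b, sum, expected, NULL);
    return 0;
}
```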


To begin with, the implementation of the IEEE specification on the x86 architecture attempts to flag all out-of-the-ordinary situations with exceptions. Divide by zero, underflow and overflow are obvious exceptions. Another, not so obvious, is "inexact", i.e. the result of an operation cannot be represented exactly. In any case, as I understand it, many development environments simply mask out the exceptions, causing such situations to go unnoticed. Even in environments that don't have training wheels, the inexact exception tends to be masked out by default, but it can of course be enabled.
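
By way of illustration (a sketch assuming C99 <fenv.h> support), the status flags are raised silently and only show up if you test for them:

```c
#include <fenv.h>
#include <stdio.h>
#pragma STDC FENV_ACCESS ON

int main(void) {
    feclearexcept(FE_ALL_EXCEPT);

    volatile double one = 1.0, three = 3.0, zero = 0.0;
    volatile double q = one / three;      /* result not representable: raises "inexact" */
    volatile double d = one / zero;       /* raises "divide by zero", result is +inf */
    (void)q; (void)d;

    printf("inexact set:   %d\n", fetestexcept(FE_INEXACT)   != 0);
    printf("div-zero set:  %d\n", fetestexcept(FE_DIVBYZERO) != 0);
    return 0;
}
```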

As to the question of negative powers of two, the answer is rather that you should make sure that unsafe values don't end up where they have no business being: 0 as a divisor, negative values fed into sqrt or log/ln, and so on. This implies validating the inputs so that the algorithms don't misbehave when using them. Since your exceptions will probably be masked, your algorithm may have done quite a bit of work with bad values before you're faced with the result: NaN, -NaN, or "expletive deleted"-looking output from printf.
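
A small example of that propagation (a sketch, assuming the invalid-operation exception is masked, as it usually is): the NaN produced by one bad input quietly poisons every later result, so guarding the input domain up front is the cheaper fix.

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double bad = sqrt(-1.0);              /* domain error: result is NaN, exception masked */
    double result = bad * 2.0 + 10.0;     /* the NaN silently contaminates later work */
    printf("result = %f (isnan = %d)\n", result, isnan(result));

    double input = -1.0;
    if (input >= 0.0)                     /* guard the domain instead of trusting the input */
        printf("sqrt = %f\n", sqrt(input));
    else
        printf("rejected negative input\n");
    return 0;
}
```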

Floating point arithmetic brings issues with it which can (and often do) open a can of worms. So my recommendation is that you peek some more under the hood and experiment with the values you plug into different fp operations.

These days, it doesn't take much to become a floating point guru.
