Error subtracting floating point numbers when passing through 0.0

Developer https://www.devze.com 2023-02-28 08:24 Source: Internet
The following program:

#include <stdio.h>

int main()
{
    double val = 1.0;
    int i;
    for (i = 0; i < 10; i++)
    {
        val -= 0.2;
        printf("%g %s\n", val, (val == 0.0 ? "zero" : "non-zero"));
    }

    return 0;
}

Produces this output:

0.8 non-zero
0.6 non-zero
0.4 non-zero
0.2 non-zero
5.55112e-17 non-zero
-0.2 non-zero
-0.4 non-zero
-0.6 non-zero
-0.8 non-zero
-1 non-zero

Can anyone tell me what is causing the error when subtracting 0.2 from 0.2? Is this a rounding error or something else? Most importantly, how do I avoid this error?

EDIT: It looks like the conclusion is to not worry about it, given 5.55112e-17 is extremely close to zero (thanks to @therefromhere for that information).


It's because floating-point numbers cannot, in general, be stored in memory as exact values, so it is generally not safe to use == on floating-point values. Using double increases the precision, but the values are still not exact. The usual way to compare a floating-point value is to do something like this:

val == target;   // not safe

// instead do this
// where EPS is some suitable low value like 1e-7
fabs(val - target) < EPS; 

EDIT: As pointed out in the comments, the main cause of the problem is that 0.2 can't be stored exactly, so each subtraction can introduce a small error. If you repeat this kind of floating-point calculation, at some point the error becomes noticeable. What I am trying to say is that not every value can be stored as a floating-point number, because there are infinitely many of them. A slightly wrong value is not generally noticeable, but using it in successive computations leads to a larger cumulative error.


0.2 is not a double precision floating-point number, so it is rounded to the nearest double precision number, which is:

            0.200000000000000011102230246251565404236316680908203125

That's rather unwieldy, so let's look at it in hex instead:

          0x0.33333333333334

Now, let's follow what happens when this value is repeatedly subtracted from 1.0:

          0x1.00000000000000
        - 0x0.33333333333334
        --------------------
          0x0.cccccccccccccc

The exact result is not representable in double precision, so it is rounded, which gives:

          0x0.ccccccccccccd

In decimal, this is exactly:

            0.8000000000000000444089209850062616169452667236328125

Now we repeat the process:

          0x0.ccccccccccccd
        - 0x0.33333333333334
        --------------------
          0x0.9999999999999c
rounds to 0x0.999999999999a
           (0.600000000000000088817841970012523233890533447265625 in decimal)

          0x0.999999999999a
        - 0x0.33333333333334
        --------------------
          0x0.6666666666666c
rounds to 0x0.6666666666666c
           (0.400000000000000077715611723760957829654216766357421875 in decimal)

          0x0.6666666666666c
        - 0x0.33333333333334
        --------------------
          0x0.33333333333338
rounds to 0x0.33333333333338
           (0.20000000000000006661338147750939242541790008544921875 in decimal)

          0x0.33333333333338
        - 0x0.33333333333334
        --------------------
          0x0.00000000000004
rounds to 0x0.00000000000004
           (0.000000000000000055511151231257827021181583404541015625 in decimal)

Thus, we see that the accumulated rounding that is required by floating-point arithmetic produces the very small non-zero result that you are observing. Rounding is subtle, but it is deterministic, not magic, and not a bug. It's worth taking the time to learn about.


Floating point arithmetic cannot represent all numbers exactly. Thus rounding errors like you observe are inevitable.

One possible strategy is to use a fixed-point format, e.g. a decimal or currency data type. Such types still can't represent all numbers, but they would behave as you expect for this example.


To elaborate a bit: if the mantissa of the floating-point number is encoded in binary (as is the case in most contemporary FPUs), then only sums of the numbers 1/2, 1/4, 1/8, 1/16, ... can be represented exactly in the mantissa. The value 0.2 is approximated by 1/8 + 1/16 + ... plus some even smaller terms, yet the exact value of 0.2 cannot be reached with a finite mantissa.

You can try the following:

 printf("%.20f", 0.2);

and you'll (probably) see that what you think is 0.2 is not 0.2 but a number that is a tiny amount different (actually, on my computer it prints 0.20000000000000001110). Now you understand why you can never reach 0.

But if you let val = 12.5 and subtract 0.125 in your loop, you could reach zero.
