-->

`std::sin` is wrong in the last bit

2020-08-20 11:18发布

问题:

I am porting some program from Matlab to C++ for efficiency. It is important for the output of both programs to be exactly the same (**).

I am facing different results for this operation:

std::sin(0.497418836818383950)   = 0.477158760259608410 (C++)
sin(0.497418836818383950)        = 0.47715876025960846000 (Matlab)
N[Sin[0.497418836818383950], 20] = 0.477158760259608433 (Mathematica)

So, as far as I know both C++ and Matlab are using IEEE754 defined double arithmetic. I think I have read somewhere that IEEE754 allows differents results in the last bit. Using mathematica to decide, seems like C++ is more close to the result. How can I force Matlab to compute the sin with precision to the last bit included, so that the results are the same?

In my program this behaviour leads to big errors because the numerical differential equation solver keeps increasing this error in the last bit. However I am not sure that C++ ported version is correct. I am guessing that even if the IEEE754 allows the last bit to be different, somehow guarantees that this error does not get bigger when using the result in more IEEE754 defined double operations (because otherwise, two different programs correct according to the IEEE754 standard could produce completely different outputs). So the other question is Am I right about this?

I would like get an answer to both bolded questions. Edit: The first question is being quite controversial, but is the less important, can someone comment about the second one?

Note: This is not an error in the printing, just in case you want to check, this is how I obtained these results:

http://i.imgur.com/cy5ToYy.png

Note (**): What I mean by this is that the final output, which are the results of some calculations showing some real numbers with 4 decimal places, need to be exactly the same. The error I talk about in the question gets bigger (because of more operations, each of one is different in Matlab and in C++) so the final differences are huge) (If you are curious enough to see how the difference start getting bigger, here is the full output [link soon], but this has nothing to do with the question)

回答1:

Firstly, if your numerical method depends on the accuracy of sin to the last bit, then you probably need to use an arbitrary precision library, such as MPFR.

The IEEE754 2008 standard doesn't require that the functions be correctly rounded (it does "recommend" it though). Some C libms do provide correctly rounded trigonometric functions: I believe that the glibc libm does (typically used on most linux distributions), as does CRlibm. Most other modern libms will provide trig functions that are within 1 ulp (i.e. one of the two floating point values either side of the true value), often termed faithfully rounded, which is much quicker to compute.

None of those values you printed could actually arise as IEEE 64bit floating point values (even if rounded): the 3 nearest (printed to full precision) are:

0.477158760259608 405451814405751065351068973541259765625

0.477158760259608 46096296563700889237225055694580078125

0.477158760259608 516474116868266719393432140350341796875

The possible values you could want are:

  1. The exact sin of the decimal .497418836818383950, which is

0.477158760259608 433132061388630377105954125778369485736356219...

(this appears to be what Mathematica gives).

  1. The exact sin of the 64-bit float nearest .497418836818383950:

0.477158760259608 430531153841011107415427334794384396325832953...

In both cases, the first of the above list is the nearest (though only barely in the case of 1).



回答2:

The sine of the double constant you wrote is about 0x1.e89c4e59427b173a8753edbcb95p-2, whose nearest double is 0x1.e89c4e59427b1p-2. To 20 decimal places, the two closest doubles are 0.47715876025960840545 and 0.47715876025960846096.

Perhaps Matlab is displaying a truncated value? (EDIT: I now see that the fourth-last digit is a 6, not a 0. Matlab is giving you a result that's still faithfully-rounded, but it's the farther of the two closest doubles to the desired result. And it's still printing out the wrong number.

I should also point out that Mathematica is probably trying to solve a different problem---compute the sine of the decimal number 0.497418836818383950 to 20 decimal places. You should not expect this to match either the C++ code's result or Matlab's result.