I've discovered what appears to be a bug in the Opteron implementation
of the FYL2X instruction.
Consider the C code at the end of this message. It computes the
natural logarithm of 6 and prints it both in decimal and as the raw
bytes of the 80-bit value in hex (little endian).
When run on an Opteron Processor 270, it produces the following output:
decimal: 1.791759469228055000897606441335
hex: 0x92110051d15f58e5ff3f
When run on an Intel Xeon, it instead produces:
decimal: 1.791759469228055000789186224086
hex: 0x91110051d15f58e5ff3f
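Because the hex values are little endian, the first byte holds the
lowest 8 bits of the significand. The two results differ only in the
least significant two bits of that byte (0x92 vs 0x91), i.e. by one
unit in the last place.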
Comparing the decimal values to the output of 'bc':
$ echo 'scale=30; l(6)' | bc -l
we see:
AMD Opteron: 1.791759469228055000897606441335
bc:          1.791759469228055000812477358380
Intel Xeon:  1.791759469228055000789186224086
The Intel result is the closer of the two to the exact value, so it is
the correct answer in "round to nearest" mode, which is both the
default and the mode in effect here.
Tracing through the assembly for this program shows that it boils down
to a single FYL2X instruction. On both processors, the top two entries
of the x87 register stack just before that instruction executes are
(the hex here is big endian):
R7: Valid 0x3ffeb17217f7d1cf79ac +0.6931471805599453094
R6: Valid 0x4001c000000000000000 +6
R7 is ln(2). FYL2X computes ST(1) * log2(ST(0)), so with these
operands it yields ln(2) * log2(6) = ln(6). After FYL2X executes, the
two operands are replaced by the single result, which on the Opteron
is wrong (value shown above).
This bug is annoying because it means the calculations I'm doing are
hardware-dependent. The larger system in which these calculations
appear flags non-determinism in an attempt to catch software bugs, but
it's flagging this difference in behavior too, which is introducing
noise.
Anyone else seeing this behavior? Does anyone know of a more official
AMD channel to report this? (I doubt my hardware OEM is going to care.)
Code:
#include <math.h>   // logl
#include <stdio.h>  // printf

int main(void)
{
    long double d = logl((long double)6);
    unsigned char *p = (unsigned char *)(&d);
    int i;

    printf("decimal: %.30Lf\n", d);

    // Dump the 10 bytes of the 80-bit value in memory order.
    printf("hex: 0x");
    for (i = 0; i < 10; i++) {
        printf("%02x", (int)p[i]);
    }
    printf("\n");

    return 0;
}