80 bit reals
Henry Cejtin
henry@sourcelight.com
Tue, 9 Oct 2001 16:28:20 -0500
I did some quick tests of the 80-bit doubles in C, and it definitely hurts a
bit. One problem is that there doesn't seem to be (or at least GCC never
uses) any instructions which combine loading and an operation for this
format. E.g., you can do a load-and-add for 32- and 64-bit reals, but not for
80-bit reals. Thus you have to do the load as a separate instruction.
So my test code was to add up an array of numbers. I made the array
contain 100 numbers so everything fits in the L1 cache. For 32-bit reals
the hot loop is
loop:
        fadds   (%eax)          ; add next array element
        addl    $4, %eax        ; advance pointer in array
        cmpl    %edx, %eax      ; compare to end of array
        jne     loop
and the 64-bit case is the same except the first 2 instructions change to
        faddl   (%eax)
        addl    $8, %eax
For 80-bit reals the loop changes to
loop:
        fldt    (%eax)
        addl    $12, %eax
        cmpl    %edx, %eax
        faddp   %st, %st(1)
        jne     loop
Note that the strange ordering of the instructions seems to be GCC trying to
hide the latency of the load, but I tried undoing the reordering and it made
no difference on my machine (400 MHz Pentium II).
Another disadvantage of the 80-bit reals is that their stored size (12 bytes)
isn't a multiple of 8, which we know causes efficiency problems, so I also
tried padding array elements to 16 bytes. Of course this costs you D-cache
space. With this padding, the inner loop looks the same except that the
second instruction adds 16 to %eax.
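For reference, the four summation kernels can be sketched in C roughly as
follows. This is a reconstruction, not the code that was actually timed: the
function names, the padded_real union and the pointer-walking loop shape are
all assumptions made for illustration, and the timing harness is left out. It
also assumes a 32-bit x86 target where the compiler uses x87 instructions and
long double is the 80-bit format; on other targets the generated code and the
element sizes will differ.

```c
#include <assert.h>
#include <stddef.h>

#define N 100   /* 100 elements, small enough to stay in the L1 data cache */

/* Hypothetical padded element: one long double stored in a 16-byte slot. */
typedef union {
    long double value;
    char pad[16];
} padded_real;

/* Sum an array of 32-bit reals by walking a pointer to the end. */
float sum_float(const float *a, size_t n)
{
    float s = 0.0f;
    assert(a != NULL);
    for (const float *p = a, *end = a + n; p != end; p++)
        s += *p;
    return s;
}

/* Same loop for 64-bit reals. */
double sum_double(const double *a, size_t n)
{
    double s = 0.0;
    assert(a != NULL);
    for (const double *p = a, *end = a + n; p != end; p++)
        s += *p;
    return s;
}

/* Same loop for long double, stored at its natural size. */
long double sum_extended(const long double *a, size_t n)
{
    long double s = 0.0L;
    assert(a != NULL);
    for (const long double *p = a, *end = a + n; p != end; p++)
        s += *p;
    return s;
}

/* Same loop again, but each element occupies a padded 16-byte slot. */
long double sum_padded(const padded_real *a, size_t n)
{
    long double s = 0.0L;
    assert(a != NULL);
    for (const padded_real *p = a, *end = a + n; p != end; p++)
        s += p->value;
    return s;
}
```

To get a per-iteration time, one would call each kernel many times over the
same N-element array and divide the elapsed time by the total number of
elements added.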
Ok, the timings:
32 bits                       7.82 nanoseconds per loop
64 bits                       7.83 nanoseconds per loop
96 bits (80-bit, 12 bytes)   12.85 nanoseconds per loop
128 bits (80-bit, 16 bytes)  10.59 nanoseconds per loop
So even if we are willing to add 4 bytes of padding, it costs us at least 35%
(and about 64% without the padding). On top of that there is the extra space
used, which means the D-cache will fill up sooner.
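The percentages follow directly from the timings. A minimal C sketch of the
arithmetic, taking the 64-bit time as the baseline (slowdown_percent is a
hypothetical helper name, not part of the test code):

```c
#include <assert.h>

/* Percentage by which time t exceeds the baseline time base. */
double slowdown_percent(double t, double base)
{
    assert(base > 0.0);
    return 100.0 * (t / base - 1.0);
}
```

slowdown_percent(10.59, 7.83) comes out a little over 35 for the padded
16-byte layout, and slowdown_percent(12.85, 7.83) a little over 64 for the
unpadded 12-byte layout.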
Not horrible, but definitely not great. It will make us look worse in
benchmarks.