80 bit reals

Tue, 9 Oct 2001 16:28:20 -0500

I did some quick tests of the 80-bit doubles in C, and they definitely hurt
performance a bit.  One problem is that there don't seem to be (or at least
GCC never uses) any instructions that combine a load and an operation for
this format.  E.g., you can do a load-and-add for 32 and 64 bit reals, but
not for 80 bit reals.  Thus you have to do the load as a separate
instruction.

So my test code was to add up an array of numbers.  I made the array contain
100 numbers so everything fits in the L1 cache.  For 32-bit reals the hot
loop is

    loop:
            fadds   (%eax)          # add next array element
            addl    $4, %eax        # advance pointer in array
            cmpl    %edx, %eax      # compare to end of array
            jne     loop

and the 64-bit case is the same except the first 2 instructions change to

            faddl   (%eax)
            addl    $8, %eax

For 80-bit reals the loop changes to

    loop:
            fldt    (%eax)
            addl    $12, %eax
            cmpl    %edx, %eax
            faddp   %st, %st(1)
            jne     loop

Note that the strange ordering of the instructions (the faddp placed between
the cmpl and the jne) seems to be GCC trying to hide the latency of the load,
but I tried undoing the reordering and it made no difference on my machine
(400 MHz Pentium II).
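
For reference, here is a minimal C sketch of the kind of summing loops that
would compile to listings like the ones above.  The function names are made
up for illustration, and it assumes a 32-bit x87 target where long double is
the 80-bit format; other compilers and targets may map long double to a
different size and will not necessarily emit the same instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n 32-bit reals.  On the x87 target described in the text, the
 * loop body can use a single load-and-add (fadds). */
float sum_f32(const float *a, size_t n)
{
    assert(a != NULL);
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Sum n 64-bit reals.  Same shape of loop, using faddl in the text. */
double sum_f64(const double *a, size_t n)
{
    assert(a != NULL);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Sum n extended reals (assumed 80-bit here).  In the text this needs a
 * separate load (fldt) followed by an add (faddp). */
long double sum_f80(const long double *a, size_t n)
{
    assert(a != NULL);
    long double s = 0.0L;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

Calling each function repeatedly on a 100-element array and dividing the
elapsed time by the number of additions would give a per-loop figure.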

Another disadvantage of the 80-bit reals is that their stored size (12 bytes)
isn't a multiple of 8, which we know causes efficiency problems, so I also
tried padding array elements to 16 bytes.  Of course this costs you D-cache
space.  With this padding the inner loop looks the same except that the
second instruction adds 16 to %eax.
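
The padding experiment amounts to changing the stride between elements.  The
sketch below is only an illustration: round_up and sum_strided are
hypothetical helpers, not the test code, and the copy through memcpy is there
to keep the example valid for any stride, so it is not what the timed loop
does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round n up to the next multiple of align (align must be a power of 2).
 * E.g. a 12-byte element rounds up to 16 for align == 16. */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Sum count long doubles stored stride bytes apart in a raw buffer.
 * stride must be at least sizeof(long double). */
long double sum_strided(const unsigned char *buf, size_t count, size_t stride)
{
    assert(stride >= sizeof(long double));
    long double s = 0.0L;
    for (size_t i = 0; i < count; i++) {
        long double x;
        memcpy(&x, buf + i * stride, sizeof x);  /* alignment-safe load */
        s += x;
    }
    return s;
}
```

With a 12-byte stored size, round_up(12, 16) gives the 16-byte stride used in
the padded case, at the price of 4 unused bytes per element.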

Ok, the timings:

     32 bits                          7.82 nanoseconds per loop
     64 bits                          7.83 nanoseconds per loop
     96 bits (80-bit, unpadded)      12.85 nanoseconds per loop
    128 bits (80-bit, padded)        10.59 nanoseconds per loop

So even if we are willing to add 4 bytes of padding, it costs us at least 35%
(about 64% without the padding).  And of course there is the extra space
used, which means the D-cache will fill up sooner.
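
The percentages follow directly from the timings, and the same numbers can be
turned into clock cycles using the 400 MHz clock mentioned earlier.  A small
sketch of that arithmetic (the helper names are made up for illustration):

```c
#include <assert.h>

/* Percentage by which t_ns exceeds base_ns. */
double slowdown_percent(double base_ns, double t_ns)
{
    assert(base_ns > 0.0);
    return (t_ns / base_ns - 1.0) * 100.0;
}

/* Number of clock cycles in ns nanoseconds at a clock of mhz MHz. */
double cycles(double ns, double mhz)
{
    return ns * mhz / 1000.0;
}
```

Against the 64-bit time of 7.83 ns, slowdown_percent gives roughly 35% for the
padded case (10.59 ns) and roughly 64% for the unpadded case (12.85 ns).  At
400 MHz, cycles works out to about 3.1 cycles per loop for 64-bit reals, 4.2
for padded 80-bit reals, and 5.1 for unpadded ones.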

Not  horrible,  but  definitely  not  great.   It  will make us look worse in
benchmarks.