FFT benchmark

Sun, 6 Aug 2000 17:57:01 -0500

I  finally  sat  down  and  looked  at  the hot loop you sent me from the FFT
benchmark.  One inefficiency is the repeated references  to  the  same  stack
(MLton stack) location.  I don't know if the C compiler is going to be clever
enough to figure out that it can cache the value stored in  the  MLton  stack
location  in  a  register or in a C stack location.  Given that it doesn't do
any pointer alias analysis, it certainly is going to have to reload them over
any store into the MLton stack.  Even a store into any C-local variable whose
address has ben taken will force this, but you probably never do that.

An example is the reload of SP(256) that will have to happen after the store
    XD(SP(256), SI(308)) = RD(14);
and the reload of SI(308) after the store
    XD(SP(256), RI(8)) = RD(20);
(although the latter is probably pretty cheap).

Also, although the C compiler should be doing it, not the repeated re-loading
of  things from the MLton stack.  As an example, at the beginning of the code
look at

    RD(9) = XD(SP(256), SI(308));
    RD(10) = XD(SP(256), RI(9));
    RD(11) = Real_sub(RD(9), RD(10));
    RD(12) = XD(SP(256), SI(308));
    RD(13) = XD(SP(256), RI(9));
    RD(14) = Real_add(RD(12), RD(13));

I would think that doing this kind of common subexpression elimination  (just
of  the  result  of  loads) would be a good idea any way.  Especially for the
extra-indirection stuff (all the loads above).

I tried to look at the generated C code, but nothing I  ran  on  any  of  the
versions  of the FFT benchmark produced similar C code.  What did you compile
and with what version of MLton?  Could you send me the source you used?

Speaking of FFT's, the FFT code in the benchmark really  seems  insane.   For
one  thing,  the  natural characterization is to go from a complex array to a
complex array, but from what it looks like, they use  2  parallel  arrays  of
double's  instead  (probably because they wanted to avoid the boxing overhead
of an array of pairs of doubles).  Also the result (assuming it  is  supposed
to leave the result in the input arrays) ceratinly isn't the FFT.