[MLton-user] timing anomoly

Wed Dec 5 06:18:50 PST 2007

On Tue, 4 Dec 2007, Florian Weimer wrote:
> * Sean McLaughlin:
>
>> However, is it possible to make this code run faster?  It currently runs
>> over 3X slower than the equivalent C program:
>
> GCC seems to be better at simplifying the polynomial, for some reason.

And GCC seems to significantly alter the meaning of the program.

> After substituiting x1 .. x6 with x,
>
>  (-x1)*x4 + x2*x5 +(-x3)*x5 +(-x2)*x6 + x3*x6 +
>      x4*(x2 + x3 + x5 + x6) + x4*(-x1)+x4*(-x4)

But, the original expression began with  store + ..., and the actual 
expression is associated like:

  (...((((store + (-x1)*x4) + x2*x5) + (-x3)*x5) + (-x2)*x6) + ...)

FP arithemetic is not associative, so it is not really justified (or at 
least numerically accurate) to reassociate the computation for constant 
folding.  I could see constant folding the idividual negations and 
multiplications, but the additions really ought not be.

> becomes
>
>  (-x)*x + x*x +(-x)*x +(-x)*x + x*x +
>     x*(x + x + x + x) + x*(-x)+x*(-x)
>
> which is basically
>
>  x*x

x*x + x*x (I think).

> (if I'm not mistaken).  I don't know why MLton doesn't perform the
> transformation.  I haven't seen floating point semantics for SML, but
> the optimization certainly result in different results when intermediate
> values would overflow.  This seems contrary to the ML spirit, but I fear
> it's essential for decent floating-point performance in some cases (in
> the sense that you may calculate the expression in a way that brings it
> closer to the real result).

MLton could do (very slightly) better, by doing CSE on the straightline 
code, but I don't agree that the reassociative constant folding is 
correct.