[MLton-user] timing anomoly
Matthew Fluet
fluet at tti-c.org
Wed Dec 5 06:18:50 PST 2007
On Tue, 4 Dec 2007, Florian Weimer wrote:
> * Sean McLaughlin:
>
>> However, is it possible to make this code run faster? It currently runs
>> over 3X slower than the equivalent C program:
>
> GCC seems to be better at simplifying the polynomial, for some reason.
And GCC seems to significantly alter the meaning of the program.
> After substituiting x1 .. x6 with x,
>
> (-x1)*x4 + x2*x5 +(-x3)*x5 +(-x2)*x6 + x3*x6 +
> x4*(x2 + x3 + x5 + x6) + x4*(-x1)+x4*(-x4)
But, the original expression began with store + ..., and the actual
expression is associated like:
(...((((store + (-x1)*x4) + x2*x5) + (-x3)*x5) + (-x2)*x6) + ...)
FP arithemetic is not associative, so it is not really justified (or at
least numerically accurate) to reassociate the computation for constant
folding. I could see constant folding the idividual negations and
multiplications, but the additions really ought not be.
> becomes
>
> (-x)*x + x*x +(-x)*x +(-x)*x + x*x +
> x*(x + x + x + x) + x*(-x)+x*(-x)
>
> which is basically
>
> x*x
x*x + x*x (I think).
> (if I'm not mistaken). I don't know why MLton doesn't perform the
> transformation. I haven't seen floating point semantics for SML, but
> the optimization certainly result in different results when intermediate
> values would overflow. This seems contrary to the ML spirit, but I fear
> it's essential for decent floating-point performance in some cases (in
> the sense that you may calculate the expression in a way that brings it
> closer to the real result).
MLton could do (very slightly) better, by doing CSE on the straightline
code, but I don't agree that the reassociative constant folding is
correct.
More information about the MLton-user
mailing list