[MLton-user] timing anomaly
Matthew Fluet
fluet at tti-c.org
Tue Dec 4 07:28:12 PST 2007
On Mon, 3 Dec 2007, Sean McLaughlin wrote:
> I found some very strange behavior of mlton while running some
> floating point experiments.
> The attached file evaluates some polynomials and returns them. If you
> multiply a value by a single argument, it multiplies by 10 the
> performance time of the entire function. I'd very much like to have
> this kind of code run fast. When the line is commented, it runs faster
> than C++ (hurray!). When uncommented, 10X slower :(
As Florian discovered, your 'doit' function (with the additional
multiplication) just crosses an inlining threshold. You can see this by
using the '-keep ssa' and '-keep ssa2' options to look at the program at
the end of the main optimization passes.
The default value for the '-inline <N>' compiler option is 60, and using
'-inline 65' gets the test program to inline the slightly larger function.
You probably don't need to go all the way to '-inline 500' or
'-inline 1000'.
However, be very careful extrapolating from your timing.sml program to
your real application. I don't believe that timing.sml is measuring quite
what you think it is measuring. Recall the 'repeat_fun' function:
fun repeat_fun f n =
  let
    val msg = n div 10
    fun repeat_fun' f 0 = ()
      | repeat_fun' f n =
          let in
            if n mod msg = 0 then print ("iter: " ^ Int.toString n ^ "\n") else ();
            ignore (f ());   (* result of f is thrown away *)
            repeat_fun' f (n - 1)
          end
  in
    repeat_fun' f (n - 1);
    f ()   (* only this final result is returned *)
  end
This ignores the result of each call to 'f' in 'repeat_fun'', and only
returns the result of the final call to 'f'. When the 'doit' function is
inlined, MLton quite happily determines that all the fp arithmetic
inserted into the 'repeat_fun'' loop can be discarded, since it has no
side effects and its results are never used. So, with the "fast" version,
you are executing a nearly empty loop that just occasionally prints a
message. With the "slow" version, you are executing the 'doit' arithmetic
on every iteration of the 'repeat_fun'' loop, which accounts for the 10X
slowdown. The actual assembly sequence for the arithmetic is identical
(modulo the extra multiplication).
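One way to make a timing loop like this actually measure the arithmetic is to thread every result into an accumulator that is returned and printed at the end. The following is a minimal sketch, assuming 'f' returns a value that can be combined cheaply (for example a real). The names repeatAccum and fakeDoit and the combine/init parameters are hypothetical, not part of the original timing.sml; fakeDoit is only a constant stand-in for the real 'doit'.

```sml
(* Hypothetical benchmarking helper (not from the original timing.sml).
   Calls f exactly n times (n <= 0 means no calls) and folds every result
   into an accumulator, so each call contributes to the returned value
   instead of being ignored. *)
fun repeatAccum (combine : 'a * 'b -> 'b) (init : 'b) (f : unit -> 'a) (n : int) : 'b =
  let
    fun loop (k, acc) =
      if k <= 0 then acc
      else loop (k - 1, combine (f (), acc))
  in
    loop (n, init)
  end

(* Made-up stand-in for the real 'doit', used only to show the call shape. *)
fun fakeDoit () : real = 1.5 * 2.0 + 0.25

(* Sum all results and print the checksum so the value is observable. *)
val total = repeatAccum (fn (x, acc) => x + acc) 0.0 fakeDoit 1000
val () = print ("checksum: " ^ Real.toString total ^ "\n")
```

Printing the checksum is what keeps the work observable: as long as combine really uses its first argument, the calls feed a value that escapes the program. This makes it much harder for the optimizer to drop the calls as dead code, although it is not a guarantee against every other optimization.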
Even when the 'doit' function is not inlined, MLton could discard the call
to 'doit' in 'repeat_fun''. Since the result of the call is unused, the
call can be discarded whenever the called function has no side effects,
only returns normally (i.e., doesn't raise exceptions), and terminates.
The 'removeUnused' optimization pass computes maySideEffect and mayRaise
predicates for each function, but does not currently compute a
mustTerminate predicate. Thus, the call stays, and you see the longer
execution time when 'doit' is not inlined.
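The removal condition described above can be written as a small predicate. This is a hypothetical sketch: the record type funInfo, the function canDropUnusedCall and the value doitSummary are illustrative names, not MLton's internal representation. It shows why, without a termination analysis, a conservative mustTerminate = false keeps every such call, even for a pure, non-raising function.

```sml
(* Hypothetical per-function summary; not MLton's actual data structure.
   removeUnused computes analogues of the first two fields; mustTerminate
   is the predicate it does not currently compute. *)
type funInfo = {maySideEffect : bool, mayRaise : bool, mustTerminate : bool}

(* A call whose result is unused may be deleted only if the callee has no
   side effects, never raises, and is known to terminate. *)
fun canDropUnusedCall ({maySideEffect, mayRaise, mustTerminate} : funInfo) : bool =
  not maySideEffect andalso not mayRaise andalso mustTerminate

(* With no termination analysis, mustTerminate must conservatively be false,
   so a call to a function like 'doit' is kept. *)
val doitSummary : funInfo =
  {maySideEffect = false, mayRaise = false, mustTerminate = false}
val keepsCall = not (canDropUnusedCall doitSummary)
```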