[MLton-user] timing anomoly
Sean McLaughlin
seanmcl at gmail.com
Tue Dec 4 08:48:24 PST 2007
Fair enough :)
However, is it possible to make this code run faster? It currently runs
over 3X slower than the equivalent C program:
(* BEGIN SML *)
fun eval_real_poly n =
let
val msg = n div 10
val x1 = 4.0
val x2 = x1
val x3 = x1
val x4 = x1
val x5 = x1
val x6 = x1
fun eval (0,store) = store
| eval (n,store) =
let in
if n mod msg = 0 then print ("iter: " ^ Int.toString n ^
"\n") else ();
eval (n-1,store + (~x1)*x4 + x2*x5 +(~x3)*x5 +(~x2)*x6 + x3*x6 +
x4*(x2 + x3 + x5 + x6) + x4*(~x1)+x4*(~x4))
end
in
eval (n,0.0)
end
val _ = print (Real.toString (eval_real_poly 500000000) ^ "\n")
(* END SML *)
// BEGIN C
#include <stdio.h>
double eval_real_poly(int n){
int msg = n / 10;
double x1 = 4.0;
double x2 = x1;
double x3 = x1;
double x4 = x1;
double x5 = x1;
double x6 = x1;
double store = 0.0;
int i;
for(i=0;i<n;i++){
if(i % msg == 0) printf("iter: %d\n",i);
store += (-x1)*x4 + x2*x5 +(-x3)*x5 +(-x2)*x6 + x3*x6 +
x4*(x2 + x3 + x5 + x6) + x4*(-x1)+x4*(-x4);
}
return store;
}
int main(){
printf("res: %f,\n",eval_real_poly(500000000));
return 0;
}
// END C
On Dec 4, 2007 10:39 AM, Frank Pfenning <fp at cs.cmu.edu> wrote:
> That's funny :-) I will have to tell the students in my compiler
> class about this later today... - Frank (Sorry, Sean, couldn't resist...)
>
>
>
> On Dec 4, 2007 10:28 AM, Matthew Fluet < fluet at tti-c.org> wrote:
> >
> >
> >
> >
> >
> >
> >
> > On Mon, 3 Dec 2007, Sean McLaughlin wrote:
> > > I found some very strange behavior of mlton while running some
> > > floating point experiments.
> > > The attached file evaluates some polynomials and returns them. If you
> > > multiply a value by a
> > > single argument, it multiplies by 10 the performance time of the
> > > entire function. I'd very much
> > > like to have this kind of code run fast. When the line is commented,
> > > it runs faster than
> > > C++ (hurray!). When uncommented, 10X slower :(
> >
> > As Florian discovered, your 'doit' function (with the additional
> > multiplication) just crosses an inlining threshold. You can discover this
> > by using the '-keep ssa' and '-keep ssa2' options to look at the at the
> > end of the main optimization passes.
> >
> > The default value for the '-inline <N>' compiler option is 60, and using
> > '-inline 65' gets the test program to inline the slightly larger function.
> > You probably don't need to go all the way to '-inline 500' or
> > '-inline 1000'.
> >
> > However, be very careful extrapolating from your timing.sml program to
> > your real application. I don't believe that timing.sml is measuring quite
> > what you think it is measuring. Recall the 'repeat' function:
> >
> > fun repeat_fun f n =
> > let
> > val msg = n div 10
> > fun repeat_fun' f 0 = ()
> > | repeat_fun' f n =
> > let in
> > if n mod msg = 0 then print ("iter: " ^ Int.toString n ^
> "\n") else ();
> > ignore (f ());
> > repeat_fun' f (n-1)
> > end
> > in
> > repeat_fun' f (n-1);
> > f ()
> > end
> >
> > This ignores there result of the call to 'f' in 'repeat_fun'', and only
> > returns the result of the final call to 'f'. When the 'doit' function is
> > inlined, MLton quite happily determines that all the fp arithmetic (since
> > it has no side-effects) that is inserted into the 'repeat_fun' loop can be
> > discarded. So, with the "fast" version, you are executing a nearly empty
> > loop that just occassionally prints a message. With the "slow" version,
> > you are executing the 'doit' arithmetic every iteration of the
> > 'repeat_fun'' loop, so that is the reason you see a 10X slowdown. The
> > actual assembly sequence for the arithmetic is identical (modulo the extra
> > multiplication).
> >
> > Even when the 'doit' function is not inlined, MLton could discard the call
> > to 'doit' in 'reapat_fun''. Since the result of the call is unused, we
> > can discard the call when the called function has no side-effects, only
> > returns normally (i.e., doesn't raise exceptions), and terminates. The
> > 'removeUnused' optimization pass computes a maySideEffect and mayRaise
> > predicate for each function, but does not currently compute a
> > mustTerminate predicate. Thus, the call stays, and you see the longer
> > execution time when 'doit' is not inlined.
> >
> >
> >
> >
> >
> > _______________________________________________
> > MLton-user mailing list
> > MLton-user at mlton.org
> > http://mlton.org/mailman/listinfo/mlton-user
> >
>
>
More information about the MLton-user
mailing list