non-tail returns
Matthew Fluet
fluet@CS.Cornell.EDU
Sun, 28 Oct 2001 14:51:47 -0500 (EST)
> > I agree with your conclusions. I also observe from those numbers that
> > I really want -native-live-stack true. Those are some nice speedups.
> > Maybe we could put some hack in that uses -native-live-stack true for
> > programs with fewer than N statements (where N =~ 10000)?
>
> We could try it. I haven't tried a self compile with -native-live-stack
> true in a while, so I'll see how slow it is.
I make a profiling G1 and tried a self compile with -native-live-stack
false and -native-live-stack true. The false case has:
sh-2.04$ mlprof -t 1 -d 1 ../../lib/mlton-compile mlmon.out
642.24 seconds of CPU time
and the true case has:
sh-2.04$ mlprof -t 1 -d 1 ../../lib/mlton-compile mlmon.out
946.89 seconds of CPU time
Most of the expensive functions are in the x86-codegen. And, the hot
spots mostly seem to be List.peek/exists/contains type loops. These
should probably be turned into some sort of property list; I'm thinking of
implementing the Label.t * MemLoc.t hash table of property lists. But,
one thing to note is that some of these loops contain loop-invariant
selects that haven'e been hoisted:
loop_2763 (x_66289)
x_66299 = MLton_eq (x_66289, x_66300)
case x_66299 of
false => L_29447 | true => L_29448
L_29447 ()
x_66293 = Vector_sub (x_66298, x_66289)
x_66297 = #5 x_66293
x_66295 = #4 x_66287
x_66296 = #4 x_66297
x_66294 = MLton_eq (x_66296, x_66295)
case x_66294 of
false => L_29443 | true => L_29445
L_29443 ()
L_108466(global_1 + x_66289) Overflow => L_29444()
L_108466 (x_66288)
loop_2763 (x_66288)
The #4 x_66297 looks like it could be hoisted out of this loop.
Another suprisingly hot function is nest, in loopForest in
directed-graph.sml. I suspect that what is most expensive is the inlined
call to subGraph. The implementation of loopForest is pretty naive, taken
directly from the paper. Something that might be missing from the whole
directed graph library is a notion of active/inactive nodes and edges,
which would let us operate on sub-graphs without explicitly building a new
graph.