non-tail returns

Sun, 28 Oct 2001 14:51:47 -0500 (EST)

> > I agree with your conclusions.  I also observe from those numbers that
> > I really want -native-live-stack true.  Those are some nice speedups.
> > Maybe we could put some hack in that uses -native-live-stack true for
> > programs with fewer than N statements (where N =~ 10000)?
> 
> We could try it.  I haven't tried a self compile with -native-live-stack
> true in a while, so I'll see how slow it is.

I make a profiling G1 and tried a self compile with -native-live-stack
false and -native-live-stack true.  The false case has:

sh-2.04$ mlprof -t 1 -d 1 ../../lib/mlton-compile mlmon.out 
642.24 seconds of CPU time

and the true case has:

sh-2.04$ mlprof -t 1 -d 1 ../../lib/mlton-compile mlmon.out 
946.89 seconds of CPU time

Most of the expensive functions are in the x86-codegen.  And, the hot
spots mostly seem to be List.peek/exists/contains type loops.  These
should probably be turned into some sort of property list; I'm thinking of
implementing the Label.t * MemLoc.t hash table of property lists.  But,
one thing to note is that some of these loops contain loop-invariant
selects that haven'e been hoisted:

  loop_2763 (x_66289)
    x_66299 = MLton_eq (x_66289, x_66300)
    case x_66299 of
      false => L_29447 | true => L_29448
  L_29447 ()
    x_66293 = Vector_sub (x_66298, x_66289)
    x_66297 = #5 x_66293
    x_66295 = #4 x_66287
    x_66296 = #4 x_66297
    x_66294 = MLton_eq (x_66296, x_66295)
    case x_66294 of
      false => L_29443 | true => L_29445
  L_29443 ()
    L_108466(global_1 + x_66289) Overflow => L_29444()
  L_108466 (x_66288)
    loop_2763 (x_66288)

The #4 x_66297 looks like it could be hoisted out of this loop.

Another suprisingly hot function is nest, in loopForest in
directed-graph.sml.  I suspect that what is most expensive is the inlined
call to subGraph.  The implementation of loopForest is pretty naive, taken
directly from the paper.  Something that might be missing from the whole
directed graph library is a notion of active/inactive nodes and edges,
which would let us operate on sub-graphs without explicitly building a new
graph.