[MLton] profiling and tail calls

Stephen Weeks MLton@mlton.org
Fri, 19 Nov 2004 09:57:56 -0800


> I note that lib/cml/tests/ping-pong.mlb runs in constant stack space
> without profiling, but runs with >= O(n) stack space with profiling
> (of any sort).

The problem is that MLton proves that the pong loop never terminates.
This causes the normal return case of the call to be a Bug transfer
instead of Leave followed by a Return transfer.  Hence, the recursive
should-be-tail call to loop doesn't match the pattern of the
optimization that I showed earlier, where the SSA shrinker looks for a
nontail call where both the normal and exceptional continuation are
only profiling statements followed by a Return or Raise respectively.

Here's a simple program that demonstrates the problem.

----------------------------------------------------------------------
val n = valOf (Int.fromString (hd (CommandLine.arguments ())))
exception Done
fun loop n =
   if n = 0
      then raise Done
   else loop (n - 1)
val () = loop n
----------------------------------------------------------------------

Compiling this with

	mlton -profile time -keep ssa2 -keep dot z.sml

yields a program that takes O(n) space.  Here's the offending Ssa2 and
corresponding dot file.  The Bug transfer in L_370 causes the problem.

----------------------------------------------------------------------
fun loop_6 (x_312) = L_238 ()
  L_238 ()
    Enter loop z.sml: 5
    Enter = <basis>/misc/primitive.sml: 13
    x_313 = Word32_equal (x_312, global_2)
    Leave = <basis>/misc/primitive.sml: 13
    case x_313 of
      false => L_240 | true => L_239
  L_239 ()
    Leave loop z.sml: 5
    raise (global_28)
  L_240 ()
    L_241 (x_312 - global_16) Overflow => L_242 ()
  L_241 (x_314)
    loop_6 (x_314) NonTail {cont = L_370, handler = Handle L_244}
  L_244 (x_315)
    Leave loop z.sml: 5
    raise (x_315)
  L_370 ()
    Bug
  L_242 ()
    Leave loop z.sml: 5
    raise (global_29)
----------------------------------------------------------------------
digraph "loop_6 control-flow graph" {
label = "loop_6 control-flow graph"; { rank  = "min"; n0 }
n1 [fontcolor = "Black", shape = "box", label = "L_244 (x_315)\lLeave loop z.sml: 5\lraise (x_315)\l"]
n2 [fontcolor = "Black", shape = "box", label = "L_370 ()\lbug\l"]
n3 [fontcolor = "Black", shape = "box", label = "L_242 ()\lLeave loop z.sml: 5\lraise (global_29)\l"]
n4 [fontcolor = "Black", shape = "box", label = "L_241 (x_314)\lloop_6 (x_314)\l"]
n4 -> n1 [label = "\n", style = "dashed"]
n4 -> n2 [label = "\n", style = "dotted"]
n5 [fontcolor = "Black", shape = "box", label = "L_239 ()\lLeave loop z.sml: 5\lraise (global_28)\l"]
n6 [fontcolor = "Black", shape = "box", label = "L_240 ()\lx_312 - 0x1\l"]
n6 -> n3 [label = "Overflow\n", style = "dashed"]
n6 -> n4 [label = "\n", style = "solid"]
n0 [fontcolor = "Black", shape = "box", label = "L_238 ()\lEnter loop z.sml: 5\lEnter = <basis>/misc/primitive.sml: 13\lx_313 = Word32_equal (x_312, global_2)\lLeave = <basis>/misc/primitive.sml: 13\lcase x_313\l"]
n0 -> n5 [label = "true\n", style = "solid"]
n0 -> n6 [label = "false\n", style = "solid"]
}
------------------------------------------------------------

There's a relevant note in the CVS log for revision 1.35 of
ssa/shrink.fun that shows that the optimization used to be done to
turn such a nontail call into a tail call, but that that caused
another problem.  Here's the log entry.

----------------------------------------------------------------------
Fixed a bug in the shrinker that only showed up when profiling.
Thanks to Joe Hurd for the bug report.  The problem is that any
profiling stack is acceptable at a Bug transfer, but the stack must be
empty at a Tail call.  However, the shrinker changed a nontail call
with a Bug continuation into a tail call, which caused checkProf to
flag a bug.  Here was the offending SSA.

  L_245 ()
    x_555 () NonTail {cont = L_6312, handler = Handle L_242}
  L_6312 ()
    Bug
  L_242 (x_548)
    Leave Vector.V.tabulate <basis>/arrays-and-vectors/sequence.fun: 81
    Leave Vector.V.unfoldi.loop <basis>/arrays-and-vectors/sequence.fun: 36
    raise (x_548)

was transformed to

  L_245 ()
    x_555 () Tail

Rather than condition the transformation on whether profiling is
enabled, which would create more differences between profiled and
non-profiled code, I simply eliminated the transformation.  Bug
continuations are very rare and I suspect there will be no observable
performance impact.
----------------------------------------------------------------------

Obviously, I wasn't quite right with that last bit.  Fortunately,
there's a simple fix.  In the example from the CVS log, we grab the
Leave statements from the handler to change L_245 into

    L_245 ()
      Leave Vector.V.tabulate <basis>/arrays-and-vectors/sequence.fun: 81
      Leave Vector.V.unfoldi.loop <basis>/arrays-and-vectors/sequence.fun: 36
      x_555 () Tail

I checked in a change to ssa/shrink{,2}.fun to do this.  Now the Ssa2
and dot file for the simple example look fine, and ping-pong works
too.

----------------------------------------------------------------------
fun loop_6 (x_312) = loopS_0 ()
  loopS_0 ()
    loop_17 (x_312)
  loop_17 (x_313)
    Enter loop z.sml: 5
    Enter = <basis>/misc/primitive.sml: 13
    x_314 = Word32_equal (x_313, global_2)
    Leave = <basis>/misc/primitive.sml: 13
    case x_314 of
      false => L_240 | true => L_239
  L_239 ()
    Leave loop z.sml: 5
    raise (global_28)
  L_240 ()
    L_241 (x_313 - global_16) Overflow => L_242 ()
  L_241 (x_315)
    Leave loop z.sml: 5
    loop_17 (x_315)
  L_242 ()
    Leave loop z.sml: 5
    raise (global_29)
----------------------------------------------------------------------
digraph "loop_6 control-flow graph" {
label = "loop_6 control-flow graph"; { rank  = "min"; n0 }
n1 [fontcolor = "Black", shape = "box", label = "L_242 ()\lLeave loop z.sml: 5\lraise (global_29)\l"]
n2 [fontcolor = "Black", shape = "box", label = "L_241 (x_315)\lLeave loop z.sml: 5\lloop_17 (x_315)\l"]
n2 -> n3 [label = "\n", style = "solid"]
n4 [fontcolor = "Black", shape = "box", label = "L_239 ()\lLeave loop z.sml: 5\lraise (global_28)\l"]
n5 [fontcolor = "Black", shape = "box", label = "L_240 ()\lx_313 - 0x1\l"]
n5 -> n1 [label = "Overflow\n", style = "dashed"]
n5 -> n2 [label = "\n", style = "solid"]
n3 [fontcolor = "Black", shape = "box", label = "loop_17 (x_313)\lEnter loop z.sml: 5\lEnter = <basis>/misc/primitive.sml: 13\lx_314 = Word32_equal (x_313, global_2)\lLeave = <basis>/misc/primitive.sml: 13\lcase x_314\l"]
n3 -> n4 [label = "true\n", style = "solid"]
n3 -> n5 [label = "false\n", style = "solid"]
n0 [fontcolor = "Black", shape = "box", label = "loopS_0 ()\lloop_17 (x_312)\l"]
n0 -> n3 [label = "\n", style = "solid"]
}
----------------------------------------------------------------------