[MLton] profiling and tail calls
Stephen Weeks
MLton@mlton.org
Fri, 19 Nov 2004 09:57:56 -0800
> I note that lib/cml/tests/ping-pong.mlb runs in constant stack space
> without profiling, but runs with >= O(n) stack space with profiling
> (of any sort).
The problem is that MLton proves that the pong loop never terminates.
This causes the normal return case of the call to be a Bug transfer
instead of Leave followed by a Return transfer. Hence, the recursive
should-be-tail call to loop doesn't match the pattern of the
optimization that I showed earlier, where the SSA shrinker looks for a
nontail call where both the normal and exceptional continuation are
only profiling statements followed by a Return or Raise respectively.
Here's a simple program that demonstrates the problem.
----------------------------------------------------------------------
val n = valOf (Int.fromString (hd (CommandLine.arguments ())))
exception Done
fun loop n =
if n = 0
then raise Done
else loop (n - 1)
val () = loop n
----------------------------------------------------------------------
Compiling this with
mlton -profile time -keep ssa2 -keep dot z.sml
yields a program that takes O(n) space. Here's the offending Ssa2 and
corresponding dot file. The Bug transfer in L_370 causes the problem.
----------------------------------------------------------------------
fun loop_6 (x_312) = L_238 ()
L_238 ()
Enter loop z.sml: 5
Enter = <basis>/misc/primitive.sml: 13
x_313 = Word32_equal (x_312, global_2)
Leave = <basis>/misc/primitive.sml: 13
case x_313 of
false => L_240 | true => L_239
L_239 ()
Leave loop z.sml: 5
raise (global_28)
L_240 ()
L_241 (x_312 - global_16) Overflow => L_242 ()
L_241 (x_314)
loop_6 (x_314) NonTail {cont = L_370, handler = Handle L_244}
L_244 (x_315)
Leave loop z.sml: 5
raise (x_315)
L_370 ()
Bug
L_242 ()
Leave loop z.sml: 5
raise (global_29)
----------------------------------------------------------------------
digraph "loop_6 control-flow graph" {
label = "loop_6 control-flow graph"; { rank = "min"; n0 }
n1 [fontcolor = "Black", shape = "box", label = "L_244 (x_315)\lLeave loop z.sml: 5\lraise (x_315)\l"]
n2 [fontcolor = "Black", shape = "box", label = "L_370 ()\lbug\l"]
n3 [fontcolor = "Black", shape = "box", label = "L_242 ()\lLeave loop z.sml: 5\lraise (global_29)\l"]
n4 [fontcolor = "Black", shape = "box", label = "L_241 (x_314)\lloop_6 (x_314)\l"]
n4 -> n1 [label = "\n", style = "dashed"]
n4 -> n2 [label = "\n", style = "dotted"]
n5 [fontcolor = "Black", shape = "box", label = "L_239 ()\lLeave loop z.sml: 5\lraise (global_28)\l"]
n6 [fontcolor = "Black", shape = "box", label = "L_240 ()\lx_312 - 0x1\l"]
n6 -> n3 [label = "Overflow\n", style = "dashed"]
n6 -> n4 [label = "\n", style = "solid"]
n0 [fontcolor = "Black", shape = "box", label = "L_238 ()\lEnter loop z.sml: 5\lEnter = <basis>/misc/primitive.sml: 13\lx_313 = Word32_equal (x_312, global_2)\lLeave = <basis>/misc/primitive.sml: 13\lcase x_313\l"]
n0 -> n5 [label = "true\n", style = "solid"]
n0 -> n6 [label = "false\n", style = "solid"]
}
------------------------------------------------------------
There's a relevant note in the CVS log for revision 1.35 of
ssa/shrink.fun that shows that the optimization used to be done to
turn such a nontail call into a tail call, but that that caused
another problem. Here's the log entry.
----------------------------------------------------------------------
Fixed a bug in the shrinker that only showed up when profiling.
Thanks to Joe Hurd for the bug report. The problem is that any
profiling stack is acceptable at a Bug transfer, but the stack must be
empty at a Tail call. However, the shrinker changed a nontail call
with a Bug continuation into a tail call, which caused checkProf to
flag a bug. Here was the offending SSA.
L_245 ()
x_555 () NonTail {cont = L_6312, handler = Handle L_242}
L_6312 ()
Bug
L_242 (x_548)
Leave Vector.V.tabulate <basis>/arrays-and-vectors/sequence.fun: 81
Leave Vector.V.unfoldi.loop <basis>/arrays-and-vectors/sequence.fun: 36
raise (x_548)
was transformed to
L_245 ()
x_555 () Tail
Rather than condition the transformation on whether profiling is
enabled, which would create more differences between profiled and
non-profiled code, I simply eliminated the transformation. Bug
continuations are very rare and I suspect there will be no observable
performance impact.
----------------------------------------------------------------------
Obviously, I wasn't quite right with that last bit. Fortunately,
there's a simple fix. In the example from the CVS log, we grab the
Leave statements from the handler to change L_245 into
L_245 ()
Leave Vector.V.tabulate <basis>/arrays-and-vectors/sequence.fun: 81
Leave Vector.V.unfoldi.loop <basis>/arrays-and-vectors/sequence.fun: 36
x_555 () Tail
I checked in a change to ssa/shrink{,2}.fun to do this. Now the Ssa2
and dot file for the simple example look fine, and ping-pong works
too.
----------------------------------------------------------------------
fun loop_6 (x_312) = loopS_0 ()
loopS_0 ()
loop_17 (x_312)
loop_17 (x_313)
Enter loop z.sml: 5
Enter = <basis>/misc/primitive.sml: 13
x_314 = Word32_equal (x_313, global_2)
Leave = <basis>/misc/primitive.sml: 13
case x_314 of
false => L_240 | true => L_239
L_239 ()
Leave loop z.sml: 5
raise (global_28)
L_240 ()
L_241 (x_313 - global_16) Overflow => L_242 ()
L_241 (x_315)
Leave loop z.sml: 5
loop_17 (x_315)
L_242 ()
Leave loop z.sml: 5
raise (global_29)
----------------------------------------------------------------------
digraph "loop_6 control-flow graph" {
label = "loop_6 control-flow graph"; { rank = "min"; n0 }
n1 [fontcolor = "Black", shape = "box", label = "L_242 ()\lLeave loop z.sml: 5\lraise (global_29)\l"]
n2 [fontcolor = "Black", shape = "box", label = "L_241 (x_315)\lLeave loop z.sml: 5\lloop_17 (x_315)\l"]
n2 -> n3 [label = "\n", style = "solid"]
n4 [fontcolor = "Black", shape = "box", label = "L_239 ()\lLeave loop z.sml: 5\lraise (global_28)\l"]
n5 [fontcolor = "Black", shape = "box", label = "L_240 ()\lx_313 - 0x1\l"]
n5 -> n1 [label = "Overflow\n", style = "dashed"]
n5 -> n2 [label = "\n", style = "solid"]
n3 [fontcolor = "Black", shape = "box", label = "loop_17 (x_313)\lEnter loop z.sml: 5\lEnter = <basis>/misc/primitive.sml: 13\lx_314 = Word32_equal (x_313, global_2)\lLeave = <basis>/misc/primitive.sml: 13\lcase x_314\l"]
n3 -> n4 [label = "true\n", style = "solid"]
n3 -> n5 [label = "false\n", style = "solid"]
n0 [fontcolor = "Black", shape = "box", label = "loopS_0 ()\lloop_17 (x_312)\l"]
n0 -> n3 [label = "\n", style = "solid"]
}
----------------------------------------------------------------------