single threading mlton
Stephen Weeks
MLton@sourcelight.com
Tue, 15 Jan 2002 12:19:37 -0800
> I recalled (although I can't find the email) that Steve noted that
> lib/mlton/basic/engine.sml brought threads in. Turns out that with
> reference to mlton, it's only used in lib/mlton/basic/net.sml which is
> pulled in for fullHostname; but, the computation of fullHostname doesn't
> require threads, so I copied the computation directly into
> src/mlton/control.sml. (Maybe it belongs in
> lib/mlton/basic/proc-env.sml?)
Fine with me. But if I understand your later analysis, the presence
of engine.sml wasn't actually hurting anything, since Engine.run (and
hence Signal.handleWith') was never called.
> I discovered that what's really pulling threads in is
> lib/mlton/basic/vector.fun and the tabulator function by referencing
> lib/mlton/basic/thread.sml and the generate function.
Yuck. I thought I'd gotten rid of that.
> More importantly, that function actually requires thread switching,
> so Thread_switchTo appears in the final program, foiling the multi
> analysis. Vector.tabulator is only used by AppendList.toVector,
> and, while a clever use of threads (it took me a while to figure out
> exactly what was going on),
It may be clever, but it is *slow*.
> it's only really used in some of the elaboration and I suspect that
> no one will really notice the difference with an implementation as
> val toVector = Vector.fromList o toList.
Since this is a conversion from a tree to a vector, it is easier for
the tree walk to own the stack than for the vector creator. That's
why I used a tabulator (which doesn't own the stack) instead of an
unfold (which does). The real problem is the implementation of
Vector.tabulator. One possibility would be to change it to
----------------------------------------------------------------------
fun tabulator (n: int, f: ('a -> unit) -> unit) =
let
val a = Pervasive.Array.array (n, NONE)
val r = ref 0
val _ =
f (fn x =>
let
val i = !r
in
if i >= n
then Error.bug "Vector.tabulator: too many elements"
else (Pervasive.Array.update (a, i, SOME x)
; r := i + 1)
end)
in
if !r < n
then Error.bug "Vector.tabulator: not enough elements"
else tabulate (n, fn i => valOf (Pervasive.Array.sub (a, i)))
end
----------------------------------------------------------------------
I would like to find a way to avoid the array->vector copy, but don't
see it.
Another possibility, which I have heard Henry espouse, is to expose
the tabulator more like a folder, with the following signature
type state
val tabulator: int * (state * ('a * state -> state) -> state) -> 'a t
Then, the implementation is
----------------------------------------------------------------------
local
datatype state = S of int
in
type state = state
fun tabulator (n: int, f: (state * ('a * state -> state) -> state)) =
let
val a = Pervasive.Array.array (n, NONE)
val S n' =
f (S 0,
fn (x, S i) =>
if i >= n
then Error.bug "Vector.tabulator: too many elements"
else (Pervasive.Array.update (a, i, SOME x)
; S (i + 1)))
in
if n' < n
then Error.bug "Vector.tabulator: not enough elements"
else tabulate (n, fn i => valOf (Pervasive.Array.sub (a, i)))
end
end
----------------------------------------------------------------------
And the implementation of AppendList.toVector is
fun toVector ds = Vector.tabulator (length ds, fn (s, f) => fold (ds, s, f))
Of course, we would have liked to have written the type of tabulator
in ML as
val tabulator:
Forall 'a. int * (Forall 'b. 'b * ('a * 'b -> 'b) -> 'b) -> 'a t
but we can't because of prenexity. If you change the type to
val tabulator:
Forall 'a. Forall 'b. int * ('b * ('a * 'b -> 'b) -> 'b) -> 'a t
you get a completely different beast.
Anyways, here's my opinion on the right way to go. I prefer hiding the
state, because it is abstract and is a hack on the true type. Also,
we don't have a linear type system to enforce that it is used
correctly. So, I propose to leave the implementation of
AppendList.toVector as is, and to replace the implementation of
Vector.tabulator with the first one I gave above, unless someone can
think of something even better.
> Anyways, making those changes eliminates all references to
> Thread_switchTo, so the multi analysis should be significantly more
> accurate for constantPropagation and localRef.
...
> The multi analysis only marks block reachable from a
> Thread_copyCurrent as multiThreaded if the program has an instance
> of Thread_switchTo. Since not, we're golden.
Ahh the joys/perils of whole-program optimization. We're clearly
going to need some better thread implementation and analysis if we're
going to ever use threads seriously.
> It seems to have shaved about a minute off of my self compiles.
Quite nice. I can't wait to get the new limit check stuff in to see
if we can break 400s.