single threading mlton
Matthew Fluet
Matthew Fluet <fluet@CS.Cornell.EDU>
Tue, 15 Jan 2002 13:47:02 -0500 (EST)
I was looking at the profiling data for the knownCase pass. I don't have
a clear answer on the slowdown, but the majority of the time seems to be
in the SSA restore pass. This seems reasonable, given that
both the main knownCase analysis/transformation and the restore pass need
to traverse (different) dominator trees.
Anyways, looking at some of the auxiliary functions, I saw that the
numPeeks and numLinks refs that are incremented on property list lookups
were being passed around in environments. I understand that they can't be
globalized due to threads, but I was curious why exactly threads were
being pulled in.
I recalled (although I can't find the email) that Steve noted that
lib/mlton/basic/engine.sml brought threads in. Turns out that with
reference to mlton, it's only used in lib/mlton/basic/net.sml which is
pulled in for fullHostname; but, the computation of fullHostname doesn't
require threads, so I copied the computation directly into
src/mlton/control.sml. (Maybe it belongs in
lib/mlton/basic/proc-env.sml?) Looking at the output of cmcat,
lib/mlton/basic/signals.sml was still being pulled in by
lib/mlton/basic/process.sml. That's used by the Process.call to make the
gcc calls. But, turns out that's a red herring. Trying in vain to
eliminate threads in a simple program that called Process.call, I
discovered that what's really pulling threads in is
lib/mlton/basic/vector.fun and the tabulator function by referencing
lib/mlton/basic/thread.sml and the generate function. More importantly,
that function actually requires thread switching, so Thread_switchTo
appears in the final program, foiling the multi analysis.
Vector.tabulator is only used by AppendList.toVector, and, while a clever
use of threads (it took me a while to figure out exactly what was going
on), it's only really used in some of the elaboration and I suspect that
no one will really notice the difference with an implementation as
val toVector = Vector.fromList o toList.
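
To make the thread-free alternative concrete, here is a minimal sketch.
The AppendListSketch structure and its datatype are hypothetical
stand-ins, not the actual definition in lib/mlton/basic; only the shape
of toVector follows the line above. It is written with fun rather than
as a val built with o, because the value restriction would otherwise
keep the composed function from being polymorphic.

```sml
(* Hypothetical stand-in for AppendList: a tree of appends.
 * This is an assumption for illustration, not MLton's definition. *)
structure AppendListSketch =
struct
  datatype 'a t =
      Empty
    | Single of 'a
    | Append of 'a t * 'a t

  val empty = Empty
  fun single x = Single x
  fun append (l, r) = Append (l, r)

  (* Flatten right-to-left into an accumulator, so each element is
   * consed exactly once and no intermediate lists are appended. *)
  fun toList t =
    let
      fun loop (Empty, acc) = acc
        | loop (Single x, acc) = x :: acc
        | loop (Append (l, r), acc) = loop (l, loop (r, acc))
    in
      loop (t, [])
    end

  (* Thread-free conversion: build the list, then copy it into a
   * vector with the Basis Library's Vector.fromList. *)
  fun toVector t = Vector.fromList (toList t)
end

(* Usage: a vector holding 1, 2, 3 in that order. *)
val v =
  let
    open AppendListSketch
  in
    toVector (append (append (single 1, single 2), single 3))
  end
```

The cost is one intermediate list per conversion, which is the
trade-off suggested above in exchange for dropping the dependency on
threads.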
Anyways, making those changes eliminates all references to
Thread_switchTo, so the multi analysis should be significantly more
accurate for constantPropagation and localRef. There is still one instance
of Thread_copyCurrent which initializes the base in
basis-library/mlton/thread.sml. It's pulled in through MLton.World.save.
Steve rewrote it so that it wouldn't need signal handlers, and supposedly
threads. While it's true that a program without signals won't be able to
set the thread state to InHandler, at the time of the dead pass, nothing
computes that information, so MLton.Thread.switch is pulled in and we're
stuck with the Thread_copyCurrent. But, again, no problem for the multi
analysis, because the first removeUnused pass eliminates the InHandler
variant, which makes all the thread switching code dead, including the
reference to Thread_switchTo. The multi analysis only marks blocks
reachable from a Thread_copyCurrent as multiThreaded if the program has an
instance of Thread_switchTo. Since it doesn't, we're golden.
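
The rule in the previous paragraph can be modeled in a few lines. This
is a toy sketch only: the prim and block types and the function names
are hypothetical, and the real multi analysis works over the SSA
program rather than a flat list of blocks.

```sml
(* Hypothetical model of the rule; not MLton's actual SSA types. *)
datatype prim = ThreadCopyCurrent | ThreadSwitchTo | Other

(* A block is reduced to the primitives it uses, plus a flag saying
 * whether it is reachable from a Thread_copyCurrent. *)
type block = {prims: prim list, afterCopyCurrent: bool}

(* Does any block in the program use Thread_switchTo? *)
fun usesSwitchTo (blocks: block list) =
  List.exists
    (fn ({prims, ...}: block) =>
       List.exists (fn p => p = ThreadSwitchTo) prims)
    blocks

(* A block counts as multiThreaded only if it is reachable from a
 * Thread_copyCurrent AND the program contains a Thread_switchTo. *)
fun multiThreaded (blocks: block list) =
  let
    val switching = usesSwitchTo blocks
  in
    List.map
      (fn ({afterCopyCurrent, ...}: block) =>
         switching andalso afterCopyCurrent)
      blocks
  end

(* A program with a Thread_copyCurrent but no Thread_switchTo:
 * no block ends up marked. *)
val noSwitch =
  multiThreaded
    [{prims = [ThreadCopyCurrent], afterCopyCurrent = false},
     {prims = [Other], afterCopyCurrent = true}]
```

With no Thread_switchTo anywhere, every entry of noSwitch is false,
which is the situation after the first removeUnused pass.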
It seems to have shaved about a minute off of my self compiles.