[MLton] More on Parallel Runtime
Eric McCorkle
ericmcc at cs.brown.edu
Sat Oct 20 10:38:33 PDT 2007
I mentioned a few months ago that I was working on a parallel runtime
system which I intended for use with MLton and possibly other
compilers for CML-like languages. Here is what I have so far:
I've built an M:N threading system based entirely on lock-free data
structures. User-level threads (termed just "threads") are mapped
onto real threads (termed "executors") by the scheduler system, which
can be configured to use one of two algorithms. Each executor keeps
its own scheduler, and all schedulers use a lock-free fifo structure
for work-sharing. I originally intended for the system to use an
activation-style API like FreeBSD's KSE interface, but it can also
use pthreads (in fact, I have only implemented the pthread backend at
the moment).
The runtime and the program communicate by means of a per-thread
mailbox structure. The actual program is expected to make use of
safe-points to periodically check its mailbox, possibly calling into
the runtime. This has two very beneficial effects, I think. The
first is that there are no involuntary context-switches. The second
is that it is possible to prevent the scheduler from interrupting a
given block (necessary for implementing some locks) simply by
omitting safe-points.
My plans from here are to implement the Cheng-Blelloch collector
(with some lock-free improvements). To do this, I need to know
exactly how to traverse the memory graph. It also may be
advantageous to try allocating stack frames individually from the
heap, as this would reduce the overhead of thread creation to a tiny
amount, and would reduce the collector's pause as well. I'll also
need to modify MLton to tie into the runtime system at some point
(bits like calling into the runtime, as well as implementing the CML
structures).
I looked briefly, and it seems MLton generates type signatures to
allow the collector to trace the memory graph. What I need to know
is a) is this correct? b) where is the format of this specified?
Also, what would it take to implement the write-barrier that the
Cheng-Blelloch algorithm requires? Lastly, is there anything else I
should consider?
--
Eric McCorkle
Brown University
CS Graduate Student
More information about the MLton
mailing list