[MLton] More on Parallel Runtime

Sat Oct 20 10:38:33 PDT 2007

I mentioned a few months ago that I was working on a parallel runtime  
system which I intended for use with MLton and possibly other  
compilers for CML-like languages.  Here is what I have so far:

I've built an M:N threading system based entirely on lock-free data  
structures.  User-level threads (termed just "threads") are mapped  
onto real threads (termed "executors") by the scheduler system, which  
can be configured to use one of two algorithms.  Each executor keeps  
its own scheduler, and all schedulers use a lock-free fifo structure  
for work-sharing.  I originally intended for the system to use an  
activation-style API like FreeBSD's KSE interface, but it can also  
use pthreads (in fact, I have only implemented the pthread backend at  
the moment).

The runtime and the program communicate by means of a per-thread  
mailbox structure.  The actual program is expected to make use of  
safe-points to periodically check its mailbox, possibly calling into  
the runtime.  This has two very beneficial effects, I think.  The  
first is that there are no involuntary context-switches.  The second  
is that it is possible to prevent the scheduler from interrupting a  
given block (necessary for implementing some locks) simply by  
omitting safe-points.

My plans from here are to implement the Cheng-Blelloch collector  
(with some lock-free improvements).  To do this, I need to know  
exactly how to traverse the memory graph.  It also may be  
advantageous to try allocating stack frames individually from the  
heap, as this would reduce the overhead of thread creation to a tiny  
amount, and would reduce the collector's pause as well.  I'll also  
need to modify MLton to tie into the runtime system at some point  
(bits like calling into the runtime, as well as implementing the CML  
structures).

I looked briefly, and it seems MLton generates type signatures to  
allow the collector to trace the memory graph.  What I need to know  
is a) is this correct? b) where is the format of this specified?   
Also, what would it take to implement the write-barrier that the  
Cheng-Blelloch algorithm requires?  Lastly, is there anything else I  
should consider?

-- 
Eric McCorkle
Brown University
CS Graduate Student