[MLton] More on Parallel Runtime
Philip Schatz
schatzp at purdue.edu
Sat Oct 20 12:40:43 PDT 2007
Eric McCorkle wrote:
> I mentioned a few months ago that I was working on a parallel runtime
> system which I intended for use with MLton and possibly other compilers
> for CML-like languages. Here is what I have so far:
>
> I've built an M:N threading system based entirely on lock-free data
> structures. User-level threads (termed just "threads") are mapped onto
> real threads (termed "executors") by the scheduler system, which can be
> configured to use one of two algorithms. Each executor keeps its own
> scheduler, and all schedulers use a lock-free fifo structure for
> work-sharing. I originally intended for the system to use an
> activation-style API like FreeBSD's KSE interface, but it can also use
> pthreads (in fact, I have only implemented the pthread backend at the
> moment).
>
> The runtime and the program communicate by means of a per-thread mailbox
> structure. The actual program is expected to make use of safe-points to
> periodically check its mailbox, possibly calling into the runtime. This
> has two very beneficial effects, I think. The first is that there are
> no involuntary context-switches. The second is that it is possible to
> prevent the scheduler from interrupting a given block (necessary for
> implementing some locks) simply by omitting safe-points.
>
> My plans from here are to implement the Cheng-Blelloch collector (with
> some lock-free improvements). To do this, I need to know exactly how to
> traverse the memory graph. It also may be advantageous to try
> allocating stack frames individually from the heap, as this would reduce
> the overhead of thread creation to a tiny amount, and would reduce the
> collector's pause as well. I'll also need to modify MLton to tie into
> the runtime system at some point (bits like calling into the runtime, as
> well as implementing the CML structures).
>
> I looked briefly, and it seems MLton generates type signatures to allow
> the collector to trace the memory graph. What I need to know is a) is
> this correct? b) where is the format of this specified? Also, what
> would it take to implement the write-barrier that the Cheng-Blelloch
> algorithm requires? Lastly, is there anything else I should consider?
>
Hopefully, I may have some code that might be useful.
I've also been working on multi-core support form MLton but began by
reworking the runtime to support multiple pthreads with plans to add a
M:N threading system. Specifically, I've focused on adding pthread
support to the runtime with as few changes to MLton as possible.
My current solution uses separate heaps for each pthread, the
mark-compact algorithm without card-marking, and a global lock for
collection and atomic operations.
Notes:
- The "executor" model seemed necessary because I found no way in the
FFI to spawn pthreads with different (unit -> unit) functions.
- A global lock for GC_collect and atomic operations
(Thread_atomicBegin/End). In the interest of changing as little of
MLton's runtime as possible, global locks seemed a quick-and-dirty solution.
- C-codegen: to look up per-pthread data with minimal changes to the
compiler (I didn't want to change the x86 codegen until I knew it worked).
- Per-pthread heaps: to not require locks when allocating heap space.
- Split GC_state structure into global data (args, globals, summary
info) and pthread-specific data (heap, frontier, stackTop, etc).
- Thread_atomicBegin/End are now function calls that acquire the global
lock. This was because I needed the frontier/stack information to be
when a GC_collect occurred in another pthread.
I have tested this code with a few programs that spawn a simple
"executor" on multiple pthreads. Currently I'm trying to get CML working
using this model and have run into a little snag getting signal handling
to work (CML uses preemptive threads). Well-defined safe-points would be
an elegant way to overcome this.
I'd be happy to send you what I have. A home in MLton's SVN seems like
the best place to work from.
To answer your questions:
a) Correct. The runtime uses GC_state's frameLayouts and objectTypes to
traverse the heap.
b) They are generated in the mlton/backend/backend.fun (frameLayouts and
frameOffsets) and in mlton/backend/ssa-to-rssa.fun (objectTypes)
--
Philip Schatz
CS Graduate Student
Purdue University
More information about the MLton
mailing list