[MLton] More on Parallel Runtime

Tue Oct 23 05:59:37 PDT 2007

I've also been working on a "multi-core" MLton, though I have a somewhat
different (and maybe simpler) purpose in mind, so some of the design decisions
I've made are different.  I'm interested in a data-parallel language (as
opposed to a CML-like language), so the semantics is still sequential and
scheduling is cooperative (even at the user level).

I've also been working with pthreads and the C codegen, but I'm still using a
shared heap.  I have a big lock too, but I've been whittling away at it and
using lighter-weight synchronization in some places (e.g. allocation).  I'm
also happy to share what I have so far, and I would be interested to compare
our changes to GC_state.
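
To make the "lighter-weight synchronization for allocation" idea concrete,
here is a minimal sketch of lock-free bump allocation on a shared heap using
a C11 compare-and-swap on the frontier.  It assumes a single fixed-size heap
and is not taken from either runtime; every name in it (heap, frontier,
shared_alloc, run_demo) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared heap: none of these names come from MLton. */
#define HEAP_BYTES (1u << 20)
#define THREADS 4
#define ALLOCS_PER_THREAD 1000

static _Alignas(16) unsigned char heap[HEAP_BYTES];
static _Atomic size_t frontier = 0;   /* offset of the next free byte */

/* Bump-allocate `bytes` from the shared heap without taking a lock.
 * Returns NULL when the heap is exhausted; a real runtime would
 * trigger a collection at that point instead. */
static void *shared_alloc(size_t bytes) {
    bytes = (bytes + 7u) & ~(size_t)7u;   /* round up to 8-byte alignment */
    size_t old = atomic_load(&frontier);
    do {
        if (bytes > HEAP_BYTES - old) return NULL;
        /* On failure the CAS reloads `old`, so the bound is rechecked. */
    } while (!atomic_compare_exchange_weak(&frontier, &old, old + bytes));
    return heap + old;
}

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ALLOCS_PER_THREAD; i++) {
        void *p = shared_alloc(16);
        assert(p != NULL);
        (void)p;
    }
    return NULL;
}

/* Runs the workers concurrently and returns the final frontier offset. */
static size_t run_demo(void) {
    pthread_t t[THREADS];
    for (int i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&frontier);
}
```

Each successful compare-and-swap hands out a disjoint range, so the final
frontier equals the sum of all requested sizes no matter how the threads
interleave.  Collection would still need a heavier mechanism, since every
thread has to stop before objects move.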

--djs

Philip Schatz wrote:
> My current solution uses separate heaps for each pthread, the
> mark-compact algorithm without card-marking, and a global lock for
> collection and atomic operations.
> 
> Notes:
> - The "executor" model seemed necessary because I found no way in the
> FFI to spawn pthreads with different (unit -> unit) functions.
> 
> - A global lock for GC_collect and atomic operations
> (Thread_atomicBegin/End). In the interest of changing as little of
> MLton's runtime as possible, global locks seemed a quick-and-dirty
> solution.
> 
> - C-codegen: to look up per-pthread data with minimal changes to the
> compiler (I didn't want to change the x86 codegen until I knew it worked).
> 
> - Per-pthread heaps: to not require locks when allocating heap space.
> 
> - Split GC_state structure into global data (args, globals, summary
> info) and pthread-specific data (heap, frontier, stackTop, etc).
> 
> - Thread_atomicBegin/End are now function calls that acquire the global
> lock. This was because I needed the frontier/stack information to be
> up to date when a GC_collect occurred in another pthread.
> 
> 
> I have tested this code with a few programs that spawn a simple
> "executor" on multiple pthreads. Currently I'm trying to get CML working
> using this model and have run into a little snag getting signal handling
> to work (CML uses preemptive threads). Well-defined safe-points would be
> an elegant way to overcome this.
> 
> I'd be happy to send you what I have. A home in MLton's SVN seems like
> the best place to work from.
> 
> To answer your questions:
> a) Correct. The runtime uses GC_state's frameLayouts and objectTypes to
> traverse the heap.
> b) They are generated in mlton/backend/backend.fun (frameLayouts and
> frameOffsets) and in mlton/backend/ssa-to-rssa.fun (objectTypes).
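
The split described in the notes above (global data vs. pthread-specific
data, per-pthread heaps, and a global lock around collection) might look
roughly like the following sketch.  It assumes the per-pthread state is
looked up through a pthread key, which is only one possible mechanism; the
struct names, fields, and functions are all hypothetical and do not match the
real GC_state.  The "collection" is a placeholder that simply resets the
frontier.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* All type, field, and function names are illustrative, not MLton's. */
#define THREAD_HEAP_BYTES 1024
#define NUM_THREADS 3

struct GlobalState {              /* shared by every pthread */
    pthread_mutex_t lock;         /* global lock held during collection */
    pthread_key_t   key;          /* maps a pthread to its ThreadState */
    size_t          numCollections;
};

struct ThreadState {              /* private to one pthread */
    unsigned char *heapStart;
    unsigned char *frontier;
    unsigned char *limit;
};

static struct GlobalState gs = { .lock = PTHREAD_MUTEX_INITIALIZER };
static pthread_once_t once = PTHREAD_ONCE_INIT;

static void thread_state_free(void *p) {   /* runs at pthread exit */
    struct ThreadState *ts = p;
    free(ts->heapStart);
    free(ts);
}

static void make_key(void) {
    int rc = pthread_key_create(&gs.key, thread_state_free);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling pthread's state, creating its heap on first use. */
static struct ThreadState *thread_state(void) {
    pthread_once(&once, make_key);
    struct ThreadState *ts = pthread_getspecific(gs.key);
    if (ts == NULL) {
        ts = malloc(sizeof *ts);
        assert(ts != NULL);
        ts->heapStart = malloc(THREAD_HEAP_BYTES);
        assert(ts->heapStart != NULL);
        ts->frontier = ts->heapStart;
        ts->limit = ts->heapStart + THREAD_HEAP_BYTES;
        pthread_setspecific(gs.key, ts);
    }
    return ts;
}

/* Allocation touches only per-pthread data, so the fast path is lock-free.
 * Callers must request at most THREAD_HEAP_BYTES. */
static void *thread_alloc(size_t bytes) {
    struct ThreadState *ts = thread_state();
    if ((size_t)(ts->limit - ts->frontier) < bytes) {
        pthread_mutex_lock(&gs.lock);
        gs.numCollections++;
        ts->frontier = ts->heapStart;   /* placeholder for a real collection */
        pthread_mutex_unlock(&gs.lock);
    }
    void *p = ts->frontier;
    ts->frontier += bytes;
    return p;
}

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 40; i++)
        thread_alloc(64);               /* 2560 bytes against a 1024-byte heap */
    return NULL;
}

/* Runs the workers and returns how many placeholder collections occurred. */
static size_t run_demo(void) {
    pthread_t t[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(t[i], NULL);
    return gs.numCollections;
}
```

Only the slow path takes the global lock, which is the property the
per-pthread heaps are meant to buy.  A real collector would also need the
other pthreads' frontier and stack information to be current before it
could run.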
> 