[MLton] Ongoing parallel runtime work
Eric McCorkle
ericmcc at cs.brown.edu
Sat Jan 19 18:57:22 PST 2008
Hello everyone,
I've been working steadily on my efforts to implement the parallel
runtime system I described in earlier discussions. The scheduler is
at a point where I'm comfortable adding it to subversion. I'll have
to come back and do some work to implement the CML synchronization
primitives, but the scheduler itself, as far as I can tell, is solid
(I have a randomized test program I can send with it). I already
have subversion write access, but need a bit of direction on what
subdirectories, etc my code should go under.
I also have an (as yet untested, but by-the-paper) implementation of
Maged Michael's lock-free malloc. This is primarily for the
scheduler's sake, but will benefit whatever bits of C code exist as
well. It compiles, but this is C, so that means nothing.
Lastly, I'm currently working on a parallel, mostly lock-free garbage
collector (it requires only a barrier at the beginning and end of
collection right now, and I don't think this is going away). It's
unavoidable that I'll have to have a separate compilation mode to use
the parallel collector, as I've had to alter the memory
representations of objects, and different code generation will be
necessary in some cases.
In particular, I've done the following:
- The object header now occupies a cache line, primarily because the
collector has to perform a compare-and-set operation on a pointer to
the object in the destination heap to "claim" it.
- The object header has some space for other information, like an
array's length, as well as a pointer which is used to build queues.
- Arrays are (sometimes) preceded by a bitmap which is used to claim
their elements (using compare-and-sets), allowing arrays to be
scanned and copied by multiple collector threads.
- Global pointers have to be double-pointers, one of which is used by
the collector, one by the program's threads (Cheng-Blelloch does
this). The roles are switched at the end of every collection (see my
comments on generations). There is a fairly clever trick to avoid
having to check a global variable for every pointer access.
- Weak pointers got absorbed into the objects themselves. They come
after the regular pointers.
- Finalizers attach a cache line at the end of the object, and spawn
a thread. The thread wakes up only when the object is no longer
reachable.
- Stack objects are gone (I'll allocate frames directly), and threads
are dealt with by the scheduler, leaving only normal objects and
arrays (for now).
In the beginning, I was planning not to do generational collection at
this time, but I've built things to make it easy to add. I'm
reconsidering this, as supporting generational collection would make
some aspects of calculating the root set alot easier (the global
pointers and the currently-existing threads can be treated as
externally-stored objects and described using type signatures).
However, the price is that *all* pointers must be represented with
double-pointers, not just global pointers. Does anyone have
practical experience/observations which can inform this decision?
--
Eric McCorkle
Brown University
CS Graduate Student
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mlton.org/pipermail/mlton/attachments/20080119/55708665/attachment.htm
More information about the MLton
mailing list