[MLton-devel] new GC object layout

Stephen Weeks MLton@mlton.org
Mon, 6 May 2002 23:07:10 -0700


Here's my proposed object layout and header word design for the new
GC.  Comments appreciated.

--------------------------------------------------------------------------------
MLton's new GC is supposed to do mark-compact or stop-and-copy
depending on which is more appropriate.  The idea is to do
stop-and-copy if there is plenty of memory and mark-compact if there
is not.  The stop-and-copy will be the same as now - breadth-first
two-space Cheney style.  The mark-compact will use Jonker's
algorithm (for an overview, see page 109 of Jones and Lins book
"Garbage Collection" or Sansom's paper "Dual-mode garbage
collection).  The purpose of this document is to explain the design of
header words necessary to support both of these GCs.  

First, let's review the requirements. For both GCs, the runtime must
be able to 
  1. determine from the header word the object type (array,
     normal, or stack) and the locations of pointers within the object
Both GCs follow pointers to objects, and so must be able to
  2. locate the header given a pointer to the first data word in an object
Both GCs sequentially scan objects, and so must be able to
  3. Locate the header given a pointer to the beginning of the object
In addition, for the depth-first mark-phase of the mark-compact GC,
the runtime must be able to
  4. Mark an object as visited.
  5. Record which ancestor caused this object to be followed
     (i.e. the mark stack)
  6. Record which pointer child is being followed
     (for objects that have multiple children).
For the threading phase of the mark-compact GC, the runtime must build
linked lists of pointers terminated by a header word, so, it must be
able to
  7.  Distinguish pointers from header words

I think that's it.  Here is how I propose to set up header words and
to layout objects.

* Array objects have a counter word, followed by a length word,
  followed by a header word, followed by the array elements.
* Normal objects have a header word, followed by all of the nonpointer
  data, followed by all of the pointers.
* Stack objects have a header word, followed by a GC_stack object.
* A GC_stack, in addition to the words storing amount reserved and
  amount used, has an additional two words to store the current frame
  and offset being marked for (6)

A header word consists of
	a mark bit		for (4)
	11 counter bits		for (6)
	19 index bits		an index into an array for (1)
	a bottom bit of 1	for (7)

An array counter word always has a top bit of zero and is used for (6)
when marking arrays.

An array length word is a twos-complemenet 32 bit Int.int storing the
array length, and hence always has a top bit of zero.

(1) is done via a constant array with one entry for each object layout
in the program, with elements indexed by the 19 index bits in header
words.

(2) is easy, since the header word always immediately precedes the
pointer. 

(3) is a bit tricky, since during the threading phase, header words
may be replaced by pointers.  So it must be possible to distinguish
between a counter word, a header word, and a pointer.  Fortunately,
except during the mark phase, the counter words are always all zero,
which is not a valid header and not a valid pointer.  To distinguish
between a header and pointer, we look at the bottom bit, which is 1
for headers and 0 for pointers.

(4) is done with the mark bit in header words.

(5) is done via pointer reversal.

(6) is done with the 11 counter bits in the header word, plus the
    counter word for arrays.  Or, for stacks, the extra two words in
    the stack header are used.

(7) is easy, because header words have a bottom bit of 1 and pointers
have a bottom bit of 0.

This design limits us to normal objects with less than 2^11 pointer
fields, and to 2^19 different object layouts.  Obviously, those bits
can be traded for each other, but with 11 and 19 I think we've been
far from reaching either either limit with all the programs we've
seen.

_______________________________________________________________

Have big pipes? SourceForge.net is looking for download mirrors. We supply
the hardware. You get the recognition. Email Us: bandwidth@sourceforge.net
_______________________________________________
MLton-devel mailing list
MLton-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlton-devel