[MLton-commit] r4026

Mon, 22 Aug 2005 16:05:08 -0700

Created runtime/gc directory to break up monolithic gc.{h,c}
implementation.  Even if for performance reasons we need to cat
smaller implementation files into one monolithic file, separate files
still make sense for tracking revisions and for organization.

Added runtime/gc/mltongc.txt as an overview of the MLton garbage
collector.

----------------------------------------------------------------------

A   mlton/branches/on-20050822-x86_64-branch/runtime/gc/
A   mlton/branches/on-20050822-x86_64-branch/runtime/gc/mltongc.txt

----------------------------------------------------------------------

Added: mlton/branches/on-20050822-x86_64-branch/runtime/gc/mltongc.txt
===================================================================

--- mlton/branches/on-20050822-x86_64-branch/runtime/gc/mltongc.txt	2005-08-22 22:48:34 UTC (rev 4025)
+++ mlton/branches/on-20050822-x86_64-branch/runtime/gc/mltongc.txt	2005-08-22 23:05:06 UTC (rev 4026)
@@ -0,0 +1,319 @@
+
+Notes on the MLton garbage collection system.  Until the "Thoughts on
+64-bits" section, a word is considered to be 32 bits.
+
+Garbage Collector
+=================
+
+MLton implements a relatively simple garbage collection strategy that
+nonetheless adapts itself readily to different scenarios of memory usage.
+
+All ML objects (including ML execution stacks) are allocated in a
+contiguous heap.  The heap has the following general layout:
+
+  ---------------------------------------------------
+ |    old generation    |   to space   |   nursery   |
+  ---------------------------------------------------
+ ^                       ^                ^          ^
+ start                   back             frontier   limit
+
+New ML objects are allocated in the nursery at the frontier.  Upon
+exhausting the nursery (i.e., when limit - frontier is insufficient
+for the next object allocation), a garbage collection is initiated.  
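+
+The frontier/limit test above amounts to bump allocation.  The
+following is only a minimal sketch of that check; the Nursery struct
+and nursery_alloc are hypothetical names for illustration and are not
+taken from gc.{h,c}.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical nursery state; field names follow the diagram above. */
typedef struct {
    uint8_t *frontier; /* next free byte in the nursery */
    uint8_t *limit;    /* one past the last usable byte */
} Nursery;

/* Returns the start of a fresh block of `bytes` bytes, or NULL when
 * limit - frontier is insufficient and a collection must run first. */
static void *nursery_alloc(Nursery *n, size_t bytes) {
    assert(n->frontier <= n->limit);
    if ((size_t)(n->limit - n->frontier) < bytes)
        return NULL;
    void *p = n->frontier;
    n->frontier += bytes; /* bump the frontier past the new object */
    return p;
}
```

+
+A NULL result here stands in for "initiate a garbage collection"; a
+real allocator would collect and retry rather than return.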
+
+It should be noted that in the absence of memory pressure, the
+to-space is of zero size and the old-generation is simply the live
+data from the last garbage collection.  Hence, generational garbage
+collection is only enabled when the program displays sufficiently high
+memory usage.
+
+In the common, non-generational scenario, a garbage collection
+involves one of two major garbage collection strategies.  If there is
+sufficient memory to allocate a second heap of approximately the same
+size as the current heap, then a Cheney Copy garbage collection is
+performed.  (In practice, the second heap is already allocated and the
+two semi-spaces are swapped at each Cheney Copy.)  If there is
+insufficient memory for a second semi-space, then a Mark Compact
+garbage collection is performed.
+
+After a Mark Compact garbage collection, or if the live ratio is low
+enough, the runtime switches to a generational collection.  In this
+scenario, the current live data becomes the old-generation, while the
+remaining space is split into the to-space and the nursery.  A minor
+garbage collection copies live objects from the nursery to the
+beginning of to-space, thereby extending the old-generation and
+shrinking the space available for the to-space and the nursery.
+Eventually, the nursery becomes too small to accommodate new object
+allocations, and a major garbage collection is initiated.
+
+The MLton garbage collector additionally supports weak pointers and
+object finalizers, hash-consing (sharing) of both the entire heap and
+the heap reachable from individual objects, computing the dynamic size
+of objects, and provides some runtime support for profiling.
+
+In the sequel we will refer to pointers to objects in the ML heap as
+"heap pointers".  Note that a valid heap pointer is always bounded by
+the start pointer and the limit pointer of the current heap.  Hence,
+heap pointers admit representations other than the native pointer
+representation.  Furthermore, precise garbage collection requires
+identifying all heap pointers in ML objects.
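+
+Because a heap pointer always lies between start and limit, one
+possible non-native representation is a byte offset from the start of
+the heap.  The sketch below assumes a heap smaller than 4GB and a
+32-bit offset; the HeapPointer type and both function names are
+illustrative assumptions, not an existing runtime interface.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t HeapPointer; /* hypothetical compact representation */

/* Converts a compact heap pointer back to a native pointer. */
static inline void *externHeapPointer(uint8_t *heap_start, HeapPointer hp) {
    return heap_start + hp;
}

/* Converts a native pointer into the heap to its compact form. */
static inline HeapPointer internHeapPointer(uint8_t *heap_start, const void *p) {
    ptrdiff_t off = (const uint8_t *)p - heap_start;
    assert(off >= 0 && (uint64_t)off <= UINT32_MAX);
    return (HeapPointer)off;
}
```

+
+Other encodings (for example, scaling the offset by the object
+alignment) trade decoding cost against addressable heap size.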
+
+There are four kinds of ML objects: array, normal (fixed size), stack,
+and weak.  Each object has a header (currently, a 32-bit word), which
+immediately precedes the object data.  A heap pointer always denotes
+the address following the header (i.e., the first data word); there
+are no heap pointers to object interiors.
+
+
+A header word has the following bit layout:
+
+  00        : 1
+  01 - 19   : type index bits
+  20 - 30   : counter bits, used by mark compact GC
+       31   : mark bit, used by mark compact GC
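+
+As an illustration, the fields above can be extracted with shifts and
+masks.  This sketch assumes that bit 00 is the least significant bit
+of the header word; the macro and function names are made up for this
+example.
+

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions follow the layout above; the names are illustrative. */
#define TYPE_INDEX_SHIFT 1
#define TYPE_INDEX_BITS  19
#define COUNTER_SHIFT    20
#define COUNTER_BITS     11
#define MARK_SHIFT       31

/* Type index stored in bits 01 - 19. */
static uint32_t header_type_index(uint32_t h) {
    assert(h & 1u); /* bit 00 of a header is always 1 */
    return (h >> TYPE_INDEX_SHIFT) & ((1u << TYPE_INDEX_BITS) - 1u);
}

/* Counter stored in bits 20 - 30. */
static uint32_t header_counter(uint32_t h) {
    return (h >> COUNTER_SHIFT) & ((1u << COUNTER_BITS) - 1u);
}

/* Mark bit stored in bit 31. */
static int header_is_marked(uint32_t h) {
    return (int)((h >> MARK_SHIFT) & 1u);
}

/* Builds an unmarked header with a zero counter. */
static uint32_t make_header(uint32_t type_index) {
    assert(type_index < (1u << TYPE_INDEX_BITS));
    return (type_index << TYPE_INDEX_SHIFT) | 1u;
}
```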
+
+Normal objects have the following layout:
+
+  header word :: 
+  (non heap-pointers)* :: 
+  (heap pointers)*
+
+Note that the non heap-pointers denote a sequence of primitive data
+values.  These data values need not map directly to values of the
+native word size.  MLton's aggressive representation strategies may
+pack multiple primitive values into the same native word.  Likewise, a
+primitive value may span multiple native words (e.g., Word64.word).
+
+Array objects have the following layout:
+
+  counter word :: 
+  length word :: 
+  header word :: 
+  ( (non heap-pointers)* :: (heap pointers)* )*
+
+The counter word is used by mark compact GC.  The length word is the
+number of elements in the array.  Array elements have the same
+individual layout as normal objects, omitting the header word.
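+
+Since a heap pointer denotes the first data word, the header, length,
+and counter words of an array sit at fixed negative offsets from it.
+The following sketch assumes 32-bit words and uses hypothetical
+accessor names.
+

```c
#include <stdint.h>
#include <string.h>

/* Reads the 32-bit word located `words_back` words before heap pointer p. */
static uint32_t word_before(const uint8_t *p, unsigned words_back) {
    uint32_t w;
    memcpy(&w, p - 4u * words_back, sizeof w);
    return w;
}

/* The header immediately precedes the data of every object. */
static uint32_t object_header(const uint8_t *p) { return word_before(p, 1); }

/* Arrays only: the length word precedes the header word. */
static uint32_t array_length(const uint8_t *p)  { return word_before(p, 2); }

/* Arrays only: the counter word precedes the length word. */
static uint32_t array_counter(const uint8_t *p) { return word_before(p, 3); }
```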
+
+Stack objects have the following layout:
+
+  header word ::
+  markTop pointer ::
+  markIndex word ::
+  reserved word ::
+  used word ::
+  ... reserved bytes ...
+
+The markTop pointer and markIndex word are used by mark compact GC.
+The reserved word gives the number of bytes for the stack (before the
+next ML object).  The used word gives the number of bytes currently
+used by the stack.  The sequence of reserved bytes corresponds to ML
+stack frames, which will be discussed in more detail below.
+
+Weak objects have the following layout:
+
+  header word ::
+  unused word ::
+  link word ::
+  heap-pointer
+  
+
+The type index of a header word is an index into an array, where each
+element describes the layout of an object.  The 19 bits available for
+the type index means that there are only 2^19 different object layouts
+per program.  The "hello-world" program yields 37 object types in the
+array, though there are only 19 distinct object types.
+
+The type index array is declared as follows:
+
+        typedef enum { 
+                ARRAY_TAG,
+                NORMAL_TAG,
+                STACK_TAG,
+                WEAK_TAG,
+        } GC_ObjectTypeTag;
+
+        typedef struct {
+                GC_ObjectTypeTag tag;
+                Bool hasIdentity;
+                ushort numNonPointers;
+                ushort numPointers;
+        } GC_ObjectType;
+
+        GC_ObjectType *objectTypes; /* Array of object types. */
+
+The objectTypes pointer is initialized to point to a static array of
+object types that is emitted for each compiled program.  The
+hasIdentity field indicates whether or not the object has mutable
+fields, in which case it may not be hash-cons-ed.  In a normal object,
+the numNonPointers field indicates the number of 32-bit words of non
+heap-pointer data, while the numPointers field indicates the number of
+heap pointers.  In an array object, the numNonPointers field indicates
+the number of bytes of non heap-pointer data, while the numPointers
+field indicates the number of heap pointers.  In a stack object, the
+numNonPointers and numPointers fields are irrelevant.  In a weak
+object, the numNonPointers and numPointers fields are interpreted as
+in a normal object.
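+
+To make the difference in units concrete, here is a sketch of how
+object sizes could be computed from a GC_ObjectType.  It assumes
+32-bit words and heap pointers and ignores any alignment padding; the
+two size functions are hypothetical, and the types are restated with
+C99 integer types only so that the fragment stands alone.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ARRAY_TAG, NORMAL_TAG, STACK_TAG, WEAK_TAG } GC_ObjectTypeTag;

typedef struct {
    GC_ObjectTypeTag tag;
    int hasIdentity;
    uint16_t numNonPointers;
    uint16_t numPointers;
} GC_ObjectType;

enum { WORD_BYTES = 4 }; /* assumed: 32-bit words and heap pointers */

/* Bytes occupied by a normal object, header word included.
 * numNonPointers counts 32-bit words here. */
static size_t normal_object_bytes(const GC_ObjectType *t) {
    assert(t->tag == NORMAL_TAG);
    return WORD_BYTES * (1u + (size_t)t->numNonPointers + t->numPointers);
}

/* Bytes occupied by an array of `length` elements: counter, length,
 * and header words, then the elements.  numNonPointers counts bytes here. */
static size_t array_object_bytes(const GC_ObjectType *t, size_t length) {
    assert(t->tag == ARRAY_TAG);
    size_t elem = (size_t)t->numNonPointers
                + (size_t)WORD_BYTES * t->numPointers;
    return 3u * WORD_BYTES + length * elem;
}
```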
+
+As an example, here is a portion of the static data emitted for the
+"hello-world" program:
+
+static GC_ObjectType objectTypes[] = {
+        { 2, FALSE, 0, 0 },
+        { 0, FALSE, 1, 0 },
+        { 1, TRUE, 2, 1 },
+        { 3, FALSE, 3, 0 },
+        { 0, FALSE, 4, 0 },
+        ...
+};
+
+
+The "... reserved bytes ..." of a stack object constitute a linear
+sequence of frames.  For the purposes of garbage collection, we must
+be able to recover the size and offsets of live heap-pointers for each
+frame.  This data is declared as follows:
+
+        typedef ushort *GC_offsets;
+
+        typedef struct GC_frameLayout {
+                char isC;
+                ushort numBytes;
+                GC_offsets offsets;
+        } GC_frameLayout;
+
+        GC_frameLayout *frameLayouts;
+
+The frameLayouts pointer is initialized to point to a static array of
+frame layouts that is emitted for each compiled program.  The isC
+field identifies whether or not the frame is for a C call. (Note: The
+ML stack is distinct from the system stack.  A C call executes on the
+system stack.  The frame left on the ML stack is just a marker.)  The
+numBytes field indicates the size of the frame, including space for
+the return address.  The offsets field points to an array (the zeroth
+element recording the size of the array) whose elements record byte
+offsets from the bottom of the frame at which live heap pointers are
+located.
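+
+A collector can use the offsets array to visit each live heap pointer
+in a frame.  The sketch below shows one way to walk it;
+foreach_frame_pointer, SlotVisitor, and count_slot are hypothetical
+names, and the slot is passed as a byte address so that no pointer
+width is assumed.
+

```c
#include <stddef.h>
#include <stdint.h>

/* Callback invoked with the address of each live heap-pointer slot. */
typedef void (*SlotVisitor)(uint8_t *slot, void *env);

/* offsets[0] is the number of entries; offsets[1..] are byte offsets
 * from frame_bottom at which live heap pointers are stored. */
static void foreach_frame_pointer(uint8_t *frame_bottom,
                                  const uint16_t *offsets,
                                  SlotVisitor visit, void *env) {
    for (size_t i = 1; i <= offsets[0]; i++)
        visit(frame_bottom + offsets[i], env);
}

/* Example visitor: counts the slots it is shown. */
static void count_slot(uint8_t *slot, void *env) {
    (void)slot;
    ++*(size_t *)env;
}
```

+
+With an offsets array of {2,0,4}, the visitor is called for the slots
+at byte offsets 0 and 4 of the frame.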
+
+As an example, here is a portion of the static data emitted for the
+"hello-world" program:
+
+static ushort frameOffsets0[] = {0};
+static ushort frameOffsets1[] = {2,0,4};
+static ushort frameOffsets2[] = {1,0};
+static ushort frameOffsets3[] = {2,4,16};
+static ushort frameOffsets4[] = {1,4};
+...
+static GC_frameLayout frameLayouts[] = {
+        {TRUE, 4, frameOffsets0},
+        {FALSE, 4, frameOffsets0},
+        {TRUE, 20, frameOffsets1},
+        {TRUE, 20, frameOffsets2},
+        {FALSE, 12, frameOffsets0},
+        ...
+};
+
+
+
+Thoughts on 64-bits:
+
+ * At this high level, I don't see obvious difficulties with adapting
+   the garbage collector to a 64-bit platform.  However, there are
+   certainly a number of design decisions.
+
+ * What representation for heap pointers?
+   
+   There is a preliminary proposal from Stephen:
+     http://mlton.org/pipermail/mlton/2004-October/026162.html
+
+   Certainly, it would appear to be easiest to begin with a scenario
+   where heap pointers share the same representation as native
+   pointers (i.e., 64-bits).  However this means that ML objects will
+   be quite a bit bigger in the 64-bit world.  Ultimately, it would be
+   appropriate to have multiple strategies at hand.
+
+   Assuming that per-compile representation strategies are available,
+   the question arises as to how to best integrate with the runtime
+   system.  The compiler proper can handle internalizing/externalizing
+   heap pointers in the code it emits.  However, it seems likely that
+   we would want multiple libmlton.a libraries available,
+   corresponding to the different strategies.  The overhead of
+   consulting a flag in the runtime state to determine the
+   representation of heap pointers at every heap pointer dereference
+   would appear to be much, much too high.  The implementation may
+   certainly make use of inline functions or macros to unify the
+   different strategies, but it seems as though we will want to
+   compile different specializations of the runtime system.
+
+   Also, I think it makes sense to ensure that heap pointers passed
+   through the FFI are externalized -- that is, C code will only ever
+   see 64-bit pointers, regardless of the representation strategy.
+
+   However, there is an argument against this.  Currently, int ref ref
+   is a valid FFI type, and we currently claim that it has the
+   "natural C representation."  This claim would be broken if the
+   inner ref had a different heap pointer representation.
+
+   We could provide {extern,intern}HeapPointer functions for C, but
+   then it is not clear how to compile the C code, not knowing what
+   representation will be chosen for heap pointers.
+
+ * How big should arrays be?
+
+   We currently allow arrays of size up to Int.maxInt, where Int.int
+   is a 32-bit integer.  It is a separate issue to decide how the
+   Basis Library should change in the presence of a 64-bit port, but
+   if we were to allow arrays of size up to Int64.maxInt, then the
+   representation of array objects would need to change, as the
+   counter word and the length word would need to be larger to
+   accommodate very large arrays.
+
+ * Another big design decision concerns how best to accommodate both
+   the 32-bit garbage collector and the 64-bit garbage collector with
+   (much) the same code.  Sharing as much code as possible would be
+   desirable, as we do not wish the two systems to vary in any
+   significant way.
+
+   I think that this strongly suggests that all sizes and offsets are
+   measured in (8-bit) bytes.  I can't remember why array and normal
+   objects treat the numNonPointers field of a GC_ObjectType
+   differently.
+
+   I think that it also strongly suggests that we avoid the C types
+   int and long, and instead use more specific C99 types.
+
+   I also think that it is fairly safe to assume that the
+   programs compiled on 64-bit architectures are essentially the same
+   as those compiled on 32-bit architectures.  In particular, 2^19
+   object types should remain viable for some time to come.  Likewise,
+   the 11 counter bits in the header word (used to implement the mark
+   stack) should continue to be sufficient for the number of heap
+   pointers in a normal heap object.  Finally, 16-bits for the
+   numNonPointers and numPointers fields of a GC_ObjectType will
+   continue to suffice.  (For a truly absurd example, the currently
+   active exception handler is represented by a 32-bit offset from the
+   bottom of the stack.  If an ML execution stack were to grow to more
+   than 4GB, this representation would no longer suffice.)
+
+   On the other hand, it is not safe to assume that the parameters of
+   a 64-bit host system are essentially the same as a 32-bit host
+   system.  For example, in order to make decisions regarding garbage
+   collection strategies, the runtime must query the amount of
+   available RAM.  Likewise, garbage collection statistics, such as
+   bytesAllocated, bytesCopied, bytesLive, etc., could potentially be
+   an order of magnitude larger on 64-bit systems.  And, most
+   importantly, the actual size of the heap could be much larger on a
+   64-bit system.
+
+ * Finally, I note that gc.c weighs in at 4826 lines, which is
+   significantly larger than almost any SML file in the compiler.
+   (The exceptions are the x86 native codegen register allocator and
+   the elaborator for the core language.)  Since we'll be going over
+   the garbage collector with a fine-tooth comb anyway, it might be
+   time to start breaking it into separate implementation files.
+
+Those are some initial thoughts, and may provide a starting point for
+some discussion.
+