[MLton] gc question

Thu Jan 18 23:22:11 PST 2007

> I am looking into implementing the Cheng-Blelloch garbage collector for
> MLton (as part of my grad school work). 

Great.

> I am confused about how register allocation and spilling in a codegen
> work with the stack frame layouts used by the gc, and with stack limit
> checking. The frame layouts don't include space for spilled registers,
> but this is OK since nothing in a register is live at gc calls, so the
> spilled registers don't need to be traced--is that right?

Correct.

 > How do the
> stack limit checks account for the stack space used for spilled
> registers?

There is no need to, since spilled registers don't get spilled to the 
(ML) stack.  The input to the codegens is a Machine IL program.  Such a 
program has effectively two forms of function local operands: stack 
slots and (pseudo-)registers.  The Machine IL register allocator 
partitions all of the temporaries in a Machine IL function into these 
two sorts of operands; as you note above, stack slots correspond to 
temporaries that are live over a gc point, while pseudo-regs correspond 
to temporaries that are never live over a gc point.

The C codegen sees the stack operands (on the ML stack) as memory 
locations, so read/write/move with these operands is a memory 
read/write/move.  The C codegen sees the pseudo-reg operands as local 
variables of the C function which implements a particular Machine IL 
function.  It is certainly the case that we may have more pseudo-regs 
live at a program point than there are physical registers.  The C 
compiler simply spills to the C stack, so the space for the spilled 
registers is added to the size of the C function frame.  Since we use a 
trampoline mechanism for switching between C functions, there is only 
ever one C function on the C stack at a time and we can effectively 
handle arbitrary amount of spilling to the C stack.

The native x86 codegen could use the same trick: spill to the top of the 
C stack and recover the spill space at every inter-Machine IL function 
transfer.  We're guaranteed that no spilled pseudo-reg is live at such a 
transfer.  However, the native codegen uses a slightly different trick, 
which is to note that a single threaded execution means that only one 
Machine IL function is ever executed at a time, so all Machine IL 
functions can share the same global spill space.  The size of this spill 
space is bounded by the number of pseudo-regs in any Machine IL function 
(plus a few extra slots to handle primitives that need a couple of extra 
temporaries that may need to be spilled).  We'll need to go to spilling 
to the top of the C stack when we support multiple ML threads executing 
at the same time.

> The Cheng-Blelloch collector is incremental and uses a write barrier on
> heap writes. I'm trying to see in what MLton pass it would be best to
> add the write barrier; it seems like the Machine IL has all the writes
> explicitly, and perhaps the heap writes could be determined from the
> operands here (or perhaps more information is needed from earlier
> passes). Any thoughts on this?

The Machine IL or the RSSA IL is definitely the place to insert the 
write barrier.  There is a form of write barrier for the card marking 
when the generational collector is being used; this is inserted in the 
SSA-to-RSSA pass.  As with card marking, you may benefit from MLton's 
accurate type information which ensures that you know (at compile time) 
whether or not a word sized data element is a pointer or not.