Type-Preserving Garbage Collectors
Stephen Weeks
MLton@sourcelight.com
Thu, 25 Jan 2001 13:23:15 -0800 (PST)
Hi Dan. Henry mentioned your POPL talk to me, so I read through your paper --
it was very interesting. Your paper mentions possible problems with code bloat
due to specialized gc code per type. Also, Henry mentioned to me the possible
problem of icache pressure in the gc code with many different object types.
Just for fun, I instrumented MLton's runtime to count the number of different
object headers seen. I was curious to know for a large program like a self
compile how many there are.
Here are the measurements for the CPS syntax tree that is produced after all
optimization during a self compile. The header column gives the object header
and the count column gives the number of objects with that header.
header count
-------- ------
00000001 302302
00020000 3
00008000 13035
80000001 859615
80000002 943930
80000003 285951
80000004 1437
80000005 5026
80000006 820
80000007 2503
80000008 1
8000000a 1002
80008000 90164
80008001 312513
80008002 530275
80008003 555661
80008004 45095
80008005 269
8000800a 387
80010000 1717
80010001 67396
80010002 4134
80018000 1
80038000 1
The upshot is that there aren't that many distinct headers. There may be more
object types, because the object type has to recursively describe the object
types that its pointers point to. But my guess is that isn't too bad either,
especially if you make the only the distinctions that are necessary for the gc,
i.e. between pointers and nonpointers.
Anyways, I was curious if you had made any progress in porting your stuff to a
later version of MLton and making it work on larger programs.
Also, I have one minor correction for the paper. MLton uses a standard
*breadth* first Cheney copying collector, not *depth* first as it says in the
paper.