new register allocator and calling convention

Thu, 2 Dec 1999 18:41:00 -0800 (PST)

I finally got around to putting in the support for threads, and in
doing so, in the interest of changing as many things as possible at
once, I also changed the register allocator and calling convention.
Here are the changes:

* The stack is now allocated as just another kind of heap object.  If
  you are running without threads, this is clearly a loss over the old
  system, since the stack is copied on every GC.  But this really
  simplifies the runtime, both for threads and for other things like
  saveWorld.  It also means that max-heap really does limit the amount
  of space used by the system (there used to be no limit on the
  stack).

* Limit checks are inserted at all loop headers, whether or not there
  is any allocation.  This is to ensure that the signal handler will
  always have a chance to get called.

* The register allocator puts more variables in stack slots.  The new
  rule is that a variable goes in a stack slot if it is ever live
  across a nontail call, in a handler, or (this is the new part)
  across a limit check.  This means that registers are only used for
  *very* local things.  The reason for this change is that I now allow 
  a thread switch at any limit check point, and I thus need all of the 
  thread state to be captured in the MLton stack.

* Arguments are passed on the stack, with the convention determined by
  argument types.  This simplifies the backend because there no longer
  has to be that song and dance about moving formals from registers
  to stack slots.

* The "locals" array of pointers that was copied to/from for GC is now
  gone, because no registers (in particular no pointer-valued
  registers) can be live at a limit check point.
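
To make the stack-slot rule above concrete, here is a minimal sketch in
Python.  It is not MLton's code: the tuple encoding of operations and
the names stack_slot_vars and handler_vars are made up for
illustration, and it only handles straight-line code, where "live"
is approximated as "between the first definition and the last use".  A
real allocator would compute liveness over the control-flow graph.

```python
def stack_slot_vars(ops, handler_vars=frozenset()):
    """Return the set of variables that must live in stack slots.

    ops is straight-line code as a list of tuples (hypothetical encoding):
      ("def", x)         x is assigned
      ("use", x)         x is read
      ("nontail_call",)  a nontail call
      ("limit_check",)   a limit check (possible GC or thread switch)
    handler_vars are variables assumed to be live in a handler.
    """
    first_def, last_use, barriers = {}, {}, []
    for i, op in enumerate(ops):
        if op[0] == "def":
            first_def.setdefault(op[1], i)
        elif op[0] == "use":
            last_use[op[1]] = i
        elif op[0] in ("nontail_call", "limit_check"):
            barriers.append(i)

    # Variables live in a handler always go on the stack.
    on_stack = set(handler_vars)
    for var, d in first_def.items():
        u = last_use.get(var, d)
        # Live across a barrier: defined before it and used after it.
        if any(d < b < u for b in barriers):
            on_stack.add(var)
    return on_stack


example = [
    ("def", "n"),
    ("def", "t"),
    ("use", "t"),       # t dies before the limit check: register
    ("limit_check",),
    ("use", "n"),       # n is live across the limit check: stack slot
    ("def", "u"),
    ("use", "u"),       # u is purely local: register
]
print(sorted(stack_slot_vars(example)))  # prints ['n']
```

Everything not returned by this function is a candidate for a register,
which is why registers end up holding only very local values.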

One might think that this fairly extreme restriction on keeping things
in registers would hurt, but it apparently doesn't.  Of course, it's
impossible to know for sure, since I also changed the calling
convention at the same time.  I'll send out the self-compile numbers
when I have them.  Here are the numbers for the usual benchmarks.

                  run time         code size
                  old    new       old      new
                 ----   ----    ------   ------
barnes-hut       15.0   11.5     36462    46630
count-graphs     15.1   12.7     41983    38671
fft              35.5   35.5     26577    27009
knuth-bendix     17.7   17.2     56770    55950
lexgen           38.4   35.8    148089   145349
life             64.5   55.2     33279    31791
logic            52.0   49.8    166815   143451
mandelbrot       16.9   19.9     14123    16215
matrix-multiply  17.7   13.9     14611    16783
mlyacc           24.3   21.6    588913   566585
nucleic          18.4   19.3     45023    43315
ratio-regions    25.2   25.1     39251    44631
simple           20.8   19.4    241235   236379
tsp              25.7   26.0     27014    29470
vliw             23.4   21.6    771647   730307
zern             48.8   49.1     21368    22788
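
For anyone who wants to summarize the run-time columns, here is a
small Python sketch.  The run times are copied from the table above;
the summarize helper and the choice of a geometric mean of new/old
ratios are my own assumptions about how one might aggregate them, not
something the numbers come with.

```python
import math

# (benchmark, old run time, new run time), copied from the table above.
RUN_TIMES = [
    ("barnes-hut", 15.0, 11.5),
    ("count-graphs", 15.1, 12.7),
    ("fft", 35.5, 35.5),
    ("knuth-bendix", 17.7, 17.2),
    ("lexgen", 38.4, 35.8),
    ("life", 64.5, 55.2),
    ("logic", 52.0, 49.8),
    ("mandelbrot", 16.9, 19.9),
    ("matrix-multiply", 17.7, 13.9),
    ("mlyacc", 24.3, 21.6),
    ("nucleic", 18.4, 19.3),
    ("ratio-regions", 25.2, 25.1),
    ("simple", 20.8, 19.4),
    ("tsp", 25.7, 26.0),
    ("vliw", 23.4, 21.6),
    ("zern", 48.8, 49.1),
]


def summarize(rows):
    """Split benchmarks by direction of change and compute the
    geometric mean of the new/old run-time ratios."""
    faster = [name for name, old, new in rows if new < old]
    slower = [name for name, old, new in rows if new > old]
    same = [name for name, old, new in rows if new == old]
    log_sum = sum(math.log(new / old) for _, old, new in rows)
    geomean = math.exp(log_sum / len(rows))
    return faster, slower, same, geomean


faster, slower, same, geomean = summarize(RUN_TIMES)
print(len(faster), "faster,", len(slower), "slower,", len(same), "unchanged")
print("slower:", ", ".join(slower))
print("geometric mean of new/old run time: %.3f" % geomean)
```

A ratio below 1 means the new system is faster on that benchmark.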