new register allocator and calling convention
Stephen Weeks
sweeks@wasabi.epr.com
Thu, 2 Dec 1999 18:41:00 -0800 (PST)
I finally got around to putting in the support for threads, and in
doing so, in the interest of changing as many things as possible at
once, I also changed the register allocator and calling convention.
Here are the changes:
* The stack is now allocated as just another kind of heap object. If
you are running without threads, this is clearly a loss over the old
system, since the stack is copied on every GC. But this really
simplified the runtime, both for threads, and for other things like
saveWorld. It also means that max-heap really does limit the amount
of space used by the system (there used to be no limit on the
stack).
* Limit checks are inserted at all loop headers, whether or not there
is any allocation. This is to ensure that the signal handler will
always have a chance to get called.
* The register allocator puts more variables in stack slots. The new
rule is that a variable goes in a stack slot if it is ever live
across a nontail call, in a handler, or (this is the new part)
across a limit check. This means that registers are only used for
*very* local things. The reason for this change is that I now allow
a thread switch at any limit check point, and I thus need all of the
thread state to be captured in the MLton stack.
* Arguments are passed on the stack, with the convention determined by
argument types. This simplified the backend because there didn't
have to be that song and dance about moving formals from registers
to stack slots.
* The "locals" array of pointers that was copied to/from for GC is now
gone, because no registers (in particular, no pointer-valued
registers) can be live at a limit check point.
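
To make the stack-slot rule concrete, here is a small sketch of it in Python. This is not MLton's allocator; the names (PointKind, Point, assign_locations) and the flat list of program points with precomputed live sets are assumptions made purely for illustration. A variable goes to a stack slot if it is live at any nontail call, handler, or limit check point, and only otherwise stays in a register.

```python
from dataclasses import dataclass
from enum import Enum, auto


class PointKind(Enum):
    """Hypothetical classification of program points (illustration only)."""
    PLAIN = auto()         # ordinary instruction
    NONTAIL_CALL = auto()  # nontail call site
    HANDLER = auto()       # code inside an exception handler
    LIMIT_CHECK = auto()   # limit check, e.g. at a loop header


# Kinds of points that force every variable live there into a stack slot.
SPILL_KINDS = {PointKind.NONTAIL_CALL, PointKind.HANDLER, PointKind.LIMIT_CHECK}


@dataclass
class Point:
    kind: PointKind
    live: frozenset  # names of the variables live at this point


def assign_locations(points):
    """Map each variable name to 'stack' or 'register' under the rule."""
    locations = {}
    for p in points:
        for v in p.live:
            if p.kind in SPILL_KINDS:
                locations[v] = "stack"  # one spill point is enough
            else:
                locations.setdefault(v, "register")
    return locations


# A loop whose header carries a limit check: `i` and `acc` are live across
# the check, while `tmp` only lives between plain instructions in the body.
loop = [
    Point(PointKind.LIMIT_CHECK, frozenset({"i", "acc"})),
    Point(PointKind.PLAIN, frozenset({"i", "acc", "tmp"})),
    Point(PointKind.PLAIN, frozenset({"i", "acc"})),
]
locations = assign_locations(loop)
print(locations)
```

In this toy loop only `tmp` ends up in a register, which matches the remark that registers are now used only for very local things: anything carried around a loop crosses the limit check at the loop header.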
One might think that this fairly extreme restriction on keeping stuff
in registers hurts, but it apparently doesn't. Of course it's
impossible to know for sure since I also changed the calling
convention, but here are the numbers for the usual benchmarks.
I'll send out the self-compile numbers when I have them.
                  run time        code size
                  old    new      old      new
                 ----   ----   ------   ------
barnes-hut       15.0   11.5    36462    46630
count-graphs     15.1   12.7    41983    38671
fft              35.5   35.5    26577    27009
knuth-bendix     17.7   17.2    56770    55950
lexgen           38.4   35.8   148089   145349
life             64.5   55.2    33279    31791
logic            52.0   49.8   166815   143451
mandelbrot       16.9   19.9    14123    16215
matrix-multiply  17.7   13.9    14611    16783
mlyacc           24.3   21.6   588913   566585
nucleic          18.4   19.3    45023    43315
ratio-regions    25.2   25.1    39251    44631
simple           20.8   19.4   241235   236379
tsp              25.7   26.0    27014    29470
vliw             23.4   21.6   771647   730307
zern             48.8   49.1    21368    22788
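
For anyone who wants the per-benchmark new/old ratios rather than raw numbers, a short Python sketch like the following works. The figures are copied from the table above; the names RESULTS and classify are made up for this example, and the units are whatever the table uses.

```python
# (benchmark, old run time, new run time, old code size, new code size),
# copied from the table above.
RESULTS = [
    ("barnes-hut", 15.0, 11.5, 36462, 46630),
    ("count-graphs", 15.1, 12.7, 41983, 38671),
    ("fft", 35.5, 35.5, 26577, 27009),
    ("knuth-bendix", 17.7, 17.2, 56770, 55950),
    ("lexgen", 38.4, 35.8, 148089, 145349),
    ("life", 64.5, 55.2, 33279, 31791),
    ("logic", 52.0, 49.8, 166815, 143451),
    ("mandelbrot", 16.9, 19.9, 14123, 16215),
    ("matrix-multiply", 17.7, 13.9, 14611, 16783),
    ("mlyacc", 24.3, 21.6, 588913, 566585),
    ("nucleic", 18.4, 19.3, 45023, 43315),
    ("ratio-regions", 25.2, 25.1, 39251, 44631),
    ("simple", 20.8, 19.4, 241235, 236379),
    ("tsp", 25.7, 26.0, 27014, 29470),
    ("vliw", 23.4, 21.6, 771647, 730307),
    ("zern", 48.8, 49.1, 21368, 22788),
]


def classify(results):
    """Split benchmark names by whether the new run time is lower, higher, or equal."""
    faster, slower, same = [], [], []
    for name, old_t, new_t, _old_size, _new_size in results:
        if new_t < old_t:
            faster.append(name)
        elif new_t > old_t:
            slower.append(name)
        else:
            same.append(name)
    return faster, slower, same


faster, slower, same = classify(RESULTS)

# Ratios below 1.00 mean the new system is faster / produces smaller code.
for name, old_t, new_t, old_size, new_size in RESULTS:
    print(f"{name:16s} time x{new_t / old_t:.2f}  size x{new_size / old_size:.2f}")
print(f"faster: {len(faster)}  slower: {len(slower)}  unchanged: {len(same)}")
```

By this count, 11 of the 16 benchmarks run faster under the new scheme, 4 run slower (mandelbrot, nucleic, tsp, zern), and fft is unchanged.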