performance page

Tue, 9 Oct 2001 14:28:42 -0400 (EDT)

> So it looks like the new ML-kit run-time performance is still pretty bad
> (although their simple non-tail function-call speed is faster than ours
> (seen from fib and tak)).

This is something I've been thinking about, since I've got a more detailed
model of transfers right now.  I was thinking about doing "small" function
calls and returns with values in registers.  I was really motivated by
looking at some programs that use Int.toString, where we do a bunch of
calls to div_0 and mod_0 in a row.  Both of those functions are tiny, but
used all over the place, so they aren't inlined.  At the same time, they
take two values and return one value (and are leaf functions) so it would
be really nice to just pass their args and return values in registers,
rather than banging them in and out of memory on the stack.
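
To put a rough number on that, here is a toy cost model in Python. It is
only a sketch: the function names are made up for illustration, and it
assumes every argument and result costs exactly one store by its producer
and one load by its consumer under the stack convention, ignoring return
addresses, spills, and anything else the real codegen does.

```python
def stack_memory_ops(calls, num_args, num_results):
    """Toy count for a stack-based convention.

    Assumption: each value is stored once by whoever produces it and
    loaded once by whoever consumes it, so each value costs two memory ops.
    """
    return calls * 2 * (num_args + num_results)


def register_memory_ops(calls, num_args, num_results):
    """Toy count for a register-based convention.

    Assumption: all values fit in registers, so the transfer itself
    touches no memory at all.
    """
    return 0


if __name__ == "__main__":
    # A leaf function shaped like div_0 or mod_0: two args, one result.
    # The call count of 20 is an arbitrary illustrative figure.
    print(stack_memory_ops(20, 2, 1))
    print(register_memory_ops(20, 2, 1))
```

Under this model each two-argument, one-result call costs six memory
operations on the stack and none in registers, which is why tiny leaf
functions that are called all over the place look like the best candidates.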

To an extent, this might help fib and tak, but there will still be some
memory traffic, because they aren't leaf functions.

Anyways, because the x86-codegen only looks at one CPS/SSA function at a
time, we need a uniform calling convention, but I don't think that is that
hard.  I was thinking that returns of 3 or fewer int-sized values could go
in registers, and likewise for calls with 3 or fewer int-sized
values.  This is a decision that can be made at each tail/nontail call
point (and function entry) and at each return (and cont block), so we
don't need inter-function information.
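
The rule above can be sketched as a purely local predicate. This is a
minimal illustration in Python, not the codegen's actual code: the type
names, the INT_SIZED set, and the function names are all hypothetical, and
it assumes the rule means "at most three values, all of them int-sized".

```python
MAX_REGISTER_VALUES = 3  # threshold taken from the proposal above

# Hypothetical type names; which types count as int-sized is an assumption.
INT_SIZED = {"int", "word", "pointer"}


def in_registers(value_types):
    """Decide whether a transfer's values travel in registers.

    The decision depends only on the types being transferred, so it can
    be made independently at a call site, a function entry, a return,
    or a continuation block.
    """
    return (len(value_types) <= MAX_REGISTER_VALUES
            and all(t in INT_SIZED for t in value_types))


if __name__ == "__main__":
    # Something shaped like div_0: two int args in, one int result out.
    div_args = ["int", "int"]
    div_results = ["int"]

    # The call site and the function entry see the same argument types,
    # so they reach the same answer without looking at each other.
    call_site_choice = in_registers(div_args)
    function_entry_choice = in_registers(div_args)
    print(call_site_choice == function_entry_choice)

    # Likewise for the return and the continuation block that receives it.
    print(in_registers(div_results))

    # Too many values, or a value that is not int-sized, falls back to
    # the stack.
    print(in_registers(["int", "int", "int", "int"]))
    print(in_registers(["int", "real"]))
```

Because the predicate takes nothing but the list of types, both ends of a
transfer agree on the convention even when they are compiled separately,
which is what makes the convention uniform.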