performance page
Matthew Fluet <fluet@CS.Cornell.EDU>
Tue, 9 Oct 2001 18:31:31 -0400 (EDT)
> I just did a quick comparison of fib in C (which always passes all arguments
> on the stack, but returns results in a register) with MLton, and MLton is
> still 20% slower than C (and this with overflow checking turned off). In
> this case the extra overhead is two extra compares of registers (%ebp and
> %esp) to the values in memory locations. One of these is for the heap.
> Note, there is no allocation in the loop. Is the other some interrupt check?
O.k. The comparisons I see are:
fib_0:
statementLimitCheckLoop_1:
        cmpl ((gcState+48)+(0*4)),%ebp
        jae doGC_5
checkFrontier_1:
        cmpl ((gcState+8)+(0*4)),%esp
        jnbe doGC_4
skipGC_1:
The first one is the stack limit check. The second one is the heap limit
check. We need the stack limit check; we (may) need the heap limit check
for interrupt/thread handling. It is there for whatever reason led us to
decide that the LimitCheck macro in ccodegen.h needs to put the invocation
of GC_gc inside a loop:
do {
// invoke GC_gc
} while (frontier + (b) > gcState.limit)
I think the issue is the following:
...
thread A checks for 100 bytes; fails and invokes GC_gc
GC_gc invoked; fiddles with heap; gets 200 free bytes; switches to thread B
thread B consumes 175 bytes;
thread B checks for 50 bytes; fails and invokes GC_gc
GC_gc invoked; fiddles with heap; gets 75 free bytes; switches to thread A
thread A continues, thinking it has 100 bytes, when there are really only 75
This example requires threads. There is something in insert-limit checks
that avoids putting limit checks in loops when there aren't threads.
There isn't anything in either codegen that produces different types of
limit checks depending on whether or not there are threads.
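The thread A / thread B trace above can be replayed in a small C model. This is only a sketch under stated assumptions: the heap is reduced to a count of free bytes, the heap starts empty, and gc_seen_by_a, check_once and check_loop are hypothetical names, not MLton's runtime or the real LimitCheck macro. The first simulated GC call follows the trace exactly; later calls are assumed to run undisturbed and leave 200 bytes free.

```c
#include <assert.h>

/* Hypothetical model: the heap is reduced to a count of free bytes. */
static int free_bytes = 0;
static int gc_calls = 0;   /* GC invocations made by thread A */

/* What thread A observes across one GC call (a stand-in, not MLton's
   GC_gc). The first call replays the trace from the message; later
   calls assume an undisturbed collection that leaves 200 bytes free. */
static void gc_seen_by_a(void) {
    gc_calls++;
    if (gc_calls == 1) {
        free_bytes = 200;        /* GC gets 200 free bytes, switches to B */
        free_bytes -= 175;       /* thread B consumes 175 bytes */
        assert(free_bytes < 50); /* B's check for 50 bytes fails */
        free_bytes = 75;         /* B's GC gets 75 free bytes, back to A */
    } else {
        free_bytes = 200;        /* assumption: no interference this time */
    }
}

/* Assumption: thread A starts with an empty heap. */
static void reset(void) { free_bytes = 0; gc_calls = 0; }

/* Single test: thread A trusts the first GC call. */
static int check_once(int need) {
    reset();
    if (free_bytes < need)
        gc_seen_by_a();
    return free_bytes;           /* bytes actually free when A continues */
}

/* Re-test after every GC call, in the style of the do/while loop. */
static int check_loop(int need) {
    reset();
    if (free_bytes < need) {
        do {
            gc_seen_by_a();
        } while (free_bytes < need);
    }
    return free_bytes;
}
```

With a request of 100 bytes, check_once returns with only 75 bytes free, which is the failure in the trace. check_loop notices the shortfall, collects a second time, and returns with 200 bytes free.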
Anyways, looking at limit checks should probably go on the "SSA todo"
list; I think that insert limit-checks will sometimes hoist limit checks
into non-allocating loops, which is bad, given the comparisons above. It
may also make sense to make limit-check types more fine-grained, i.e.,
decide when we really need the checkFrontier loop, as above.
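To make the cost of a limit check inside a non-allocating loop concrete, here is a rough C sketch that just counts how many checks execute. The names limit_ok, sum_check_inside and sum_check_outside are hypothetical, and the check is assumed to always pass, so it models only the compare-and-branch overhead, not a real collection.

```c
/* Counts executed limit checks; a stand-in for one cmpl/branch pair. */
static long checks_run = 0;

static int limit_ok(void) {
    checks_run++;
    return 1;   /* assumption: the check always passes, no GC is invoked */
}

/* Check placed inside a loop whose body never allocates. */
static long sum_check_inside(long n) {
    long acc = 0;
    checks_run = 0;
    for (long i = 0; i < n; i++) {
        if (!limit_ok()) { /* the GC would be invoked here */ }
        acc += i;
    }
    return acc;
}

/* Same loop with the check placed once, before the loop. */
static long sum_check_outside(long n) {
    long acc = 0;
    checks_run = 0;
    if (!limit_ok()) { /* the GC would be invoked here */ }
    for (long i = 0; i < n; i++)
        acc += i;
    return acc;
}
```

For n iterations the first version runs n checks and the second runs one, with the same result. In a tight loop like fib, those extra compares are the kind of overhead being measured above.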