good in general, but bad nested loops
Matthew Fluet
mfluet@intertrust.com
Tue, 10 Jul 2001 15:20:14 -0700 (PDT)
> So, it has changed the loop to count down from n to 0 instead of up from 0 to n.
> It has also kept five of the loop variables in registers, plus the counter.
> Most importantly, the hot loop, L_60, is only three instructions.
>
> We can get potentially the same as gcc, because we have the loop nest. We
> just(!) need to do the right register allocation and instruction reordering. Oh
> yeah, and figuring out the Overflows can't happen.
I agree that we can probably get close to gcc. Another aspect of the gcc
code is that it's able to get the 0 test for free by looking at the result
of the decrement. Part of that does need to be done in the backend; i.e.,
the peephole optimization that would turn
decl %eax dec %eax dec %eax
cmpl $0,%eax or testl %eax,%eax into je L_123
je L_123 jz L_123
(and this would be very easy to add to my peephole framework). But,
another aspect of this is the transformation that does the branch at the
end of the loop, rather than at the top. I think(?) that MLton generally
puts these branches at the top; CPS would seem to be the right place to
unroll and reorder these loops.