self-compile
Matthew Fluet
fluet@research.nj.nec.com
Thu, 17 Aug 2000 17:02:13 -0400 (EDT)
> In other news, with some other optimizations from yesterday, the
> x86-codegen is winning in all benchmarks except checksum. (Yes, this
> includes life and wc.) I'll send out some hard numbers later today -- I'm
> trying to track down why I lost some performance on fib and tak, which
> were previously the best improvements. My guess is that it is an artifact
> of only committing live pseudo-regs down their respective branches. I
> lift out all pseudo-regs that are live down both, and then make the fall
> through case be the branch with the most remaining live pseudo-regs. This
> might reverse some branches and screw up the branch prediction.
I tracked down the source of that performance drop. Nothing to do with
the saving of live pseudo-regs. It had to do with the translation of the
MachineOutput.Move statement. I've gone back and forth on whether or not
that move should force the destination to a register or to an address at
register allocation. For a while I thought register, but then I noticed a
lot of SX()'s as the destinations, so I switched it to address. Yesterday
I noticed a lot of RX()'s as the destinations so I switched it back to
register. Turns out the best solution is to make the decision on the type
of operand -- register for RX()'s and address for everything else. That
regained the time on fib and tak, and they also benefitted from the
new peephole optimizations.
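A minimal sketch of that rule, in Python rather than the compiler's own source. It assumes operands are written as strings, and that "RX()" stands for any R-prefixed operand such as RI(0); the function name is hypothetical and not taken from the codegen.

```python
def move_destination_class(dst: str) -> str:
    """Pick how register allocation treats the destination of a Move."""
    # Assumption: "RX()" means any operand whose name starts with "R",
    # e.g. "RI(0)". Everything else (such as "SI(4)") is forced to an address.
    if dst.startswith("R"):
        return "register"
    return "address"
```

With these assumptions, a destination of RI(0) is classed as "register" and SI(4) as "address".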
The peephole optimization that I think got the big win was the following:
RI(0) = Int_add(SI(4), X)
SI(4) = RI(0)
|
| translates to
V
movl SI(4), RI(0)
addl X, RI(0)
movl RI(0), SI(4)
|
| the new peephole optimization (also works similarly on unary and
| shift/rotate instructions); not dependent on the equality of the first
| movl's source and the second movl's destination, but this is a common
| case.
V
movl SI(4), SI(4)
addl X, SI(4)
|
| the self-move elimination optimization (now more important than it was
| before)
V
addl X, SI(4)
(And I'll switch addl to incl if X is 1)
This really helps loop index variables and probably also tail-recursive
functions.
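The three rewrites above can be sketched roughly as follows. This is an illustration in Python, not the actual x86-codegen: instructions are modelled as tuples in (op, src, dst) order, all function names are hypothetical, an immediate 1 is assumed to be spelled "$1", and the temporary register is assumed to be dead after the sequence (a real pass would have to check liveness).

```python
from typing import List, Tuple

Instr = Tuple[str, ...]  # ("movl", src, dst), ("addl", src, dst) or ("incl", dst)

def is_temp(operand: str) -> bool:
    # Assumption: register temporaries are R-prefixed, like RI(0) above.
    return operand.startswith("R")

def fold_temp(code: List[Instr], binops=("addl", "subl")) -> List[Instr]:
    """Rewrite  movl A,R ; op X,R ; movl R,D  into  movl A,D ; op X,D."""
    out: List[Instr] = []
    i = 0
    while i < len(code):
        w = code[i:i + 3]
        if (len(w) == 3
                and w[0][0] == "movl" and w[1][0] in binops and w[2][0] == "movl"
                and is_temp(w[0][2])
                and w[1][2] == w[0][2]      # op writes the temporary
                and w[2][1] == w[0][2]      # last move reads the temporary
                and w[1][1] != w[0][2]      # X is not the temporary itself
                # Guard added for this sketch: X may only name D when A is D too,
                # otherwise the first new move would clobber X.
                and (w[1][1] != w[2][2] or w[0][1] == w[2][2])):
            src, dst = w[0][1], w[2][2]
            out.append(("movl", src, dst))
            out.append((w[1][0], w[1][1], dst))
            i += 3
        else:
            out.append(code[i])
            i += 1
    return out

def drop_self_moves(code: List[Instr]) -> List[Instr]:
    """Self-move elimination: remove  movl A,A."""
    return [ins for ins in code if not (ins[0] == "movl" and ins[1] == ins[2])]

def add_to_inc(code: List[Instr]) -> List[Instr]:
    """Turn  addl $1,D  into  incl D."""
    return [("incl", ins[2]) if ins[0] == "addl" and ins[1] == "$1" else ins
            for ins in code]

def peephole(code: List[Instr]) -> List[Instr]:
    return add_to_inc(drop_self_moves(fold_temp(code)))
```

Feeding it the three-instruction sequence from the example (movl SI(4),RI(0); addl X,RI(0); movl RI(0),SI(4)) leaves the single instruction addl X,SI(4), and with "$1" in place of X it leaves incl SI(4).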