CVS Commit
Matthew Fluet
fluet@CS.Cornell.EDU
Mon, 17 Sep 2001 10:09:58 -0400 (EDT)
http://www.cs.cornell.edu/People/fluet/MLton/directed-graph.tgz
Added an implementation of the G. Ramalingam loop forest construction, as
described in "On Loops, Dominators, and Dominance Frontiers" (originally
in PLDI00; revised technical report at
http://www.research.ibm.com/people/r/rama/Papers/ibmtr21513.revised.ps).
******************************************************************************
(This doesn't need to go into the CVS commit log. Just some context.)
This was motivated by the x86-live-transfers module, which failed to carry
the right values around in registers. For example, from tensor.sml
(ignoring overflow)
loopk_3:
testl SI(256),SI(256)
jz L_377
L_373:
decl SI(256)
incl SI(252)
incl SI(248)
jmp loopk_3
...
L_377:
decl SI(244)
...
Thinking about what to carry in registers at loopk_3, the previous
heuristic was to see which values were used nearest in the future. This
meant that SI(244) looked "better" than SI(248), and I might end up
carryingSI(244) in a register, but not SI(248). With the loop forest, I
give SI(248) (and the other values live inside the loopk_3 loop) a higher
weight than those that are live outside the loop, and it now "does the
right thing". This got a nice speed up on tensor.sml (although, it's
still lagging a ways behind the C codegen).
I'm going to do a big commit of new x86 codegen stuff tomorrow after
running a round of benchmarks tonight. (It should have run last night,
but I had bugs in the benchmark program.)
Hopefully, the loop forest stuff will also be applicable to the SSA IL and
loop optimizations.