[MLton] Optimizations done by GCC 4.0
Jesper Louis Andersen
jlouis@mongers.org
Wed, 13 Oct 2004 00:32:18 +0200
I was running through the changes done in GCC 4.0 today. Now they have
worked a tree-saa into their optimization layer, so they are beginning
to get up to par with modern compiler technology. Here is a number of
optimizations (code transformations) they mention:
<url http://gcc.gnu.org/gcc-4.0/changes.html>
* Scalar replacement of aggregates
I do not know what they mean by this.
* Constant propagation
MLton already has this, and it is not hard to do on a SSA.
* Value range propagation
This looked interesting. It is basicly a worklist-algorithm working
on a lattice propagating the probabilty of branches being taken. This
is to aid in trace-scheduling etc.
<url http://www.pattosoft.com.au/jason/Papers/ValueRangeProp/>
is one url on the thing. Nearly 10 years old and I have never heard
about it. Something MLton does?
* Partial redundancy elimination
No clue, but a couple of guesses as to what it is.
* Load and store motion
If a loop loads and stores to the same memory address and the generate
by the load is killed by the store, then do all operation on a register
instead and hoist the load to before the loop and press the store down
after the loop.
Seems trivial to implement - wonder if it would give anything.
* Strength reduction
Ah, the good old optimization. I expect MLton to do this, though there
is no pass explicitly mentioning this. (loopInvariant maybe?)
* Dead store elimination
A store to something never used is dead, so it can be removed.
* Dead and unreachable code elimination
MLton already does this a lot.
* Autovectorization
Detection and use of SIMD instructions for vectorized operations. I
think it is a quite special optimization and the places one can apply
it are rather limited. I do not know how much it is needed though.
* Loop interchange
Interchange the way one traverses the memory so the cache plays nicely
with you. Canonical example is matrix multiplication. I do not know if
they do blocking optimizations too and treats the registers as another
cache layer.
* Tail recursion by accumulation
Contify!
--------------------------
I was wondering if this applies to MLton at all. Most of them are
already done, so I am wondering about the things MLton does not do, and
if it would give anything. I think there might be a reason why they are
not implemented (too little gain for too much analysis and too
specialized comes to mind).
Is there a complete description of all passes MLton does? If not, would
there be a point in having one? I would volunteer to inspect the source
and write down what I find as a starting point, though my english lacks
a bit of style (or maybe the style is not recognized as english yet,
heh). I was thinking of a short description of each pass together with
relevant book and paper references. I am also aware that I may not have
the in-depth knowledge needed to do this.
--
j.