[MLton] cvs commit: rewrote x86.Block.compress to run in linear time

Matthew Fluet fluet@cs.cornell.edu
Fri, 2 Jul 2004 09:00:56 -0400 (EDT)

> > > * localRef is taking too long.  I still don't know why.  It's not the
> > >   multi subpass.  Any ideas?  In any case, that's the only remaining
> > >   glaring problem in the pre codegen.
> >
> > You might trace the SSA- restore pass with a Control.traceBatch.
> I put a trace around restore and here's what I got
> 	    localRef starting
> 	       multi starting
> 	       multi finished in 0.61 + 5.93 (91% GC)
> 	       restore starting
> 	       restore finished in 0.91 + 0.00 (0% GC)
> 	       restore starting
> 	       restore finished in 0.00 + 0.00 (0% GC)
> 	       restore starting
> 	       restore finished in 0.00 + 0.00 (0% GC)
> 	    localRef finished in 99.32 + 56.54 (36% GC)
> So, it looks there are only three functions with local refs, and the
> restores don't hurt.

Agreed, it's not the restore pass.

How big are the globals in HOL?  I see that the sub-pass that tries to
move global refs that are only used in one function has complexity:
  #SSA functions * #SSA globals.
This is particularly bad if most globals aren't (localizable) refs.

There is a simple solution using a Func.t property list holding a
locals:Statement.t list ref.  After computing the funcUses, do one
Vector.keepAllMap through the globals, pushing each ref that is used in
exactly one place onto the appropriate locals list and keeping the
remaining statements.

Then we only need one pass through the SSA functions.  We can move
everything under (*Localize global refs*) in with the later
List.revMap(functions, ...).

> > Looking at the -verbose 3 you sent out earlier, the biggest problem seems
> > to be in copyPropagate.  I glanced at that code and it does have a
> > quadratic running time in the size of the basic block.
> > You can turn that off with -native-copy-prop false.
> >
> If we could improve the quadratic-ness of copyPropagate, we could
> probably get most of that time back.  Any thoughts on the difficulty
> of that?

It's definitely possible.  Essentially, for every MOV in a basic block,
iterate over the remaining statements and replace any occurences of the
dst operand with the src operand.  Rather than iterating, we can maintain
a set of seen copies and attempt to apply them to each statement.  Its a
little more complicate because x86 MemLoc.t's correspond to memory
addressing, in terms of other operands, so you need to abort the copy
propagation upon encountering an instruction that changes the meaning of
one of the src/dst operands that is being copied.