[MLton] cvs commit: rewrote x86.Block.compress to run in linear time

Matthew Fluet fluet@cs.cornell.edu
Thu, 1 Jul 2004 21:06:07 -0400 (EDT)

>       x86 code gen starting
> 	 outputAssembly starting
> 	    translateChunk totals 39.02 + 10.62 (21% GC)

Nice.  Interesting to note that the GC% is pretty much the same.

> So, we're down from 8.6 to 1.3 hours.  That quadratic compress was
> certainly most of the problem.

Great.  So we can put "Compile time improvements of 6.5X on some
programs." into the release notes?  ;-)

> * localRef is taking too long.  I still don't know why.  It's not the
>   multi subpass.  Any ideas?  In any case, that's the only remaining
>   glaring problem in the pre codegen.

You might trace the SSA restore pass with Control.traceBatch.

> * There are a couple of really large .S files (30M and 40M).

Can't be helped if there really are MachineIL functions of that size.

>   And simplify and allocate registers take a huge chunk of time.  I'll
>   try a compile with -native-optimize 0 to see what happens.  Perhaps
>   another possibility would be for the codegen to automatically treat
>   native-optimize as zero when compiling procedures that are too large
>   (as we do now for the globals).  Other possibilities would be to do
>   this only for main, or for procedures that are only called once
>   (which we can certainly prove for main).

The peephole optimizer doesn't do very well on very large basic blocks.
That's actually the reason why I turned it off for globals: the size of
the basic blocks, not the size of the function itself.
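
To make the block-size point concrete, here is a rough sketch of the kind of
check the codegen could do.  It is in Python purely for illustration, not
MLton code; the function name, the parameters, and the threshold value are all
made up.

```python
def choose_optimize_level(block_sizes, requested_level, max_block_size=10_000):
    """Pick an optimization level for one function.

    block_sizes: number of instructions in each basic block of the function.
    max_block_size: hypothetical cutoff; a real value would need measuring.

    The decision looks at the largest basic block, not at the total size of
    the function, since the block size is what hurts the peephole optimizer.
    """
    if any(n > max_block_size for n in block_sizes):
        return 0  # treat the function as if native-optimize were zero
    return requested_level
```

A function made of many small blocks keeps the requested level; a single
oversized block is enough to drop it to zero.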

Looking at the -verbose 3 output you sent out earlier, the biggest problem
seems to be in copyPropagate.  I glanced at that code, and its running time
is quadratic in the size of the basic block.  You can turn it off with
-native-copy-prop false.
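
For illustration only, here is a generic sketch of how copy propagation over
one basic block ends up quadratic, next to a single-pass variant.  This is
Python, not the actual copyPropagate code, and the (dst, op, srcs) instruction
format is an assumption made up for the example.

```python
from collections import defaultdict

# Hypothetical instruction format (not MLton's): (dst, op, srcs).
# op == "copy" with one source means dst := srcs[0].

def copy_propagate_quadratic(block):
    """For every copy, rescan the rest of the block: O(n^2) in block length."""
    block = [(d, op, list(s)) for d, op, s in block]
    for i, (dst, op, srcs) in enumerate(block):
        if op != "copy":
            continue
        src = srcs[0]
        for j in range(i + 1, len(block)):
            d, _, s = block[j]
            s[:] = [src if x == dst else x for x in s]
            if d == dst or d == src:
                break  # the copy is no longer valid past this point
    return [(d, op, tuple(s)) for d, op, s in block]

def copy_propagate_linear(block):
    """One forward pass with a table of live copies (amortized linear,
    assuming constant-time hash operations)."""
    copy_of = {}                  # dst -> src for copies that are still valid
    copied_to = defaultdict(set)  # src -> dsts that currently alias it
    out = []
    for dst, op, srcs in block:
        srcs = tuple(copy_of.get(x, x) for x in srcs)
        # dst is redefined here, so kill every live copy that mentions it.
        old = copy_of.pop(dst, None)
        if old is not None:
            copied_to[old].discard(dst)
        for alias in copied_to.pop(dst, ()):
            copy_of.pop(alias, None)
        if op == "copy" and srcs[0] != dst:
            copy_of[dst] = srcs[0]
            copied_to[srcs[0]].add(dst)
        out.append((dst, op, srcs))
    return out
```

Both functions produce the same rewritten block; the difference is that the
first one rescans the tail of the block for each copy it finds, which is what
blows up on very large basic blocks.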

-native-optimize 0 will also cut down the toLiveness time in
allocateRegisters.  Essentially, toLiveness is responsible for computing
the hints for the register allocator.