[MLton] cvs commit: rewrote x86.Block.compress to run in linear time
Matthew Fluet
fluet@cs.cornell.edu
Thu, 1 Jul 2004 21:06:07 -0400 (EDT)
> x86 code gen starting
> outputAssembly starting
> translateChunk totals 39.02 + 10.62 (21% GC)
Nice. Interesting to note that the GC% is pretty much the same.
> So, we're down from 8.6 to 1.3 hours. That quadratic compress was
> certainly most of the problem.
Great. So we can put "Compile time improvements of 6.5X on some
programs." into the release notes? ;-)
> * localRef is taking too long. I still don't know why. It's not the
> multi subpass. Any ideas? In any case, that's the only remaining
> glaring problem in the pre codegen.
You might trace the SSA restore pass with a Control.traceBatch.
> * There are a couple of really large .S files (30M and 40M).
Can't be helped if there really are MachineIL functions of that size.
> And simplify and allocate registers take a huge chunk of time. I'll
> try a compile with -native-optimize 0 to see what happens. Perhaps
> another possibility would be for the codegen to automatically treat
> native-optimize as zero when compiling procedures that are too large
> (as we do now for the globals). Other possibilities would be to do
> this only for main, or for procedures that are only called once
> (which we can certainly prove for main).
The peephole optimizer doesn't do very well on very large basic blocks.
That's actually the reason why I turned it off for globals: the size of
the basic blocks, not the size of the function itself.
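The per-block criterion could be sketched as follows. This is a hypothetical illustration, not code from the codegen. It keeps the requested optimization level unless the procedure contains a basic block larger than some cutoff, and in that case falls back to 0 (no peephole passes). The names `Block`, `Procedure`, `effective_optimize_level` and the `MAX_BLOCK_SIZE` value are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical cutoff, in instructions per basic block (an assumption).
MAX_BLOCK_SIZE = 1000

@dataclass
class Block:
    label: str
    instructions: List[str] = field(default_factory=list)

@dataclass
class Procedure:
    name: str
    blocks: List[Block] = field(default_factory=list)

def effective_optimize_level(proc: Procedure, requested: int,
                             max_block_size: int = MAX_BLOCK_SIZE) -> int:
    """Return 0 if any basic block exceeds the cutoff, else `requested`.

    The decision looks at the largest basic block, not at the total
    size of the procedure, since the peephole passes degrade on big
    blocks rather than on big functions.
    """
    largest = max((len(b.instructions) for b in proc.blocks), default=0)
    return 0 if largest > max_block_size else requested
```

With this rule, a procedure made of many small blocks keeps the requested level even if it is huge overall, while a procedure dominated by one enormous straight-line block (as `main` or the globals often are) is compiled at level 0.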
Looking at the -verbose 3 output you sent out earlier, the biggest problem seems
to be in copyPropagate. I glanced at that code, and it does have a running
time quadratic in the size of the basic block.
You can turn that off with -native-copy-prop false.
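The following shows where the quadratic cost in a copy-propagation pass typically comes from, and how a single forward pass avoids it. It is a generic sketch on a toy three-address representation `(dst, op, srcs)`, not MLton's copyPropagate, which works on x86 assembly. The function names and the representation are assumptions. The quadratic version rescans the rest of the block for every copy. The linear version keeps a map of live copies plus a reverse index, so that killing the copies of a redefined variable costs only the number of copies actually killed.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Instr = Tuple[str, str, Tuple[str, ...]]  # (dst, op, srcs); "mov" is a copy

def copy_propagate_quadratic(block: List[Instr]) -> List[Instr]:
    """For each copy dst := src, scan forward rewriting uses: O(n^2)."""
    block = list(block)
    for i, (dst, op, srcs) in enumerate(block):
        if op != "mov":
            continue
        src = srcs[0]
        for j in range(i + 1, len(block)):
            d, o, ss = block[j]
            # Uses are read before the definition, so rewrite them first.
            block[j] = (d, o, tuple(src if s == dst else s for s in ss))
            if d == dst or d == src:  # copy no longer valid past here
                break
    return block

def copy_propagate_linear(block: List[Instr]) -> List[Instr]:
    """Single forward pass with a copy map and a reverse index."""
    copies: Dict[str, str] = {}                     # dst -> src of live copy
    users: Dict[str, Set[str]] = defaultdict(set)   # src -> dsts copying it
    out: List[Instr] = []
    for dst, op, srcs in block:
        srcs = tuple(copies.get(s, s) for s in srcs)
        out.append((dst, op, srcs))
        # Redefining dst kills the copy "dst := old" ...
        old = copies.pop(dst, None)
        if old is not None:
            users[old].discard(dst)
        # ... and every copy "k := dst".
        for k in users.pop(dst, set()):
            del copies[k]
        if op == "mov" and srcs[0] != dst:
            copies[dst] = srcs[0]
            users[srcs[0]].add(dst)
    return out
```

Each copy enters and leaves the map at most once, so the linear version does amortized constant work per instruction (assuming a bounded number of operands), while the quadratic one can touch the whole remainder of the block for every copy.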
-native-optimize 0 will also cut down the toLiveness time in
allocateRegisters. toLiveness is essentially responsible for computing the
hints to the register allocator.
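As a rough illustration of the kind of per-instruction information a liveness pass hands to a register allocator, here is a standard backward liveness computation over one basic block. It is a generic sketch on the same toy `(dst, op, srcs)` representation, not MLton's toLiveness, whose actual output may be richer; `block_liveness` and the representation are assumptions.

```python
from typing import FrozenSet, List, Set, Tuple

Instr = Tuple[str, str, Tuple[str, ...]]  # (dst, op, srcs)

def block_liveness(block: List[Instr],
                   live_out: Set[str]) -> List[FrozenSet[str]]:
    """Return, for each instruction, the variables live just after it.

    Walks the block backwards from the set live at its exit:
    live_before = (live_after - {dst}) | set(srcs).
    """
    live = set(live_out)
    after: List[FrozenSet[str]] = [frozenset()] * len(block)
    for i in range(len(block) - 1, -1, -1):
        dst, _op, srcs = block[i]
        after[i] = frozenset(live)
        live.discard(dst)   # the definition kills dst ...
        live.update(srcs)   # ... and its operands are live before it
    return after
```

A variable that appears in `srcs` at instruction i but not in the live-after set is dead there, so an allocator can, for example, free or reuse its register at that point.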