[MLton] cvs commit: rewrote x86.Block.compress to run in linear time
Stephen Weeks
MLton@mlton.org
Thu, 1 Jul 2004 16:08:56 -0700
> I take it from the lack of victory cries that this wasn't the dragon
> we needed to slay.
Nonsense. Merely an indication that I occasionally eat. :-)
While I was having lunch, I left a compile running. It finished, and
here are some timings:
MLton starting
Compile SML starting
pre codegen starting
closureConvertSimplify starting
localRef starting
multi starting
multi finished in 0.61 + 5.93 (91% GC)
localRef finished in 99.32 + 56.54 (36% GC)
closureConvertSimplify finished in 199.37 + 119.92 (38% GC)
backend finished in 112.45 + 41.49 (27% GC)
pre codegen finished in 481.06 + 321.58 (40% GC)
x86 code gen starting
outputAssembly starting
translateChunk totals 39.02 + 10.62 (21% GC)
simplify totals 1584.76 + 92.90 (6% GC)
generateTransfers totals 115.97 + 4.83 (4% GC)
allocateRegisters totals 922.38 + 40.59 (4% GC)
outputAssembly finished in 3636.43 + 152.73 (4% GC)
x86 code gen finished in 3637.45 + 152.77 (4% GC)
Compile SML finished in 4118.61 + 474.35 (10% GC)
Compile C and Assemble starting
Compile C and Assemble finished in 51.02 + 0.00 (0% GC)
Link starting
Link finished in 191.49 + 0.00 (0% GC)
MLton finished in 4361.35 + 474.49 (10% GC)
So, we're down from 8.6 to 1.3 hours. That quadratic compress was
certainly most of the problem. The remaining problems are:
* localRef is taking too long. I still don't know why. It's not the
multi subpass. Any ideas? In any case, that's the only remaining
glaring problem in the pre codegen.
* There are a couple of really large .S files (30M and 40M). And
simplify and allocate registers take a huge chunk of time. I'll try
a compile with -native-optimize 0 to see what happens. Perhaps
another possibility would be for the codegen to automatically treat
native-optimize as zero when compiling procedures that are too large
(as we do now for the globals). Other possibilities would be to do
this only for main, or for procedures that are only called once
(which we can certainly prove for main).
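The fallback idea in the second item could be sketched roughly as below. This is only an illustration of the decision rule, not code from the MLton codegen: the Procedure record, the size measure, the SIZE_LIMIT value, and both function names are hypothetical, and the real threshold would need tuning.

```python
from dataclasses import dataclass

# Hypothetical model of a procedure as seen by the codegen.
@dataclass
class Procedure:
    name: str
    size: int                  # assumed size measure, e.g. statement count
    called_once: bool = False  # provably true for main

SIZE_LIMIT = 100_000  # assumed cutoff, purely illustrative


def level_for_large(proc, requested, limit=SIZE_LIMIT):
    """Treat native-optimize as 0 for any procedure that is too large."""
    return 0 if proc.size > limit else requested


def level_for_large_once_only(proc, requested, limit=SIZE_LIMIT):
    """Narrower variant: drop to 0 only for large procedures that are
    called once, such as main."""
    if proc.size > limit and proc.called_once:
        return 0
    return requested


main = Procedure("main", size=2_000_000, called_once=True)
big_loop = Procedure("bigLoop", size=500_000)
small = Procedure("small", size=100)

for p in (main, big_loop, small):
    print(p.name, level_for_large(p, 1), level_for_large_once_only(p, 1))
```

The first rule bounds compile time for every oversized procedure; the second gives up less generated-code quality, since it only skips optimization where the code is called once.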
Hopefully fixing those will get us down to around half an hour. Also,
remember this compile was run with -verbose 3, which causes some
slowdown to compute all the IL sizes. So we can probably get another
10% from switching that off.
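For anyone who wants to redo the arithmetic on the timings above, here is a small sketch. It assumes each pair of figures is seconds of non-GC time plus seconds of GC time, which is consistent with the printed GC percentages; the helper names are made up for this example.

```python
import re

# Matches lines such as "simplify totals 1584.76 + 92.90 (6% GC)".
TIMING = re.compile(
    r"^\s*(?P<name>.+?) (?:finished in|totals) "
    r"(?P<run>[\d.]+) \+ (?P<gc>[\d.]+) \((?P<pct>\d+)% GC\)\s*$"
)


def parse(line):
    """Return (name, non_gc_seconds, gc_seconds), or None for other lines."""
    m = TIMING.match(line)
    if m is None:
        return None
    return m["name"], float(m["run"]), float(m["gc"])


def hours(run, gc):
    """Total time in hours for one pass."""
    return (run + gc) / 3600.0


def gc_percent(run, gc):
    """GC share of the total, rounded as in the log."""
    return round(100.0 * gc / (run + gc))


log = [
    "simplify totals 1584.76 + 92.90 (6% GC)",
    "allocateRegisters totals 922.38 + 40.59 (4% GC)",
    "MLton finished in 4361.35 + 474.49 (10% GC)",
]
times = {}
for line in log:
    name, run, gc = parse(line)
    times[name] = (run, gc)

total = sum(times["MLton"])
heavy = sum(times["simplify"]) + sum(times["allocateRegisters"])
print("total hours: %.2f" % hours(*times["MLton"]))
print("simplify + allocateRegisters share: %.2f" % (heavy / total))
```

On these three lines, the whole run comes to about 1.34 hours, with simplify and allocateRegisters together accounting for a bit over half of it.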