[MLton] cvs commit: decreasing liveness information in large SSA functions

Thu, 1 Jul 2004 11:37:05 -0400 (EDT)

> > You can try -native-optimize 0 to turn off all the x86-codegen
> > optimizations.
>
> Running now.

Kill it.  It does nothing for translateChunk.  It'll help with simplify
and allocateRegisters.

> > If your system has nothing to do overnight, you could also try -verbose 3
> > so we can see the breakdown in the x86-codegen sub-passes.
>
> Here's the data from last night's compile.
>
> 	 outputAssembly starting
> 	    translateChunk totals 21125.19 + 5368.05 (20% GC)

I wasn't expecting that.  On the upside, though, it might be a good sign.
translateChunk isn't doing anything complicated.  It builds up a big
AppendList.t, does an AppendList.toList, and then runs
x86.Block.compress.  My guess is that the time is going into
x86.Block.compress.  Feel free to look into that.
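For readers unfamiliar with append lists, here is a minimal sketch of why building one is cheap and flattening it is linear. It is written in Python purely for illustration. MLton's AppendList is an SML structure, and the node names Leaf and Append, along with the helpers append and to_list, are hypothetical stand-ins rather than its real constructors or functions:

```python
from dataclasses import dataclass
from typing import Any, List, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    """Hypothetical append-list node holding a few items directly."""
    items: Tuple[Any, ...]


@dataclass(frozen=True)
class Append:
    """Concatenation of two append-lists: built in O(1), nothing is copied."""
    left: "AppendList"
    right: "AppendList"


AppendList = Union[Leaf, Append]


def append(a: "AppendList", b: "AppendList") -> "AppendList":
    return Append(a, b)


def to_list(al: "AppendList") -> List[Any]:
    """Flatten left to right in time linear in items plus nodes.

    Uses an explicit stack so the deep, left-leaning tree produced by
    appending one item at a time cannot hit Python's recursion limit.
    """
    out: List[Any] = []
    stack: List["AppendList"] = [al]
    while stack:
        node = stack.pop()
        if isinstance(node, Leaf):
            out.extend(node.items)
        else:
            # Push right first so the left subtree is emitted first.
            stack.append(node.right)
            stack.append(node.left)
    return out


# Build the list one chunk at a time, then flatten once at the end.
acc: "AppendList" = Leaf(())
for i in range(5):
    acc = append(acc, Leaf((i,)))
print(to_list(acc))  # [0, 1, 2, 3, 4]
```

Under this model the build phase and the toList phase are both linear. That fits the guess that the expensive step comes afterwards, in compress.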

Here's the gist of compress.  The basic blocks coming from MachineIL
sometimes need to be split for particular primitives that require control
flow for an efficient implementation.  So there isn't a one-to-one mapping
from MachineIL blocks to x86 blocks.  Instead, I create pseudo-blocks with
an optional entry and an optional transfer.  Wherever a NONE transfer
falls into a NONE entry, we really intend for those blocks to be fused
and have their statements merged.  Compress does this in the naive way.
If HOL's main function is huge but mostly straight-line code, then there
will be a gazillion pseudo-blocks coming from the individual MachineIL
statements, and they will be appended one by one (using @) to the growing
"real" block.
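The naive fusion described above can be sketched as follows. This is a Python model under stated assumptions, not the actual x86.Block.compress. PseudoBlock and compress_naive are hypothetical names, labels and statements are plain strings, and a NONE transfer that falls into a labeled block is simply left unfused:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PseudoBlock:
    entry: Optional[str]     # label, or None: a continuation with no entry
    statements: List[str]
    transfer: Optional[str]  # jump/branch, or None: falls into the next block


def compress_naive(blocks: List[PseudoBlock]) -> List[PseudoBlock]:
    """Fuse every None-transfer / None-entry pair by list concatenation.

    `cur.statements + b.statements` builds a fresh list each time, copying
    the growing block (SML's `xs @ ys` likewise copies xs), so fusing a
    run of n single-statement pseudo-blocks copies O(n^2) elements.
    """
    result: List[PseudoBlock] = []
    for b in blocks:
        if result and result[-1].transfer is None and b.entry is None:
            cur = result[-1]
            result[-1] = PseudoBlock(cur.entry, cur.statements + b.statements, b.transfer)
        else:
            # Start a new real block; a labeled block is never fused here.
            result.append(PseudoBlock(b.entry, list(b.statements), b.transfer))
    return result


blocks = [
    PseudoBlock("L0", ["s0"], None),
    PseudoBlock(None, ["s1"], None),
    PseudoBlock(None, ["s2"], "goto L1"),
    PseudoBlock("L1", ["s3"], "return"),
]
for blk in compress_naive(blocks):
    print(blk.entry, blk.statements, blk.transfer)
# L0 ['s0', 's1', 's2'] goto L1
# L1 ['s3'] return
```

The output is correct. The problem is cost: each fusion re-copies everything accumulated so far in the current real block.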

That will do a lot of allocation, but it generates tons of ephemeral
garbage, which, if we've gone over to a generational GC, would explain
the relatively low GC time.
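A back-of-the-envelope count makes "a lot of allocation" concrete. Assume each fusion evaluates as `block @ [stmt]`, where SML's `@` copies its left argument and shares its right. Fusing the k-th following pseudo-block then copies the k statements already accumulated. The size n = 100,000 below is purely illustrative and is not a measurement of HOL's main function:

```latex
\text{cons cells copied} \;=\; \sum_{k=1}^{n-1} k \;=\; \frac{n(n-1)}{2},
\qquad
n = 10^{5} \;\Rightarrow\; \frac{10^{5}\,(10^{5}-1)}{2} \approx 5 \times 10^{9}.
```

Almost every one of those copies is dead as soon as the next fusion happens. Short-lived garbage of this kind is what a generational collector handles cheaply.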

It should be pretty simple to come up with a more efficient compress using
AppendList.fold.
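One way to make compress linear is to fold over the pseudo-blocks once and grow only the currently open block in place, closing it when a transfer appears. The sketch below is a Python illustration under assumptions. functools.reduce stands in for AppendList.fold, and PseudoBlock, FoldState, and compress_fold are hypothetical names. It produces the same result as the naive fusion rule (fuse on NONE transfer into NONE entry, leave labeled blocks unfused):

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import List, Optional


@dataclass
class PseudoBlock:
    entry: Optional[str]     # label, or None: a continuation with no entry
    statements: List[str]
    transfer: Optional[str]  # jump/branch, or None: falls into the next block


@dataclass
class FoldState:
    done: List[PseudoBlock] = field(default_factory=list)
    open_entry: Optional[str] = None
    open_stmts: Optional[List[str]] = None  # None means no block is open


def _close(st: FoldState, transfer: Optional[str]) -> None:
    st.done.append(PseudoBlock(st.open_entry, st.open_stmts, transfer))
    st.open_entry, st.open_stmts = None, None


def _step(st: FoldState, b: PseudoBlock) -> FoldState:
    if st.open_stmts is not None and b.entry is None:
        # None transfer falling into a None entry: extend in place,
        # amortized O(len(b.statements)), with no re-copy of the prefix.
        st.open_stmts.extend(b.statements)
    else:
        if st.open_stmts is not None:
            _close(st, None)  # falls into a labeled block; leave unfused
        st.open_entry, st.open_stmts = b.entry, list(b.statements)
    if b.transfer is not None:
        _close(st, b.transfer)
    return st


def compress_fold(blocks: List[PseudoBlock]) -> List[PseudoBlock]:
    st = reduce(_step, blocks, FoldState())
    if st.open_stmts is not None:
        _close(st, None)
    return st.done


blocks = [
    PseudoBlock("L0", ["s0"], None),
    PseudoBlock(None, ["s1"], None),
    PseudoBlock(None, ["s2"], "goto L1"),
    PseudoBlock("L1", ["s3"], "return"),
]
for blk in compress_fold(blocks):
    print(blk.entry, blk.statements, blk.transfer)
# L0 ['s0', 's1', 's2'] goto L1
# L1 ['s3'] return
```

In SML the in-place extend has no direct equivalent. A natural analogue would be to cons each statement onto a reversed accumulator and call List.rev once per finished block, which keeps the whole pass linear.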