SSA simplify passes
Matthew Fluet
Matthew Fluet <fluet@CS.Cornell.EDU>
Mon, 7 Jan 2002 14:38:39 -0500 (EST)
> I think we really need to tie the two together (I don't see why a
> dependency couldn't occur in the other direction), so that we flatten the
> second argument to f2_0 conditionally on being able to flatten the
> argument to the ::_3 constructor. I'll see about integrating something
> like this tomorrow.
This wasn't that difficult to implement. The only real difference was
making function and block arguments that have tuple type potentially
flattenable. This solves the example I gave before, because we can
optimistically assume that the argument in the block that makes the
recursive call will be flattened, and hence that the tuple components will
be available. It turns out that this can be done independently of how we
deal with datatypes. There is a choice at Case.Con's: by introducing a
dependency between the Con.t and the target label's arguments, we can set
it up so that if the datatype's arguments are not flattened, then the
label's arguments are not flattened either. I think this makes sense;
otherwise we would need to introduce the selects at each target of the
datatype.
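To make the dependency idea concrete, here is a minimal sketch in Python, not the actual MLton code. It assumes the analysis can be modelled as a set of flattening candidates that all start out optimistically flattenable, plus edges of the form "if a is not flattened, then b is not flattened". The function name solve_flattening and the candidate names (in particular the label L_cons) are made up for illustration.

```python
from collections import defaultdict, deque

def solve_flattening(candidates, forced_unflat, deps):
    """Optimistic fixpoint: every candidate starts as flattenable.

    deps is a list of (a, b) pairs meaning
    "if a is not flattened, then b is not flattened".
    """
    flat = {c: True for c in candidates}
    succ = defaultdict(list)
    for a, b in deps:
        succ[a].append(b)

    # Seed the worklist with candidates that can never be flattened.
    work = deque()
    for c in forced_unflat:
        if flat[c]:
            flat[c] = False
            work.append(c)

    # Propagate "not flattened" along the dependency edges.
    while work:
        a = work.popleft()
        for b in succ[a]:
            if flat[b]:
                flat[b] = False
                work.append(b)
    return flat

# Hypothetical candidates modelled on the example in this thread.
candidates = ["::_3.arg", "f2_0.arg2", "L_cons.arg"]
deps = [
    ("::_3.arg", "f2_0.arg2"),   # function argument tied to the constructor
    ("f2_0.arg2", "::_3.arg"),   # ... and in the other direction
    ("::_3.arg", "L_cons.arg"),  # Case.Con: Con.t -> target label's arguments
]

all_flat = solve_flattening(candidates, [], deps)
con_blocked = solve_flattening(candidates, ["::_3.arg"], deps)
label_blocked = solve_flattening(candidates, ["L_cons.arg"], deps)
```

With nothing forced, everything stays flattened. Blocking the constructor argument unflattens all three candidates, while blocking only the label's argument leaves the other two flattened, because the Case.Con edge runs in one direction only.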
Anyway, very preliminary results are o.k., but they don't make sense:
MLton0 -- mlton -new-flatten false -tuple-recon-elim 0
MLton1 -- mlton -new-flatten false -tuple-recon-elim 1
MLton2 -- mlton -new-flatten false -tuple-recon-elim 2
MLton3 -- mlton -new-flatten true -tuple-recon-elim 0
MLton4 -- mlton -new-flatten true -tuple-recon-elim 1
MLton5 -- mlton -new-flatten true -tuple-recon-elim 2
compile time
benchmark  MLton0  MLton1  MLton2  MLton3  MLton4  MLton5
simple       7.35    7.29    7.33    7.36    7.41    7.38
tyan         3.81    3.82    3.78    3.90    3.74    3.74
run time
benchmark  MLton0  MLton1  MLton2  MLton3  MLton4  MLton5
simple       6.01    6.03    6.65    6.08    6.04    6.08
tyan        19.93   17.45   13.30   19.87   17.34   17.34
run time ratio (relative to MLton0)
benchmark  MLton1  MLton2  MLton3  MLton4  MLton5
simple       1.00    1.11    1.01    1.00    1.01
tyan         0.88    0.67    1.00    0.87    0.87
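For anyone wanting to re-derive the ratio table, it is each run time divided by the MLton0 run time for the same benchmark. A small Python sketch of that arithmetic (the names run_time, ratio_row and ratio_table are just for this illustration; the numbers are copied from the run time table above):

```python
# Run times copied from the table above, in column order MLton0..MLton5.
run_time = {
    "simple": [6.01, 6.03, 6.65, 6.08, 6.04, 6.08],
    "tyan": [19.93, 17.45, 13.30, 19.87, 17.34, 17.34],
}

def ratio_row(times):
    """Divide each of MLton1..MLton5 by the MLton0 baseline."""
    base = times[0]
    return [f"{t / base:.2f}" for t in times[1:]]

ratio_table = {bench: ratio_row(times) for bench, times in run_time.items()}

for bench, row in ratio_table.items():
    print(f"{bench:9s}  " + "  ".join(f"{r:>6s}" for r in row))
```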
size
benchmark   MLton0   MLton1   MLton2   MLton3   MLton4   MLton5
simple     194,996  194,244  197,660  198,108  197,996  197,980
tyan        91,897   91,177   89,353   89,369   88,649   88,649
So, the new flattener fixes the slowdown in simple: with -tuple-recon-elim 2
the run time drops from 6.65 (MLton2) back to 6.08 (MLton5).
What I don't understand is why -new-flatten true -tuple-recon-elim 2 isn't
faster with tyan. I would have expected it to be at least as fast as
-new-flatten false -tuple-recon-elim 2. What is very puzzling is that the
only differences between the two SSA files are in the flattened
arguments to datatypes: with the new flattener, four list datatypes are
flattened. But the new flattener seems to be doing more allocation:
[fluet@lennon tyan]$ ./tyan.false.2 @MLton gc-summary -- > /dev/null
max semispace size(bytes): 1,748,992
max stack size(bytes): 43,520
GC time(ms): 3,490 (12.3%)
maxPause(ms): 10
number of GCs: 1,135
bytes allocated: 1,445,752,052
bytes copied: 153,468,780
max bytes live: 182,196
[fluet@lennon tyan]$ ./tyan.true.2 @MLton gc-summary -- > /dev/null
max semispace size(bytes): 1,568,768
max stack size(bytes): 30,336
GC time(ms): 4,590 (13.8%)
maxPause(ms): 10
number of GCs: 1,555
bytes allocated: 1,982,465,680
bytes copied: 211,561,592
max bytes live: 180,464
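To put numbers on "more allocation", the two summaries can be compared field by field. This is only a sketch of the arithmetic in Python; the dictionary keys are my own shorthand for the gc-summary lines, and the figures are copied from the two runs above.

```python
# tyan.false.2 (old flattener) and tyan.true.2 (new flattener).
old = {
    "gc_time_ms": 3_490,
    "num_gcs": 1_135,
    "bytes_allocated": 1_445_752_052,
    "bytes_copied": 153_468_780,
    "max_bytes_live": 182_196,
}
new = {
    "gc_time_ms": 4_590,
    "num_gcs": 1_555,
    "bytes_allocated": 1_982_465_680,
    "bytes_copied": 211_561_592,
    "max_bytes_live": 180_464,
}

# Ratio of new to old for every field present in both summaries.
gc_ratios = {key: new[key] / old[key] for key in old}

for key, ratio in gc_ratios.items():
    print(f"{key:16s} {ratio:.2f}")
```

Bytes allocated, number of GCs and bytes copied all come out roughly 1.37 to 1.38 times higher with the new flattener, while max bytes live is essentially unchanged.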
I'll run through the rest of the benchmarks and see if there is anything
that sheds a little light on this.