intraprocedural flattening
Stephen Weeks
MLton@sourcelight.com
Thu, 19 Jul 2001 11:01:31 -0700
I implemented a simple intraprocedural flattening algorithm last night. It
flattens a jump argument of tuple type if that argument only flows to select
statements. This gets, for example, the problematic loop in Vector.unfold. The
analysis is very fast, passes all the regressions, benchmarks, and
self-compiles. The analysis runs at three places during CPS simplification,
pretty much right after contification.
On a self-compile, one of the passes saw 502 tupled jump arguments, and was able
to flatten 470 of them. This didn't lead to spectacular performance
improvement in self-compile time, but did lead to a noticeable improvement.
We're now down to 474.5 seconds (breaking the 8 minute barrier), even including
the three extra passes of local flattening.
Below are the benchmark results, where "old MLton" is without local flattening.
There is a bit of a slowdown (ratio .9 or .8 old MLton to new MLton) on a few of
the benchmarks, a bit of a speedup on about the same number, and a very nice
speedup on psdes-random and wc-scanStream, presumably due to removing an
allocation from of a loop.
I am a bit confused by the slowdown, since I am fairly sure that the
optimization will never introduce new allocation. It does, however, move selects
earlier, and can cause selects to occur that would not have otherwise. I think
I will combine the optimization with a forward analysis that only flattens a
jump argument if only tuples flow into it (in addition to the original
condition).
run time ratio
benchmark old MLton
barnes-hut 1.0
checksum 0.9
count-graphs 1.2
DLXSimulator 0.9
fft 1.1
fib 1.0
hamlet 1.1
knuth-bendix 1.0
lexgen 1.0
life 1.1
logic 1.2
mandelbrot 1.1
matrix-multiply 0.9
md5 1.0
merge 1.0
mlyacc 1.0
mpuz 1.0
nucleic 1.0
peek 1.1
psdes-random 1.5
ratio-regions 1.0
ray 1.0
raytrace 1.0
simple 1.0
smith-normal-form 1.0
tak 0.9
tensor 0.9
tsp 1.0
vector-concat 0.9
vector-rev 0.8
vliw 1.1
wc-input1 1.0
wc-scanStream 1.6
zebra 1.0
zern 1.0
compile time
benchmark MLton old MLton
barnes-hut 2.6 2.6
checksum 0.7 0.7
count-graphs 1.7 1.8
DLXSimulator 4.1 4.0
fft 1.5 1.4
fib 0.7 0.7
hamlet 51.1 48.2
knuth-bendix 2.4 2.3
lexgen 5.4 5.3
life 1.4 1.5
logic 7.2 7.3
mandelbrot 0.7 0.8
matrix-multiply 0.7 0.8
md5 2.3 2.3
merge 0.7 0.8
mlyacc 19.2 19.1
mpuz 0.9 0.9
nucleic 4.0 4.0
peek 1.1 1.1
psdes-random 0.7 0.8
ratio-regions 3.0 3.0
ray 3.3 3.2
raytrace 9.8 9.6
simple 7.1 7.2
smith-normal-form 7.7 7.7
tak 0.7 0.7
tensor 3.0 2.9
tsp 1.8 1.7
vector-concat 0.7 0.8
vector-rev 0.7 0.8
vliw 12.1 11.9
wc-input1 1.7 1.6
wc-scanStream 1.8 1.8
zebra 4.8 4.7
zern 1.2 1.1
run time
benchmark MLton old MLton
barnes-hut 5.3 5.4
checksum 5.1 4.4
count-graphs 6.0 7.2
DLXSimulator 14.0 12.7
fft 7.9 9.1
fib 4.7 4.7
hamlet 9.8 10.7
knuth-bendix 8.4 8.1
lexgen 13.3 13.5
life 10.9 11.6
logic 26.8 31.0
mandelbrot 8.7 9.3
matrix-multiply 6.0 5.6
md5 4.9 4.9
merge 39.0 40.8
mlyacc 10.7 10.9
mpuz 7.0 6.9
nucleic 8.3 8.5
peek 4.9 5.2
psdes-random 6.1 9.2
ratio-regions 9.4 9.3
ray 6.0 6.1
raytrace 6.5 6.5
simple 7.3 7.4
smith-normal-form 1.1 1.1
tak 11.3 10.6
tensor 9.5 8.8
tsp 12.3 12.3
vector-concat 6.2 5.4
vector-rev 4.4 3.5
vliw 7.0 7.5
wc-input1 2.6 2.7
wc-scanStream 5.2 8.2
zebra 3.2 3.1
zern 45.1 44.8
size
benchmark MLton old MLton
barnes-hut 63,672 63,904
checksum 21,812 23,396
count-graphs 41,076 43,228
DLXSimulator 88,060 87,500
fft 34,960 33,272
fib 21,564 23,212
hamlet 996,871 973,471
knuth-bendix 62,013 62,093
lexgen 129,916 131,004
life 38,548 40,052
logic 147,516 151,060
mandelbrot 21,532 23,276
matrix-multiply 22,236 23,740
md5 36,341 36,845
merge 22,708 24,348
mlyacc 427,948 423,196
mpuz 26,644 28,228
nucleic 58,780 60,420
peek 29,989 29,885
psdes-random 22,836 24,516
ratio-regions 60,372 59,932
ray 73,351 73,327
raytrace 183,628 180,572
simple 169,160 171,208
smith-normal-form 141,092 141,820
tak 21,596 23,236
tensor 69,259 68,619
tsp 39,101 39,069
vector-concat 22,356 23,828
vector-rev 22,364 23,836
vliw 272,776 262,936
wc-input1 42,661 42,533
wc-scanStream 45,285 45,093
zebra 111,501 111,485
zern 29,895 27,319