shrinker checked in
Stephen Weeks
MLton@sourcelight.com
Wed, 14 Nov 2001 09:45:51 -0800
> I'm running benchmarks tonight to compare the CPS and SSA optimizers
> and will report on them tomorrow.
Here are the results comparing the 20011006 release and the internal
versions with ssa and cps optimization. Compile times for the
internal version are of course horrible, because it is the NJ compiled
version. The runtime ratios are all fine except for knuth-bendix.
Once that is fixed, I think we're ready to drop CPS. Oh yeah, of
course we need to add enough to SSA.DirectExp and port the closure
converter first.
MLton0 -- /usr/local/bin/mlton (20011006 release)
MLton1 -- mlton -optimize-ssa false
MLton2 -- mlton -optimize-ssa true
compile time
benchmark MLton0 MLton1 MLton2
barnes-hut 2.5 16.8 23.2
checksum 0.7 2.8 3.3
count-graphs 1.9 11.3 14.2
DLXSimulator 4.2 28.7 41.4
fft 1.3 7.6 10.8
fib 0.7 2.4 2.6
hamlet 44.6 358.0 517.4
knuth-bendix 2.4 13.0 19.0
lexgen 5.2 35.7 50.5
life 1.4 7.2 8.9
logic 6.0 36.5 27.1
mandelbrot 0.7 2.7 3.4
matrix-multiply 0.8 3.0 3.5
md5 1.7 9.7 10.3
merge 0.8 2.9 3.3
mlyacc 20.9 142.3 196.0
mpuz 1.0 4.4 5.0
nucleic 3.2 19.2 19.3
peek 1.1 5.7 7.6
psdes-random 0.8 2.9 3.3
ratio-regions 2.6 17.1 27.5
ray 3.4 22.1 29.8
raytrace 8.9 62.9 95.0
simple 6.4 39.2 56.3
smith-normal-form 7.5 38.3 42.9
tailfib 0.7 2.4 2.5
tak 0.7 2.5 2.6
tensor 3.0 19.9 27.3
tsp 1.6 9.4 12.9
tyan 3.7 26.7 38.2
vector-concat 0.8 2.9 3.3
vector-rev 0.7 2.7 3.1
vliw 12.0 82.2 108.2
wc-input1 1.7 9.2 11.6
wc-scanStream 1.7 9.7 12.6
zebra 9.3 37.6 56.7
zern 1.1 5.7 8.1
run time
benchmark MLton0 MLton1 MLton2
barnes-hut 5.0 5.2 5.4
checksum 4.0 3.7 3.7
count-graphs 5.6 5.7 5.7
DLXSimulator 13.3 15.0 14.2
fft 7.1 7.3 7.2
fib 4.6 4.6 4.6
hamlet 9.1 9.3 9.2
knuth-bendix 8.4 8.4 17.7
lexgen 12.7 12.3 13.2
life 10.6 10.7 9.3
logic 26.2 26.2 22.0
mandelbrot 9.1 9.4 9.2
matrix-multiply 6.8 6.7 7.3
md5 4.4 4.3 2.7
merge 38.7 39.5 39.3
mlyacc 10.7 10.7 10.8
mpuz 6.3 6.3 5.8
nucleic 8.3 9.1 9.4
peek 4.7 4.8 4.7
psdes-random 4.6 4.6 4.6
ratio-regions 9.0 9.2 9.2
ray 5.0 5.2 5.0
raytrace 5.8 6.0 6.2
simple 6.8 7.0 6.9
smith-normal-form 1.1 1.1 1.1
tailfib 22.2 22.1 22.3
tak 10.7 10.5 12.1
tensor 9.7 9.5 7.9
tsp 11.9 11.8 12.0
tyan 19.8 20.0 20.5
vector-concat 7.6 8.0 7.8
vector-rev 3.3 3.1 3.1
vliw 6.7 6.9 7.0
wc-input1 2.8 2.6 2.7
wc-scanStream 4.5 3.8 3.8
zebra 2.6 2.7 2.9
zern 37.7 38.4 38.8
run time ratio
benchmark MLton1 MLton2
barnes-hut 1.1 1.1
checksum 0.9 0.9
count-graphs 1.0 1.0
DLXSimulator 1.1 1.1
fft 1.0 1.0
fib 1.0 1.0
hamlet 1.0 1.0
knuth-bendix 1.0 2.1
lexgen 1.0 1.0
life 1.0 0.9
logic 1.0 0.8
mandelbrot 1.0 1.0
matrix-multiply 1.0 1.1
md5 1.0 0.6
merge 1.0 1.0
mlyacc 1.0 1.0
mpuz 1.0 0.9
nucleic 1.1 1.1
peek 1.0 1.0
psdes-random 1.0 1.0
ratio-regions 1.0 1.0
ray 1.0 1.0
raytrace 1.0 1.1
simple 1.0 1.0
smith-normal-form 1.0 1.0
tailfib 1.0 1.0
tak 1.0 1.1
tensor 1.0 0.8
tsp 1.0 1.0
tyan 1.0 1.0
vector-concat 1.1 1.0
vector-rev 1.0 1.0
vliw 1.0 1.0
wc-input1 0.9 0.9
wc-scanStream 0.9 0.8
zebra 1.0 1.1
zern 1.0 1.0
size
benchmark MLton0 MLton1 MLton2
barnes-hut 59,793 64,809 71,929
checksum 20,917 21,117 20,973
count-graphs 40,461 44,925 45,597
DLXSimulator 78,237 90,389 102,277
fft 29,441 30,665 33,641
fib 20,909 21,109 20,989
hamlet 945,328 1,289,200 1,643,672
knuth-bendix 59,710 66,878 83,230
lexgen 122,061 148,829 161,517
life 38,565 42,957 41,821
logic 147,501 253,949 92,301
mandelbrot 20,901 21,077 20,965
matrix-multiply 21,309 21,485 21,517
md5 34,038 36,390 32,278
merge 21,885 22,309 22,277
mlyacc 409,501 528,397 586,141
mpuz 26,645 27,773 27,085
nucleic 60,653 63,517 63,613
peek 28,542 31,686 33,286
psdes-random 21,901 22,165 22,069
ratio-regions 41,893 45,109 60,709
ray 66,688 80,976 88,384
raytrace 159,381 199,725 235,949
simple 146,913 182,497 190,769
smith-normal-form 141,053 145,445 151,797
tailfib 20,637 20,781 20,637
tak 20,957 21,133 21,133
tensor 62,516 68,332 72,892
tsp 33,774 36,126 42,670
tyan 77,054 91,326 106,334
vector-concat 21,557 21,789 21,749
vector-rev 21,389 21,613 21,541
vliw 261,417 318,009 360,201
wc-input1 39,222 46,030 48,110
wc-scanStream 41,614 48,590 50,806
zebra 103,502 130,086 144,326
zern 26,504 27,328 30,400
>
> Here's the log.
>
> --------------------------------------------------------------------------------
> The SSA shrinker now works. There are also a lot of other changes to the SSA IL
> and corresponding changes to the SSA optimizations and backend. Here's an
> overview.
>
> * There is a new switch, -optimize-ssa {true|false}, which determines whether
> the SSA or CPS optimizer is used. They are no longer run in sequence.
>
> * Implement handlers has been moved from the SSA simplify pipeline to backend,
> and also implements other invariants that the backend needs.
>
> * SSA IL changes
> - changed Raise transfer back to taking Var.t vector instead of Var.t.
> This simplified some of the optimizations and made it easier since it
> can often be treated exactly as return.
> - Change the return type of functions from Type.t vector to
> Type.t vector option. NONE indicates that a function does not return.
> - Added mayRaise: bool to functions to record whether or not a function may
> raise.
> - Changed returns to the following datatype
>
> datatype t =
> Dead
> | HandleOnly
> | NonTail of {cont: Label.t, handler: Handler.t}
> | Tail
>
> These correspond to 6 of the possible 9 combinations of continuation and
> handler each being one of {None, Caller, Some l}. None means that it
> doesn't matter what the continuation (handler) is since the caller never
> returns (raises). Caller means to keep the continuation (handler) the same
> as in the caller. Some l means a nontail call in the case of continuations
> and an installed handler in the case of handlers.
>
> 3 of the 9 possibilities are disallowed, and the correspondence is as below.
>
> Cont Handler equivalent
> ------ ------- ---------------------------------------
> None None Dead
> None Caller HandleOnly
> None Some h *disallowed*
> Caller None *disallowed*
> Caller Caller Tail
> Caller Some h *disallowed*
> Some l None Nontail {cont = l, handler = None}
> Some l Caller Nontail {cont = l, handler = Caller}
> Some l Some h Nontail {cont = l, handler = Handle l}
>
> We could have allowed the (None, Some h) and (Caller, Some h) cases, and
> put some code in the backend to generate stubs, since if there is a handler
> there must be some continuation. But I decided it was easier to just rule
> them out, essentially meaning that remove-unused, or any other optimization
> pass, needs to make stubs itself.
>
> The matchup between returns and what is now expressed in functions types
> (mayRaise and mayReturn) is checked in the type checker.
>
> The checked in compiler passes all regressions, but is not able to self-compile,
> or even be compiled, at least with 500m. The bloat due to having all of the CPS
> and SSA optimization passes is too much. Hopefully we will soon decide to pitch
> the CPS and will be able to self-compile again.