benchmarks
Matthew Fluet
fluet@CS.Cornell.EDU
Thu, 8 Nov 2001 18:18:34 -0500 (EST)
Latest benchmarks with the SSA IL.
MLton0 is the 20011006 release.
MLton1 runs both CPS simplify (all passes) and SSA simplify.
MLton2 runs only SSA simplify.
MLton2 is still pretty bad: more than a 281X slowdown on tailfib before
running out of memory!!
MLton1 generally has a small slowdown, which I think makes sense: the
second rounds of flatten and local-flatten, run without the shrinker,
just introduce redundant tuple allocations and selects, along with
gazillions of gotos to gotos. (The x86 codegen cleans up some of those,
in the sense that the jmp instruction is eliminated, but the shuffling
of values in stack slots still goes on.) Strangely, there are some
decent speedups in ratio-regions and wc-input1, a 2X speedup in
matrix-multiply, and an amazing 9X speedup in md5!
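One way to put a single number on "a small slowdown" is the geometric mean of the MLton1 column of the run time ratio table. This is a summary statistic chosen here for illustration, not something the benchmark runs are known to report; the ratios below are copied from that table, and since they are rounded to one decimal (md5 shows as 0.1), the result is only approximate.

```python
from statistics import geometric_mean

# MLton1 run time ratios (MLton1 / MLton0), copied in order from the
# "run time ratio" table.
MLTON1_RATIOS = [
    1.2, 1.0, 1.0, 1.0, 1.2, 1.0, 1.8, 1.6, 1.2, 2.3,  # barnes-hut .. life
    1.4, 1.0, 0.5, 0.1, 1.0, 1.4, 1.4, 1.9, 1.0, 1.0,  # logic .. psdes-random
    0.9, 1.0, 1.5, 1.2, 1.0, 1.0, 1.1, 1.0, 1.0, 1.4,  # ratio-regions .. tyan
    1.1, 1.0, 1.2, 0.9, 1.0, 2.5, 1.0,                 # vector-concat .. zern
]

# A geometric mean above 1.0 means MLton1 is slower than MLton0 overall.
# Unlike the arithmetic mean, it weighs a 2X slowdown and a 2X speedup
# symmetrically (the geometric mean of 2.0 and 0.5 is 1.0).
slowdown = geometric_mean(MLTON1_RATIOS)
print(f"geometric mean run time ratio, MLton1 vs MLton0: {slowdown:.2f}")
```

The geometric mean is used rather than the arithmetic mean because a single outlier such as md5 would otherwise dominate in one direction only.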
MLton0 -- mlton-stable
MLton1 -- mlton
MLton2 -- mlton -drop-pass removeUnused1CPS -drop-pass leafInlineCPS
          -drop-pass raiseToJump1CPS -drop-pass contify1CPS
          -drop-pass localFlatten1CPS -drop-pass constantPropagationCPS
          -drop-pass uselessCPS -drop-pass removeUnused2CPS
          -drop-pass unusedArgs1CPS -drop-pass simplifyTypesCPS
          -drop-pass polyEqualCPS -drop-pass contify2CPS
          -drop-pass inlineCPS -drop-pass localFlatten2CPS
          -drop-pass removeUnused3CPS -drop-pass raiseToJump2CPS
          -drop-pass contify3CPS -drop-pass unusedArgs2CPS
          -drop-pass introduceLoopsCPS -drop-pass loopInvariantCPS
          -drop-pass flattenCPS -drop-pass localFlatten3CPS
          -drop-pass commonSubexpCPS -drop-pass commonBlockCPS
          -drop-pass redundantTestsCPS -drop-pass redundantCPS
          -drop-pass unusedArgs3CPS -drop-pass removeUnused4CPS
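The MLton2 command line is the plain `mlton` invocation with one `-drop-pass` flag per CPS simplify pass. A small sketch that rebuilds that argument list from the pass names; the helper `drop_pass_args` is hypothetical and only prints the command, it does not run the compiler (and, as in the command line above, the source file argument is omitted).

```python
import shlex

# CPS passes dropped for MLton2, copied in order from the command line above.
CPS_PASSES = [
    "removeUnused1CPS", "leafInlineCPS", "raiseToJump1CPS", "contify1CPS",
    "localFlatten1CPS", "constantPropagationCPS", "uselessCPS",
    "removeUnused2CPS", "unusedArgs1CPS", "simplifyTypesCPS", "polyEqualCPS",
    "contify2CPS", "inlineCPS", "localFlatten2CPS", "removeUnused3CPS",
    "raiseToJump2CPS", "contify3CPS", "unusedArgs2CPS", "introduceLoopsCPS",
    "loopInvariantCPS", "flattenCPS", "localFlatten3CPS", "commonSubexpCPS",
    "commonBlockCPS", "redundantTestsCPS", "redundantCPS", "unusedArgs3CPS",
    "removeUnused4CPS",
]


def drop_pass_args(passes):
    """Hypothetical helper: build an mlton argv that drops each named pass."""
    argv = ["mlton"]
    for name in passes:
        argv += ["-drop-pass", name]
    return argv


print(shlex.join(drop_pass_args(CPS_PASSES)))
```

Keeping the pass names in a list makes it easy to re-enable a single pass (by removing it from the list) when bisecting which CPS pass accounts for a given speedup.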
compile time (seconds)
benchmark          MLton0 MLton1 MLton2
barnes-hut            2.2    2.6    7.0
checksum              0.6    0.6    1.2
count-graphs          1.6    2.0    4.5
DLXSimulator          3.8    4.8   13.3
fft                   1.1    1.3    3.3
fib                   0.7    0.6    0.9
hamlet               41.5   77.8  128.8
knuth-bendix          2.1    2.7    5.0
lexgen                4.7    6.9   13.5
life                  1.2    1.6    2.4
logic                 5.5   14.9   19.7
mandelbrot            0.6    0.6    1.1
matrix-multiply       0.6    0.7    1.3
md5                   1.5    1.4    3.2
merge                 0.8    0.7    1.1
mlyacc               19.1   29.5   41.7
mpuz                  0.8    1.0    1.6
nucleic               2.9    3.2    4.9
peek                  0.9    1.2    2.5
psdes-random          0.6    0.7    1.1
ratio-regions         2.4    3.0    7.2
ray                   3.0    3.9    9.7
raytrace              8.2   10.5   27.8
simple                5.8    9.4   18.5
smith-normal-form     7.4    8.0   11.1
tailfib               0.6    0.6    1.0
tak                   0.6    0.6    0.9
tensor                2.6    3.5    7.6
tsp                   1.4    1.9    3.8
tyan                  3.4    5.0   11.2
vector-concat         0.6    0.6    1.2
vector-rev            0.6    0.6    1.3
vliw                 10.9   19.8   33.9
wc-input1             1.5    1.9    3.6
wc-scanStream         1.6    2.1    3.9
zebra                 8.2    8.1    9.7
zern                  1.0    1.1    2.6
run time (seconds)
benchmark          MLton0 MLton1 MLton2
barnes-hut            3.9    4.8   55.2
checksum              3.2    3.1      *  -- Out of memory
count-graphs          4.9    4.7   62.3
DLXSimulator         15.1   15.7   77.7
fft                   7.7    9.1  119.3
fib                   3.4    3.4    5.1
hamlet                8.1   14.8   87.2
knuth-bendix          6.5   10.5   59.4
lexgen               10.5   12.6  237.9
life                  7.8   18.3   44.1
logic                25.7   34.8   58.4
mandelbrot            6.7    7.0  325.1
matrix-multiply       5.2    2.8   85.9
md5                   3.3    0.4  167.2
merge                48.9   48.7  178.8
mlyacc                9.4   13.1   83.2
mpuz                  4.6    6.3   32.8
nucleic               6.8   12.6  154.4
peek                  3.4    3.6      *  -- Out of memory
psdes-random          3.4    3.4      *  -- Out of memory
ratio-regions         8.2    7.4  331.4
ray                   3.8    4.0   68.9
raytrace              4.6    6.6  178.6
simple                6.0    7.0   63.2
smith-normal-form     0.9    0.9    1.3
tailfib              16.3   16.3      *  -- Out of memory (> 4500.0)
tak                   7.9    8.9   59.0
tensor                7.1    7.0   42.2
tsp                   9.0    8.9  215.7
tyan                 19.5   26.3  117.5
vector-concat         5.7    6.4  255.3
vector-rev            4.1    4.3  150.3
vliw                  6.2    7.6   44.2
wc-input1             2.2    2.1  108.3
wc-scanStream         3.6    3.4   96.0
zebra                 2.2    5.5   14.8
zern                 33.9   33.0  917.3
run time ratio (relative to MLton0)
benchmark          MLton1 MLton2
barnes-hut            1.2   14.0
checksum              1.0      *
count-graphs          1.0   12.8
DLXSimulator          1.0    5.2
fft                   1.2   15.6
fib                   1.0    1.5
hamlet                1.8   10.8
knuth-bendix          1.6    9.1
lexgen                1.2   22.7
life                  2.3    5.6
logic                 1.4    2.3
mandelbrot            1.0   48.6
matrix-multiply       0.5   16.6
md5                   0.1   50.7
merge                 1.0    3.7
mlyacc                1.4    8.8
mpuz                  1.4    7.1
nucleic               1.9   22.8
peek                  1.0      *
psdes-random          1.0      *
ratio-regions         0.9   40.3
ray                   1.0   18.0
raytrace              1.5   39.2
simple                1.2   10.5
smith-normal-form     1.0    1.4
tailfib               1.0      *  (> 281.3)
tak                   1.1    7.5
tensor                1.0    5.9
tsp                   1.0   23.9
tyan                  1.4    6.0
vector-concat         1.1   45.1
vector-rev            1.0   36.7
vliw                  1.2    7.1
wc-input1             0.9   48.6
wc-scanStream         1.0   26.8
zebra                 2.5    6.9
zern                  1.0   27.1
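Each entry in the ratio table is a run time divided by the MLton0 run time for the same benchmark, with "*" wherever the MLton2 run ran out of memory. A minimal sketch of that computation for a few rows, assuming this is how the table was derived; the row selection and the helper names `ratios` and `fmt` are illustrative only. Recomputing from the rounded times shown can differ in the last digit (barnes-hut MLton2 comes out as 14.2 rather than 14.0), presumably because the table was computed from unrounded times.

```python
# Run times in seconds for a few rows of the "run time" table, as
# (MLton0, MLton1, MLton2). None marks a run that died with "Out of memory".
RUN_TIMES = {
    "barnes-hut": (3.9, 4.8, 55.2),
    "checksum": (3.2, 3.1, None),
    "matrix-multiply": (5.2, 2.8, 85.9),
    "md5": (3.3, 0.4, 167.2),
}


def ratios(times):
    """Return (MLton1 / MLton0, MLton2 / MLton0), keeping None for failed runs."""
    base, t1, t2 = times
    return tuple(None if t is None else t / base for t in (t1, t2))


def fmt(ratio):
    """Format a ratio like the table does: one decimal, '*' for no result."""
    return "*" if ratio is None else f"{ratio:.1f}"


for name, times in RUN_TIMES.items():
    r1, r2 = ratios(times)
    print(f"{name:<18}{fmt(r1):>7}{fmt(r2):>7}")
```

A ratio below 1.0 is a speedup over MLton0 (matrix-multiply and md5 in the MLton1 column); above 1.0 is a slowdown.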
size
benchmark              MLton0     MLton1     MLton2
barnes-hut             59,793     66,272    208,240
checksum               20,917     21,576     36,312
count-graphs           40,461     44,816    127,288
DLXSimulator           78,237     92,432    342,184
fft                    29,441     31,564     91,660
fib                    20,909     21,568     31,824
hamlet                945,328  1,941,459  3,149,723
knuth-bendix           59,710     70,881    152,049
lexgen                122,061    173,224    356,888
life                   38,565     48,536     70,712
logic                 147,501    349,384    642,936
mandelbrot             20,901     21,536     36,304
matrix-multiply        21,309     21,760     40,792
md5                    34,038     30,249     92,209
merge                  21,885     22,784     35,376
mlyacc                409,501    684,472  1,213,080
mpuz                   26,645     29,560     50,328
nucleic                60,653     65,680    131,968
peek                   28,542     32,345     71,169
psdes-random           21,901     22,640     38,560
ratio-regions          41,893     51,528    192,608
ray                    66,688     83,851    268,899
raytrace              159,381    216,552    851,304
simple                146,913    232,164    526,948
smith-normal-form     141,053    146,348    248,140
tailfib                20,637     21,240     32,144
tak                    20,957     21,680     32,208
tensor                 62,516     74,163    184,987
tsp                    33,774     37,481    118,313
tyan                   77,054    109,513    313,577
vector-concat          21,557     22,264     39,216
vector-rev             21,389     22,056     40,456
vliw                  261,417    496,868    988,180
wc-input1              39,222     47,585    100,529
wc-scanStream          41,614     51,297    112,049
zebra                 103,502    195,489    238,961
zern                   26,504     28,139     72,155