limit check insertion
Stephen Weeks
MLton@sourcelight.com
Tue, 23 Oct 2001 22:25:53 -0700
It was an incredibly simple change to force limit checks to only be
coalesced within basic blocks -- a couple of lines in limit-check.fun
did the trick. I've checked in the change, and made this the default.
It passes all the regressions and a self-compile. It also fixes the
bug with vliw, but introduces a bug in simple, and has no effect on
the prodcons and mutex bugs.
Unfortunately, it increases code sizes (up to 50%!) and running times
(up to 30% or so), almost across the board, with the exception of
matrix multiply, which may be due to a limit check that is no longer
lifted into a loop. The stats are below.
For a self-compile, the numbers are bad. The mlton executable is
currently about 7.5M, but when compiled -limit-check-per-block true,
it blows up to 11.4M. Here are the times for three self compiles
Compiling new MLton using an old one.
Compile SML starting
Compile SML finished in 269.67 + 221.90 (45% GC)
Compile C starting
Compile C finished in 10.47 + 0.0 (0.0% GC)
Assemble starting
Assemble finished in 32.36 + 0.0 (0.0% GC)
Link starting
Link finished in 2.72 + 0.0 (0.0% GC)
MLton finished in 315.31 + 221.91 (41% GC)
size mlton-compile
text data bss dec hex filename
6447545 1045724 35940 7529209 72e2f9 mlton-compile
Compiling new MLton using itself (built above) and -limit-check-per-block true
Compile SML starting
Compile SML finished in 311.38 + 256.69 (45% GC)
Compile C starting
Compile C finished in 15.60 + 0.0 (0.0% GC)
Assemble starting
Assemble finished in 45.44 + 0.0 (0.0% GC)
Link starting
Link finished in 5.09 + 0.0 (0.0% GC)
MLton finished in 377.63 + 256.69 (40% GC)
size mlton-compile
text data bss dec hex filename
9995209 1381500 33764 11410473 ae1c29 mlton-compile
Compiling new MLton using itself (built above) and -limit-check-per-block true
Compile SML starting
Compile SML finished in 351.27 + 266.54 (43% GC)
Compile C starting
Compile C finished in 15.95 + 0.0 (0.0% GC)
Assemble starting
Assemble finished in 45.72 + 0.0 (0.0% GC)
Link starting
Link finished in 5.34 + 0.0 (0.0% GC)
MLton finished in 418.39 + 266.55 (39% GC)
size mlton-compile
text data bss dec hex filename
9995209 1381500 33764 11410473 ae1c29 mlton-compile
So the self-compile times in order were: 537, 634, 685. So we take a
big hit (100s) for creating a bigger executable, and a pretty bad hit
(50s) for running with extra limit checks.
It seems pretty clear that we will need a limit check insertion
algorithm that coalesces across blocks.
Feel free to look into the remaining bugs with simple, prodcons, and
mutex. I'll start in on 'em tomorrow.
--------------------------------------------------------------------------------
MLton0 -- mlton -limit-check-per-block false
MLton1 -- mlton -limit-check-per-block true
compile time
benchmark MLton0 MLton1
barnes-hut 2.5 2.5
checksum 0.7 0.7
count-graphs 1.8 1.9
DLXSimulator 4.2 4.3
fft 1.3 1.3
fib 0.6 0.6
hamlet 49.5 54.3
knuth-bendix 2.4 2.5
lexgen 5.4 5.7
life 1.4 1.5
logic 6.3 8.4
mandelbrot 0.7 0.7
matrix-multiply 0.7 0.7
md5 1.3 1.3
merge 0.7 0.7
mlyacc 22.7 24.9
mpuz 0.9 0.9
nucleic 3.2 3.1
peek 1.1 1.1
psdes-random 0.7 0.7
ratio-regions 2.6 2.7
ray 3.5 3.7
raytrace 9.3 9.4
simple 6.6 7.2
smith-normal-form 7.7 7.6
tailfib 0.6 0.6
tak 0.6 0.6
tensor 3.0 3.1
tsp 1.6 1.6
tyan 3.9 4.1
vector-concat 0.7 0.7
vector-rev 0.7 0.7
vliw 12.4 12.9
wc-input1 1.7 1.7
wc-scanStream 1.8 1.8
zebra 9.5 10.6
zern 1.1 1.1
run time
benchmark MLton0 MLton1
barnes-hut 5.0 5.7
checksum 3.7 4.1
count-graphs 5.6 6.6
DLXSimulator 15.0 15.0
fft 7.5 8.2
fib 4.6 5.0
hamlet 9.0 10.2
knuth-bendix 8.5 9.0
lexgen 12.2 13.7
life 11.0 12.1
logic 26.6 27.1
mandelbrot 9.4 9.6
matrix-multiply 6.7 5.3
md5 0.6 0.8
merge 39.5 45.9
mlyacc 10.5 11.2
mpuz 6.4 7.1
nucleic 8.5 9.1
peek 4.8 5.8
psdes-random 4.6 4.6
ratio-regions 9.1 9.8
ray 4.9 5.5
raytrace 5.8 6.7
simple 6.7 *
smith-normal-form 1.1 1.1
tailfib 22.1 22.5
tak 10.5 11.1
tensor 9.7 9.9
tsp 12.0 13.8
tyan 19.8 20.9
vector-concat 8.0 9.6
vector-rev 3.1 4.2
vliw * 7.0
wc-input1 2.8 3.6
wc-scanStream 4.3 4.8
zebra 2.7 3.0
zern 38.4 41.0
run time ratio
benchmark MLton1
barnes-hut 1.1
checksum 1.1
count-graphs 1.2
DLXSimulator 1.0
fft 1.1
fib 1.1
hamlet 1.1
knuth-bendix 1.1
lexgen 1.1
life 1.1
logic 1.0
mandelbrot 1.0
matrix-multiply 0.8
md5 1.3
merge 1.2
mlyacc 1.1
mpuz 1.1
nucleic 1.1
peek 1.2
psdes-random 1.0
ratio-regions 1.1
ray 1.1
raytrace 1.1
smith-normal-form 1.0
tailfib 1.0
tak 1.1
tensor 1.0
tsp 1.2
tyan 1.1
vector-concat 1.2
vector-rev 1.3
vliw ~1.0
wc-input1 1.3
wc-scanStream 1.1
zebra 1.1
zern 1.1
size
benchmark MLton0 MLton1
barnes-hut 62,201 64,505
checksum 21,101 21,157
count-graphs 42,837 46,429
DLXSimulator 85,221 95,061
fft 30,337 31,513
fib 21,093 21,093
hamlet 1,059,152 1,344,512
knuth-bendix 64,390 68,430
lexgen 134,493 148,733
life 40,373 43,141
logic 159,373 258,365
mandelbrot 21,061 21,117
matrix-multiply 21,485 21,533
md5 29,438 30,518
merge 22,237 22,325
mlyacc 449,501 540,109
mpuz 27,365 28,149
nucleic 61,365 63,293
peek 29,502 31,070
psdes-random 22,149 22,181
ratio-regions 44,117 47,989
ray 72,128 80,640
raytrace 174,157 214,533
simple 158,953 185,225
smith-normal-form 145,165 147,429
tailfib 20,765 20,789
tak 21,133 21,165
tensor 65,228 68,780
tsp 35,670 37,286
tyan 83,950 92,238
vector-concat 21,773 21,837
vector-rev 21,605 21,645
vliw 290,329 329,641
wc-input1 41,454 44,470
wc-scanStream 44,078 47,382
zebra 122,190 136,758
zern 27,288 27,584