new codegen
Matthew Fluet
mfluet@intertrust.com
Mon, 13 Aug 2001 15:21:22 -0700 (PDT)
Status of the new codegen: It looks pretty stable now; regressions,
benchmarks, self-compiles all pass without issue.
Benefits over the old codegen:
- carry floating-point values in registers across blocks
- carry MLton stack values (both integer and fp) in registers across blocks
- faster MemLoc.eq function using hash-consing
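For readers unfamiliar with hash-consing, here is a minimal Python sketch of the general idea, not the actual MLton code. The MemLoc class below, its base/offset fields, and memloc_eq are hypothetical stand-ins. The assumption is that every memory location is built through one interning table, so structurally equal locations are the same object and equality becomes an identity check.

```python
class MemLoc:
    """Hypothetical memory-location term: a base plus an offset."""

    _table = {}                      # structural key -> unique instance
    __slots__ = ("base", "offset")

    def __new__(cls, base, offset):
        # The key is hashable because a nested MemLoc base is itself
        # interned, so its default identity hash is stable and unique.
        key = (base, offset)
        node = cls._table.get(key)
        if node is None:
            node = super().__new__(cls)
            node.base = base
            node.offset = offset
            cls._table[key] = node
        return node


def memloc_eq(a, b):
    # Interned terms are structurally equal exactly when they are
    # the same object, so no recursive comparison is needed.
    return a is b


if __name__ == "__main__":
    a = MemLoc(MemLoc("stack", 0), 8)
    b = MemLoc(MemLoc("stack", 0), 8)
    print(memloc_eq(a, b))
```

The trade-off in this sketch is that construction pays for a table lookup, while every later comparison is constant time regardless of how deeply the term is nested.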
Downsides:
- although the MemLoc.eq function is much faster, programs with large
numbers of stack values still compile slowly. A self-compile is still much
too slow. (But compiling with -native-live-stack false is fine.)
- still some benchmark oddities to track down.
Anyway, here is the latest round of benchmark numbers.
mlton == new codegen, -native-live-stack true
mlton-stable == mlton-20010806
mlton-old == new codegen, -native-live-stack false
compile time
benchmark           MLton  stable MLton  old MLton
barnes-hut            2.9           2.7        2.6
checksum              0.7           0.8        0.7
count-graphs          2.3           2.0        1.8
DLXSimulator          6.0           4.5        4.1
fft                   1.6           1.4        1.3
fib                   0.6           0.7        0.6
hamlet               55.5          54.2       48.8
knuth-bendix          2.7           2.5        2.3
lexgen                7.0           5.7        5.4
life                  1.6           1.4        1.4
logic                 7.8           7.8        7.1
mandelbrot            0.7           0.8        0.7
matrix-multiply       0.7           0.8        0.7
md5                   2.1           2.9        1.8
merge                 0.7           0.8        0.7
mlyacc               38.2          20.3       19.5
mpuz                  1.0           1.0        0.9
nucleic               3.5           4.3        3.4
peek                  1.2           1.1        1.1
psdes-random          0.7           0.8        0.7
ratio-regions         3.8           2.8        2.6
ray                   4.5           3.5        3.5
raytrace             11.5          10.4        9.4
simple                9.5           7.1        6.5
smith-normal-form     8.5           8.0        7.3
tailfib               0.6           0.7        0.6
tak                   0.6           0.7        0.6
tensor                3.7           3.0        3.0
tsp                   1.7           1.8        1.5
vector-concat         0.7           0.8        0.7
vector-rev            0.7           0.8        0.7
vliw                 16.0          13.0       12.4
wc-input1             1.9           1.6        1.7
wc-scanStream         2.2           1.7        1.8
zebra                11.0           5.3        5.4
zern                  1.2           1.1        1.1
Comments: All in all, not too bad on the small programs. You can see
the slowdown with -native-live-stack true on some of the larger programs.
zebra must have lots of stuff live across limit-checks, because its compile
time roughly doubles compared to not tracking the stack.
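To put numbers on that slowdown, here is a small Python sketch that computes the compile-time ratio for a few of the larger programs. It assumes the first table column (MLton) is the -native-live-stack true build and the last column (old MLton) is the -native-live-stack false build, which is my reading of the legend and the comments; the function names are just for illustration.

```python
# (benchmark, MLton, old MLton) compile times copied from the table above.
# Assumed mapping: MLton     = new codegen, -native-live-stack true
#                  old MLton = new codegen, -native-live-stack false
COMPILE_TIMES = [
    ("hamlet", 55.5, 48.8),
    ("mlyacc", 38.2, 19.5),
    ("vliw", 16.0, 12.4),
    ("zebra", 11.0, 5.4),
]


def slowdown(live_true, live_false):
    """Ratio of compile time with stack tracking to compile time without."""
    return live_true / live_false


def slowdowns(rows):
    """Map each benchmark name to its slowdown ratio."""
    return {name: slowdown(t, f) for name, t, f in rows}


if __name__ == "__main__":
    ranked = sorted(slowdowns(COMPILE_TIMES).items(), key=lambda kv: -kv[1])
    for name, ratio in ranked:
        print(f"{name:8s} {ratio:.2f}x")
```

On these four rows, zebra and mlyacc come out at roughly 2x, while hamlet and vliw are well under 1.5x.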
run time
benchmark           MLton  stable MLton  old MLton
barnes-hut            4.9           5.3        5.0
checksum              4.3           4.5        4.1
count-graphs          5.8           6.1        5.9
DLXSimulator         12.6          14.0       13.2
fft                   8.5           8.8        8.5
fib                   4.1           4.7        4.3
hamlet                9.2          10.3        9.0
knuth-bendix          8.1           8.6        8.4
lexgen               12.5          13.6       13.2
life                 14.2          10.7       12.0
logic                24.4          27.8       26.0
mandelbrot            7.7           8.9        8.9
matrix-multiply       5.9           6.2        5.2
md5                   4.7           5.0        4.4
merge                39.0          40.9       39.0
mlyacc               10.3          11.0       10.7
mpuz                  6.4           6.8        7.4
nucleic               8.0           8.4        8.3
peek                  4.4           4.9        5.2
psdes-random          6.0           5.6        5.7
ratio-regions         9.1           9.5        9.3
ray                   4.8           5.3        4.8
raytrace              5.9           6.6        5.9
simple                6.8           7.4        7.0
smith-normal-form     1.1           1.1        1.1
tailfib              16.7          25.1       19.9
tak                  10.7          10.4        9.9
tensor                7.0           7.3        9.1
tsp                  11.8          12.6       12.1
vector-concat         4.7           7.6        7.5
vector-rev            3.3           3.2        3.2
vliw                  6.7           7.4        7.0
wc-input1             3.5           3.0        2.8
wc-scanStream         3.1           4.4        4.5
zebra                 3.4           3.1        3.1
zern                 36.2          42.8       40.0
Comments: All in all, there are some decent improvements and some
slow-downs. I haven't examined any of these in detail, so I can't make
any specific guesses as to what's going wrong.
We want stable MLton > old MLton > MLton. Counting ties, this happens
with 22 of the 36 benchmarks.
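For anyone who wants to recheck that count, here is a Python sketch that tallies it from the run-time table. It assumes the columns are, in order, MLton, stable MLton, old MLton, and that a tie between adjacent columns still counts as satisfying the ordering; count_ordered is a name made up for this example.

```python
# (benchmark, MLton, stable MLton, old MLton) run times from the table above.
RUN_TIMES = [
    ("barnes-hut", 4.9, 5.3, 5.0), ("checksum", 4.3, 4.5, 4.1),
    ("count-graphs", 5.8, 6.1, 5.9), ("DLXSimulator", 12.6, 14.0, 13.2),
    ("fft", 8.5, 8.8, 8.5), ("fib", 4.1, 4.7, 4.3),
    ("hamlet", 9.2, 10.3, 9.0), ("knuth-bendix", 8.1, 8.6, 8.4),
    ("lexgen", 12.5, 13.6, 13.2), ("life", 14.2, 10.7, 12.0),
    ("logic", 24.4, 27.8, 26.0), ("mandelbrot", 7.7, 8.9, 8.9),
    ("matrix-multiply", 5.9, 6.2, 5.2), ("md5", 4.7, 5.0, 4.4),
    ("merge", 39.0, 40.9, 39.0), ("mlyacc", 10.3, 11.0, 10.7),
    ("mpuz", 6.4, 6.8, 7.4), ("nucleic", 8.0, 8.4, 8.3),
    ("peek", 4.4, 4.9, 5.2), ("psdes-random", 6.0, 5.6, 5.7),
    ("ratio-regions", 9.1, 9.5, 9.3), ("ray", 4.8, 5.3, 4.8),
    ("raytrace", 5.9, 6.6, 5.9), ("simple", 6.8, 7.4, 7.0),
    ("smith-normal-form", 1.1, 1.1, 1.1), ("tailfib", 16.7, 25.1, 19.9),
    ("tak", 10.7, 10.4, 9.9), ("tensor", 7.0, 7.3, 9.1),
    ("tsp", 11.8, 12.6, 12.1), ("vector-concat", 4.7, 7.6, 7.5),
    ("vector-rev", 3.3, 3.2, 3.2), ("vliw", 6.7, 7.4, 7.0),
    ("wc-input1", 3.5, 3.0, 2.8), ("wc-scanStream", 3.1, 4.4, 4.5),
    ("zebra", 3.4, 3.1, 3.1), ("zern", 36.2, 42.8, 40.0),
]


def count_ordered(rows, strict=False):
    """Count rows where stable >= old >= MLton (or > when strict)."""
    n = 0
    for _name, mlton, stable, old in rows:
        if strict:
            ok = stable > old > mlton
        else:
            ok = stable >= old >= mlton
        n += ok
    return n


if __name__ == "__main__":
    print(count_ordered(RUN_TIMES), count_ordered(RUN_TIMES, strict=True))
```

Under these assumptions the tally is 22 with ties allowed and 16 with strict inequalities, so the figure of 22 matches the tie-inclusive reading.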
size
benchmark               MLton  stable MLton  old MLton
barnes-hut             60,896        63,672     59,408
checksum               20,916        23,788     20,964
count-graphs           42,764        45,564     40,604
DLXSimulator           83,052        96,644     78,348
fft                    30,416        31,760     29,408
fib                    20,924        23,692     20,940
hamlet                981,567     1,088,159    940,103
knuth-bendix           60,653        67,253     59,277
lexgen                125,644       138,092    121,436
life                   39,012        38,500     38,260
logic                 147,260       160,236    148,268
mandelbrot             20,948        23,756     20,932
matrix-multiply        21,324        24,268     21,356
md5                    34,645        36,477     34,069
merge                  21,932        25,076     21,916
mlyacc                435,196       463,764    408,668
mpuz                   26,548        28,196     26,580
nucleic                60,684        61,812     60,380
peek                   28,973        30,381     28,381
psdes-random           21,884        24,812     21,932
ratio-regions          45,540        46,124     42,324
ray                    71,503        73,831     65,823
raytrace              176,868       180,076    157,156
simple                152,080       165,800    146,160
smith-normal-form     136,780       140,604    133,132
tailfib                20,620        23,460     20,668
tak                    20,956        23,748     20,972
tensor                 63,219        66,531     62,547
tsp                    34,749        36,469     33,901
vector-concat          21,460        24,468     21,588
vector-rev             21,372        24,324     21,404
vliw                  269,304       299,720    257,832
wc-input1              39,861        40,397     38,485
wc-scanStream          42,125        43,021     40,845
zebra                 133,997       118,997    106,605
zern                   26,551        28,007     26,487
These look fine to me. The new codegen is cutting down a little on
code size.