new codegen

Matthew Fluet mfluet@intertrust.com
Mon, 13 Aug 2001 15:21:22 -0700 (PDT)


Status of the new codegen: It looks pretty stable now; the regressions,
benchmarks, and self-compiles all pass without issue.

Benefits over the old codegen:
- carry floating-point values in registers across blocks
- carry MLton stack values (both integer and fp) in registers across blocks
- faster MemLoc.eq function using hash-consing

Downsides:
- although the MemLoc.eq function is much faster, programs with large numbers
of stack values still have slow compile times.  A self-compile is still much
too slow.  (But compiling with -native-live-stack false is fine.)
- still some benchmark oddities to track down.

Anyway, here is the latest round of benchmark numbers.

MLton stable == mlton-20010806
MLton old    == new codegen, -native-live-stack false
MLton        == new codegen, -native-live-stack true

compile time
benchmark         MLton stable MLton old MLton
barnes-hut          2.9          2.7       2.6
checksum            0.7          0.8       0.7
count-graphs        2.3          2.0       1.8
DLXSimulator        6.0          4.5       4.1
fft                 1.6          1.4       1.3
fib                 0.6          0.7       0.6
hamlet             55.5         54.2      48.8
knuth-bendix        2.7          2.5       2.3
lexgen              7.0          5.7       5.4
life                1.6          1.4       1.4
logic               7.8          7.8       7.1
mandelbrot          0.7          0.8       0.7
matrix-multiply     0.7          0.8       0.7
md5                 2.1          2.9       1.8
merge               0.7          0.8       0.7
mlyacc             38.2         20.3      19.5
mpuz                1.0          1.0       0.9
nucleic             3.5          4.3       3.4
peek                1.2          1.1       1.1
psdes-random        0.7          0.8       0.7
ratio-regions       3.8          2.8       2.6
ray                 4.5          3.5       3.5
raytrace           11.5         10.4       9.4
simple              9.5          7.1       6.5
smith-normal-form   8.5          8.0       7.3
tailfib             0.6          0.7       0.6
tak                 0.6          0.7       0.6
tensor              3.7          3.0       3.0
tsp                 1.7          1.8       1.5
vector-concat       0.7          0.8       0.7
vector-rev          0.7          0.8       0.7
vliw               16.0         13.0      12.4
wc-input1           1.9          1.6       1.7
wc-scanStream       2.2          1.7       1.8
zebra              11.0          5.3       5.4
zern                1.2          1.1       1.1

Comments: All in all, not too bad on these small programs.  You can see
the slowdown with -native-live-stack true on some of the larger programs.
zebra must have lots of stuff live across limit-checks, because it's
really slowing down compared to not tracking the stack.

run time
benchmark         MLton stable MLton old MLton
barnes-hut          4.9          5.3       5.0
checksum            4.3          4.5       4.1
count-graphs        5.8          6.1       5.9
DLXSimulator       12.6         14.0      13.2
fft                 8.5          8.8       8.5
fib                 4.1          4.7       4.3
hamlet              9.2         10.3       9.0
knuth-bendix        8.1          8.6       8.4
lexgen             12.5         13.6      13.2
life               14.2         10.7      12.0
logic              24.4         27.8      26.0
mandelbrot          7.7          8.9       8.9
matrix-multiply     5.9          6.2       5.2
md5                 4.7          5.0       4.4
merge              39.0         40.9      39.0
mlyacc             10.3         11.0      10.7
mpuz                6.4          6.8       7.4
nucleic             8.0          8.4       8.3
peek                4.4          4.9       5.2
psdes-random        6.0          5.6       5.7
ratio-regions       9.1          9.5       9.3
ray                 4.8          5.3       4.8
raytrace            5.9          6.6       5.9
simple              6.8          7.4       7.0
smith-normal-form   1.1          1.1       1.1
tailfib            16.7         25.1      19.9
tak                10.7         10.4       9.9
tensor              7.0          7.3       9.1
tsp                11.8         12.6      12.1
vector-concat       4.7          7.6       7.5
vector-rev          3.3          3.2       3.2
vliw                6.7          7.4       7.0
wc-input1           3.5          3.0       2.8
wc-scanStream       3.1          4.4       4.5
zebra               3.4          3.1       3.1
zern               36.2         42.8      40.0

Comments: All in all, there are some decent improvements and some
slow-downs.  I haven't examined any of these in detail, so I can't make
any specific guesses as to what's going wrong.

We want stable MLton > old MLton > MLton.  This happens with
22 of the benchmarks.

size
benchmark           MLton stable MLton old MLton
barnes-hut         60,896       63,672    59,408
checksum           20,916       23,788    20,964
count-graphs       42,764       45,564    40,604
DLXSimulator       83,052       96,644    78,348
fft                30,416       31,760    29,408
fib                20,924       23,692    20,940
hamlet            981,567    1,088,159   940,103
knuth-bendix       60,653       67,253    59,277
lexgen            125,644      138,092   121,436
life               39,012       38,500    38,260
logic             147,260      160,236   148,268
mandelbrot         20,948       23,756    20,932
matrix-multiply    21,324       24,268    21,356
md5                34,645       36,477    34,069
merge              21,932       25,076    21,916
mlyacc            435,196      463,764   408,668
mpuz               26,548       28,196    26,580
nucleic            60,684       61,812    60,380
peek               28,973       30,381    28,381
psdes-random       21,884       24,812    21,932
ratio-regions      45,540       46,124    42,324
ray                71,503       73,831    65,823
raytrace          176,868      180,076   157,156
simple            152,080      165,800   146,160
smith-normal-form 136,780      140,604   133,132
tailfib            20,620       23,460    20,668
tak                20,956       23,748    20,972
tensor             63,219       66,531    62,547
tsp                34,749       36,469    33,901
vector-concat      21,460       24,468    21,588
vector-rev         21,372       24,324    21,404
vliw              269,304      299,720   257,832
wc-input1          39,861       40,397    38,485
wc-scanStream      42,125       43,021    40,845
zebra             133,997      118,997   106,605
zern               26,551       28,007    26,487

These look fine to me.  The new codegen is cutting down a little on the
code size.
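
To put a rough number on "a little", here is a small Python sketch that
computes the relative size change between the MLton stable and MLton columns.
The three rows are copied from the size table above; the helper name
percent_change is just an illustrative choice, and three rows are only a
sample, not a summary of the whole table.

```python
# Sizes copied from three rows of the size table: (MLton stable, MLton).
sizes = {
    "hamlet": (981_567, 940_103),
    "mlyacc": (435_196, 408_668),
    "logic": (147_260, 148_268),
}


def percent_change(old, new):
    """Relative change from old to new, in percent (negative = smaller)."""
    return 100.0 * (new - old) / old


changes = {name: percent_change(old, new) for name, (old, new) in sizes.items()}

for name, pct in changes.items():
    print(f"{name:8s} {pct:+.1f}%")
```

On these rows hamlet and mlyacc shrink by a few percent while logic grows
very slightly, so the effect is not uniform across benchmarks.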