[MLton-devel] SPARC self-compiles and benchmarks
Stephen Weeks
MLton@mlton.org
Mon, 14 Apr 2003 16:02:27 -0700
This is all on my 500 MHz UltraSPARC-IIe with 640M RAM.
First, the self compiles.
A two-round bootstrap takes about 10 hours. The first round takes
about 1 1/2 hours, of which the first 20 minutes is the generation
of C code and the remaining 70 minutes is the compilation of the C code.
The second round (which uses the slower mlton-stubs libraries)
takes about 8 1/2 hours, of which the first 7 1/2 hours is the
generation of C code and the last hour is the compilation of the C
code.
A fixpoint self-compile with the resulting compiler takes about 75
minutes, the first 17 being generation of C code and the remaining
hour the compilation of the C code.
Using my 1.6 GHz Pentium 4 with 512M RAM, cross-compiling the compiler
using a non-natively built MLton (to be fair to the SPARC) takes about
24 minutes, of which the first 6 minutes is generation of C and the
remaining 18 is the C compile.
In summary, here are the times in minutes.
                         gen C   compile C
                         ------  ---------
compiling the stubs         20       70
compiling with stubs       450       60
fixpoint compile            17       60
cross compile from P4        6       18
So, it looks like the SPARC is about 3 times slower than the P4, and
bootstrapping on the SPARC is prohibitively slow. My packaging
script will be based on cross compiles.
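
As a quick check of the arithmetic behind those conclusions, here is a
minimal Python sketch that totals the rows of the summary table. The
variable names are only illustrative, and it assumes the "3 times slower"
figure compares the SPARC fixpoint compile against the P4 cross compile.

```python
# Times in minutes, copied from the summary table: (generate C, compile C).
timings = {
    "compiling the stubs": (20, 70),
    "compiling with stubs": (450, 60),
    "fixpoint compile": (17, 60),
    "cross compile from P4": (6, 18),
}

# Total wall-clock minutes for each row.
totals = {name: gen_c + compile_c for name, (gen_c, compile_c) in timings.items()}

# The two-round bootstrap is the two stub rounds taken together.
bootstrap_minutes = totals["compiling the stubs"] + totals["compiling with stubs"]

# Assumed comparison: SPARC fixpoint compile versus the P4 cross compile.
slowdown = totals["fixpoint compile"] / totals["cross compile from P4"]

print(f"bootstrap: {bootstrap_minutes / 60:.1f} hours")
print(f"SPARC / P4: {slowdown:.1f}x")
```

The totals come to 600 minutes for the bootstrap and 77 versus 24 minutes
for the fixpoint and cross compiles, which is consistent with the "about 10
hours" and "about 3 times slower" figures.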
The text+data size of the fixpoint compiler is 10,262,388 bytes.
Now for all the usual benchmarks, comparing SML/NJ 110.42 and MLton on
the SPARC. The numbers are below. The runtime ratios are a lot
better than I would have expected given that this is the nonnative
backend and there hasn't been any tuning. The only benchmarks where
MLton is worse are barnes-hut, fft, fib, life, ray, tailfib, and zern.
On the P4, MLton native is worse with barnes-hut, logic, nucleic, and
tyan. So the only overlap is barnes-hut. We're still using the
simple C code to handle all possibly-misaligned memory accesses for
doubles, so that's probably hurting some. It's surprising it doesn't
hurt more (unless of course SML/NJ does the same thing).
Glancing at the raw running times compared with a P4, we see that the
times are roughly 3X-5X slower, with checksum, md5, and
smith-normal-form notably worse.
There are a few benchmarks that SML/NJ fails to compile: DLXSimulator,
nucleic, and tensor. DLXSimulator is due to an internal bug in SML/NJ
(uncaught exception RecoverLty). Nucleic was killed due to excessive
paging -- I let it compile for over 12 hours and reach 769M before I
killed it. Tensor fails to compile due to a type error. I see that
the program assumes that Array.appi has spec
val appi: (int * 'a -> unit) -> 'a array -> unit
which is the 2002 spec, which MLton agrees with. Unfortunately, all
the other compilers still seem to support the 1997 spec, with
val appi : (int * 'a -> unit) -> 'a array * int * int option -> unit
So, it is correct that SML/NJ fails, although the benchmark program
really should still put a line for tensor in the run time ratio table.
I killed smith-normal-form with SML/NJ for the usual reason: it runs
too slowly.
One benchmark, vliw, failed to run with MLton. I've tried it since
and it works fine. I am investigating.
run time ratio
benchmark SML/NJ
barnes-hut 0.71
boyer 2.93
checksum 3.60
count-graphs 2.46
fft 0.94
fib 0.80
hamlet 2.25
imp-for 13.11
knuth-bendix 4.05
lexgen 1.80
life 0.76
logic 1.38
mandelbrot 1.16
matrix-multiply 5.03
md5 8.65
merge 3.09
mlyacc 1.69
model-elimination 2.21
mpuz 3.18
peek 11.01
psdes-random 5.32
ratio-regions 7.66
ray 0.71
raytrace 1.64
simple 1.19
tailfib 0.80
tak 2.04
tsp 2.39
tyan 1.01
vector-concat 13.47
vector-rev 51.29
wc-input1 25.75
wc-scanStream 10.52
zebra 9.43
zern 0.64
compile time
benchmark MLton0 SML/NJ
barnes-hut 14.25 4.64
boyer 63.05 13.15
checksum 3.09 0.79
count-graphs 9.10 2.80
DLXSimulator 22.11 *
fft 6.81 2.35
fib 2.85 0.75
hamlet 409.34 184.83
imp-for 3.03 0.79
knuth-bendix 20.56 4.98
lexgen 37.32 11.41
life 8.87 2.20
logic 19.98 5.23
mandelbrot 3.19 0.94
matrix-multiply 3.38 1.09
md5 6.01 3.15
merge 6.27 0.81
mlyacc 173.35 61.75
model-elimination 168.05 100.53
mpuz 4.57 1.46
nucleic 109.93 *
peek 5.25 0.86
psdes-random 3.22 1.01
ratio-regions 13.57 4.84
ray 22.86 3.09
raytrace 70.83 17.47
simple 62.49 11.12
smith-normal-form 255.41 13.99
tailfib 2.85 0.71
tak 2.90 0.71
tensor 15.07 *
tsp 8.17 2.06
tyan 23.92 7.56
vector-concat 3.33 0.77
vector-rev 3.05 0.79
vliw 97.39 45.02
wc-input1 8.56 0.85
wc-scanStream 8.71 0.89
zebra 25.66 2.18
zern 5.36 2.09
run time
benchmark MLton0 SML/NJ
barnes-hut 207.21 147.05
boyer 157.37 461.19
checksum 447.13 1608.18
count-graphs 192.84 473.69
DLXSimulator 197.18 *
fft 125.41 117.44
fib 220.12 177.02
hamlet 221.12 498.51
imp-for 102.83 1347.82
knuth-bendix 169.64 686.27
lexgen 197.37 354.51
life 266.13 202.97
logic 188.26 259.00
mandelbrot 179.51 208.91
matrix-multiply 263.59 1324.55
md5 906.38 7843.23
merge 217.45 672.13
mlyacc 164.42 277.58
model-elimination 321.04 708.03
mpuz 187.72 597.09
nucleic 265.22 *
peek 140.25 1544.40
psdes-random 135.71 722.52
ratio-regions 133.23 1020.24
ray 143.04 102.11
raytrace 243.68 399.93
simple 297.28 353.90
smith-normal-form 545.43 *
tailfib 186.47 149.94
tak 324.48 663.17
tensor 143.39 *
tsp 193.34 461.72
tyan 201.03 203.10
vector-concat 245.54 3306.75
vector-rev 219.93 11281.07
wc-input1 132.57 3413.07
wc-scanStream 158.84 1671.39
zebra 283.46 2671.89
zern 269.14 171.74
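
The run time ratio table earlier in this message appears to be the SML/NJ
run time divided by the MLton run time, so values below 1 mean MLton is
slower. A minimal Python sketch that recomputes a few rows under that
assumption (the helper name `ratio` is hypothetical, and only five
benchmarks are copied here):

```python
# Run times copied from the run time table: (MLton, SML/NJ).
run_times = {
    "barnes-hut": (207.21, 147.05),
    "boyer": (157.37, 461.19),
    "fft": (125.41, 117.44),
    "tyan": (201.03, 203.10),
    "zern": (269.14, 171.74),
}

def ratio(mlton, smlnj):
    """Assumed definition: SML/NJ run time relative to MLton run time."""
    return smlnj / mlton

# Round to two decimals, matching the precision of the ratio table.
ratios = {name: round(ratio(*times), 2) for name, times in run_times.items()}

# Benchmarks where MLton is slower than SML/NJ (ratio below 1).
mlton_slower = sorted(name for name, r in ratios.items() if r < 1.0)

for name, r in ratios.items():
    print(f"{name:12s} {r:.2f}")
print("MLton slower on:", ", ".join(mlton_slower))
```

For these five rows the recomputed values agree with the ratio table
(0.71, 2.93, 0.94, 1.01, 0.64).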
size
benchmark MLton0 SML/NJ
barnes-hut 152,032 350,196
boyer 171,290 432,092
checksum 57,423 381,656
count-graphs 77,767 370,732
DLXSimulator 122,758 *
fft 68,051 359,444
fib 57,375 312,284
hamlet 1,365,132 1,317,060
imp-for 57,207 345,784
knuth-bendix 105,774 350,172
lexgen 185,083 413,684
life 79,967 330,716
logic 111,890 354,268
mandelbrot 57,279 321,500
matrix-multiply 57,935 356,332
md5 67,899 343,036
merge 59,007 345,792
mlyacc 514,263 714,788
model-elimination 741,392 901,252
mpuz 63,775 325,596
nucleic 206,260 *
peek 65,923 316,428
psdes-random 58,047 322,524
ratio-regions 83,695 356,332
ray 129,001 411,716
raytrace 267,834 528,484
simple 257,210 679,980
smith-normal-form 283,276 554,052
tailfib 57,111 344,760
tak 57,439 336,568
tensor 146,083 *
tsp 75,612 342,004
tyan 130,718 395,276
vector-concat 58,719 353,992
vector-rev 57,823 353,992
vliw 393,099 657,484
wc-input1 80,275 345,784
wc-scanStream 81,243 346,808
zebra 146,937 335,852
zern 64,773 365,604