[MLton-devel] nucleic benchmark times
Stephen Weeks
MLton@mlton.org
Fri, 15 Nov 2002 14:21:47 -0800
Here is my analysis of the numbers Matthew sent. I'll consider the
following three compilation options (with gcc 3.2; whether or not -O2
was used rarely mattered) relative to -native true.
A -native false -detect-overflow true
B -native false -detect-overflow true -DFAST_INT
C -native false -detect-overflow false
Compiling with A performs much worse than the native backend, often
by a factor of more than 2, and sometimes by a factor greater than 5
or 10.
Compiling with B is worse than the native backend, but never by more
than a factor of 2.
Compiling with C is pretty close to compiling with the native backend.
It is often slightly worse (a ratio above 1.4 on only three of the
benchmarks) and sometimes better.
Note: compiling with A follows the language/library specification,
while compiling with B or C does not.
There are three ways the performance of A might improve: MLton
improvements, C codegen improvements, and gcc improvements. As to the
MLton improvements, a perfect overflow detection elimination pass
might make compiling with A close to compiling with C. Of course,
even with a perfect analysis, some overflow tests would remain because
of unknown inputs. In any case, I doubt that we can eliminate enough
tests to consistently close the gap. As to C codegen or gcc
improvements, I think there is room for improvement there, but not
enough to make up for the often large factors.
On the other hand, B and C are already close to -native true. With
some overflow detection elimination and with C codegen/gcc
improvements, I could see them consistently doing as well as the
native codegen, and sometimes beating it by large margins. The
differences between gcc 2.95 and 3.2 (along with the -mcpu=i686
option) are pretty stunning.
One thing I don't like about the C numbers is that they compare
against the native backend with -detect-overflow true. I think it
would be fairer to compare with -native true -detect-overflow false.
To get a feel for the difference, here are the benchmark results with
-detect-overflow {false,true}.
MLton0 -- mlton -detect-overflow true
MLton1 -- mlton -detect-overflow false
run time ratio
benchmark MLton1
barnes-hut 0.71
boyer 1.23
checksum 0.97
count-graphs 0.99
DLXSimulator 1.00
fft 1.07
fib 1.00
hamlet 0.98
imp-for 1.16
knuth-bendix 1.05
lexgen 1.04
life 0.91
logic 1.02
mandelbrot 0.84
matrix-multiply 0.91
md5 0.95
merge 0.99
mlyacc 0.95
model-elimination 1.00
mpuz 0.93
nucleic 0.96
peek 1.33
psdes-random 1.03
ratio-regions 1.00
ray 0.99
raytrace 1.00
simple 1.02
smith-normal-form 1.00
tailfib 0.81
tak 0.92
tensor 0.89
tsp 1.00
tyan 0.99
vector-concat 1.00
vector-rev 1.03
vliw 1.10
wc-input1 0.99
wc-scanStream 0.98
zebra 1.00
zern 1.03
size
benchmark MLton0 MLton1
barnes-hut 103,008 102,120
boyer 140,471 140,431
checksum 43,735 43,647
count-graphs 63,759 63,271
DLXSimulator 101,312 99,152
fft 52,763 52,283
fib 43,807 43,615
hamlet 1,226,928 1,210,560
imp-for 43,711 43,567
knuth-bendix 86,224 84,312
lexgen 171,741 170,493
life 62,175 61,847
logic 103,815 103,479
mandelbrot 43,791 43,495
matrix-multiply 44,295 44,183
md5 52,840 51,752
merge 45,063 44,959
mlyacc 534,573 529,181
model-elimination 633,184 629,024
mpuz 47,911 47,671
nucleic 190,959 190,983
peek 51,864 51,512
psdes-random 44,895 44,847
ratio-regions 62,167 63,303
ray 103,280 102,528
raytrace 277,757 276,029
simple 199,419 195,307
smith-normal-form 181,044 180,068
tailfib 43,519 43,375
tak 43,919 43,527
tensor 103,659 101,611
tsp 58,832 57,968
tyan 106,752 105,552
vector-concat 44,255 44,215
vector-rev 44,095 44,047
vliw 322,425 318,297
wc-input1 65,773 65,205
wc-scanStream 66,253 65,653
zebra 142,376 141,184
zern 50,482 50,322
So turning off overflow detection doesn't help the native backend all
that much, but it does show that some of the benefit in going from B
to C came from turning off overflow detection.
Here's what I think the interesting questions are.
1. Can we modify B to correctly implement SML semantics? From gcc's
perspective, compiling with B, which has a jo to an error after each
arithmetic instruction, should produce identical performance to a
hypothetical variant that has a jo to a handler, if only we could tell
gcc where to go. I know that we've thought about how to do this
several times before, but in light of the numbers, I think it's worth
thinking about again. Maybe we could jo somewhere that looks at the C
stack + some globals that we stash and figures out where to go?
2. Can we provide gcc with aliasing information? I ask because I
doubt that gcc is figuring out much, if any, of what we know. If
that's right, and gcc is already doing this well without it, imagine
how much more improvement there is to be had by combining gcc's
optimizer with our aliasing info.
3. Can we hook into a gcc machine-independent IL? A yes here would
presumably help with (1) and (2), give us good performance, and help a
lot with portability (and performance there too). Although messy, I
view tying ourselves to gcc's IL as a much more palatable alternative
to C--, MLRISC, ....
In short, it seems to me that looking into gcc and its ILs has a very
high potential upside for MLton.
_______________________________________________
MLton-devel mailing list
MLton-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlton-devel