[MLton-devel] nucleic benchmark times
Stephen Weeks
MLton@mlton.org
Fri, 15 Nov 2002 14:21:47 -0800
Here is my analysis of the numbers Matthew sent. I'll consider the
following three compilation options (with gcc 3.2; whether or not -O2
was used rarely mattered) relative to -native true.
A -native false -detect-overflow true
B -native false -detect-overflow true -DFAST_INT
C -native false -detect-overflow false
Compiling with A performs much worse than the native backend, often
by a factor of more than 2, and sometimes by a factor greater than 5
or 10.
Compiling with B is worse than the native backend, but never by more
than a factor of 2.
Compiling with C is pretty close to compiling with the native backend.
It is often slightly worse (a ratio above 1.4 on only three of the
benchmarks) and sometimes better.
Note: compiling with A follows the language/library specification,
while compiling with B or C does not.
There are three ways the performance of A might improve: MLton
improvements, C codegen improvements, and gcc improvements. As to the
MLton improvements, a perfect overflow detection elimination pass
might make compiling with A close to compiling with C. Of course,
even with a perfect analysis, some overflow tests would remain because
of unknown inputs. In any case, I doubt that we can eliminate enough
tests to consistently close the gap. As to C codegen or gcc
improvements, I think there is room for improvement there, but not
enough to make up for the often large factors.
On the other hand, B and C are already close to -native true. With
some overflow detection elimination and with C codegen/gcc
improvements, I could see them consistently doing as well as the
native codegen, and sometimes beating it by large margins. The
differences between gcc 2.95 and 3.2 (along with the -mcpu=i686
option) are pretty stunning.
One thing I don't like about the C numbers is that they compare
against the native backend with -detect-overflow true. I think it
would be fairer to compare with -native true -detect-overflow false.
To get a feel for the difference, here are the benchmark results with
-detect-overflow {false,true}.
MLton0 -- mlton -detect-overflow true
MLton1 -- mlton -detect-overflow false
run time ratio
benchmark MLton1
barnes-hut 0.71
boyer 1.23
checksum 0.97
count-graphs 0.99
DLXSimulator 1.00
fft 1.07
fib 1.00
hamlet 0.98
imp-for 1.16
knuth-bendix 1.05
lexgen 1.04
life 0.91
logic 1.02
mandelbrot 0.84
matrix-multiply 0.91
md5 0.95
merge 0.99
mlyacc 0.95
model-elimination 1.00
mpuz 0.93
nucleic 0.96
peek 1.33
psdes-random 1.03
ratio-regions 1.00
ray 0.99
raytrace 1.00
simple 1.02
smith-normal-form 1.00
tailfib 0.81
tak 0.92
tensor 0.89
tsp 1.00
tyan 0.99
vector-concat 1.00
vector-rev 1.03
vliw 1.10
wc-input1 0.99
wc-scanStream 0.98
zebra 1.00
zern 1.03
size
benchmark MLton0 MLton1
barnes-hut 103,008 102,120
boyer 140,471 140,431
checksum 43,735 43,647
count-graphs 63,759 63,271
DLXSimulator 101,312 99,152
fft 52,763 52,283
fib 43,807 43,615
hamlet 1,226,928 1,210,560
imp-for 43,711 43,567
knuth-bendix 86,224 84,312
lexgen 171,741 170,493
life 62,175 61,847
logic 103,815 103,479
mandelbrot 43,791 43,495
matrix-multiply 44,295 44,183
md5 52,840 51,752
merge 45,063 44,959
mlyacc 534,573 529,181
model-elimination 633,184 629,024
mpuz 47,911 47,671
nucleic 190,959 190,983
peek 51,864 51,512
psdes-random 44,895 44,847
ratio-regions 62,167 63,303
ray 103,280 102,528
raytrace 277,757 276,029
simple 199,419 195,307
smith-normal-form 181,044 180,068
tailfib 43,519 43,375
tak 43,919 43,527
tensor 103,659 101,611
tsp 58,832 57,968
tyan 106,752 105,552
vector-concat 44,255 44,215
vector-rev 44,095 44,047
vliw 322,425 318,297
wc-input1 65,773 65,205
wc-scanStream 66,253 65,653
zebra 142,376 141,184
zern 50,482 50,322
So turning off overflow detection doesn't help the native backend all
that much, but it does show that some of the benefit in going from B
to C came from turning off overflow detection.
Here's what I think the interesting questions are.
1. Can we modify B to correctly implement SML semantics? From gcc's
perspective, compiling with B, which has a jo to an error after each
arithmetic instruction, should produce identical performance to a
hypothetical variant that has a jo to a handler, if only we could tell
gcc where to go. I know that we've thought about how to do this
several times before, but in light of the numbers, I think it's worth
thinking about again. Maybe we could jo somewhere that looks at the C
stack + some globals that we stash and figures out where to go?
2. Can we provide gcc with aliasing information? I ask because I
doubt that gcc is figuring out much, if any, of what we know. If
that's right, and gcc is already doing this well without it, imagine
how much more improvement there is to be had by combining gcc's
optimizer with our aliasing info.
3. Can we hook into a gcc machine-independent IL? A yes here would
presumably help with (1) and (2), give us good performance, and help a
lot with portability (and performance there too). Although messy, I
view tying ourselves to gcc's IL as a much more palatable alternative
to C--, MLRISC, ....
In short, it seems to me that looking into gcc and its ILs has a very
high potential upside for MLton.
_______________________________________________
MLton-devel mailing list
MLton-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mlton-devel