SML numerical benchmark and MLton
Juan Jose Garcia Ripoll
jjgarcia@ind-cr.uclm.es
26 Oct 1999 10:22:13 +0200
"Stephen Weeks" <sweeks@intertrust.com> writes:
> > Secondly, I must admit that none of the previous optimizations improved
> > the performance of the MLton-compiled code.
>
> To be clear, I was interested in a comparison of
> * your original code compiled by MLton
> * your semi-automatically optimized code compiled by MLton
I have improved the timer. Now it uses 'times' from the Posix.ProcEnv
structure, just like the reference C code does. These are the results:
(* With hand-coded optimizations *)
$ ./tests
Real tensors: (+, *, /, +*, *+)
100 1 2 1 50 67
200 8 7 8 632 607
300 17 17 19 2334 2715
Complex tensors: (+, *, /, +*, *+)
100 4 4 4 77 122
200 19 22 22 1325 1437
300 48 46 51 4437 5730
(* Left to the compiler. Extensive use of functors *)
$ ../smlapl3/tests
Real tensors: (+, *, /, +*, *+)
100 2 2 1 72 60
200 8 6 9 482 490
300 16 15 20 3225 2399
Complex tensors: (+, *, /, +*, *+)
100 4 4 4 199 162
200 19 20 23 2035 1902
300 46 49 52 8047 7735
(* Hand-coded optimizations. SML/NJ 110.17 *)
Real tensors: (+, *, /, +*, *+)
100 2 2 1 80 95
200 10 9 10 639 845
300 23 21 21 3022 3527
Complex tensors: (+, *, /, +*, *+)
100 4 7 6 160 199
200 26 27 32 1679 2130
300 60 71 75 6627 7820
(* Reference C code *)
$ yorick -batch tests.i
Real tensors (add, mult, div, +*, *+)
100 0 0 0 25 22
200 3 4 5 227 232
300 8 8 13 1225 1182
Complex tensors (add, mult, div, +*, *+)
100 1 1 4 62 87
200 7 11 19 789 1212
300 17 25 42 3059 4819
I am posting all the benchmarks because I used a newer machine to run
these tests. Also, now that I use 'times', the difference between
one and multiple passes vanishes.
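A minimal sketch of this kind of measurement, assuming the current SML Basis Library signature of Posix.ProcEnv.times (a record of Time.time fields). The helper cpuTime and the workload below are hypothetical illustrations, not the code that produced the numbers above:

```sml
(* Hypothetical helper: run f x and report the CPU time (user + system)
   consumed by this process while doing so, as given by Posix.ProcEnv.times. *)
fun cpuTime (f : 'a -> 'b) (x : 'a) : 'b * Time.time =
  let
    val {utime = u0, stime = s0, ...} = Posix.ProcEnv.times ()
    val result = f x
    val {utime = u1, stime = s1, ...} = Posix.ProcEnv.times ()
    (* CPU time after the call minus CPU time before it *)
    val used = Time.- (Time.+ (u1, s1), Time.+ (u0, s0))
  in
    (result, used)
  end

(* Illustrative workload: sum the integers 0 .. n-1. *)
fun sumTo n = List.foldl op+ 0 (List.tabulate (n, fn i => i))

val (total, elapsedCpu) = cpuTime sumTo 100000
val () = print ("CPU time: " ^ Time.toString elapsedCpu ^ " s\n")
```

Because this counts CPU time charged to the process rather than wall-clock time, it is less sensitive to whatever else the machine is doing during the run.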
You can find a version which uses no #inline tags at this address:
http://est202.sub37.uclm.es/jjgarcia/smlapl-noinline.tgz
> On a related note, I was curious about your comments about the speed
> of Ocaml. I was also interested if you had any code where you had
> both an SML and an Ocaml version. I recently did some benchmarking of
> Ocaml and it did quite well on a few small benchmarks, and I am
> looking for more code to try. Thanks.
There was a serious problem with the Ocaml version which forced me to
abandon it right at the development stage: something as simple as
this (I'll use SML notation)
    fun a + b = MonoTensor.map2 RNumber.+ a b
will always involve a call to RNumber.+, no matter what limit you feed
to the inline option. In other words, ocamlopt does not know how to
inline the arguments of higher order functions, which means I would
have to code everything by hand to get reasonable speed and to avoid
consing. There is also the problem that complex numbers passed as
parameters to functions always cons, and the severe limit on the size
of arrays on the x86 architecture. And finally, the interpreted code
cannot use the native-compiled code, so it is not usable as an
interactive environment.
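To make the contrast concrete, here is a small sketch in SML. The function map2 below is a hypothetical stand-in for MonoTensor.map2 working on plain real arrays, and Real.+ stands in for RNumber.+; neither is taken from the actual library. The first definition relies on the compiler to inline the function argument into the loop, the second is what coding it by hand looks like:

```sml
(* Hypothetical stand-in for MonoTensor.map2: combine two real arrays
   element by element. Assumes b is at least as long as a; otherwise
   Array.sub raises Subscript. *)
fun map2 (f : real * real -> real) (a : real array) (b : real array) : real array =
  Array.tabulate (Array.length a, fn i => f (Array.sub (a, i), Array.sub (b, i)))

(* Higher-order style: the addition reaches the loop only as the argument f,
   so the loop calls f on every element unless the compiler inlines it. *)
fun addHO a b = map2 Real.+ a b

(* Hand-specialised style: the addition is written directly in the loop body. *)
fun addHand (a : real array) (b : real array) : real array =
  Array.tabulate (Array.length a, fn i => Array.sub (a, i) + Array.sub (b, i))

val xs = Array.fromList [1.0, 2.0, 3.0]
val ys = Array.fromList [10.0, 20.0, 30.0]
val viaMap2 = addHO xs ys
val byHand = addHand xs ys
```

Both definitions compute the same array; the difference is only in how much the generated code depends on the compiler specialising map2 for its argument.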
Juanjo
-----
Universidad de Castilla-La Mancha
Departamento de Matematicas
ETSI Industriales
c/Camilo Jose Cela, 3, Phone: +34-926-295300 (ext 3085)
Ciudad Real, E-13071 (Spain) Fax: +34-926-295369
-----
Our group: http://www.uclm.es/dep/matematicas/nolineal/index.html
Temporal page: http://est202.sub37.uclm.es/jjgarcia/index.html