Team PLClub ICFP entry -- comparing the performance of OCAML and SML
Zhong Shao
shao-zhong@cs.yale.edu
Fri, 13 Oct 2000 14:46:26 -0400
|> So, you are correct for Real.Math.sqrt -- the SML/NJ
|> implementation is about 10 times slower than the C math library. I
|> then computed a rough estimate for the number of seconds spent by
|> SML/NJ in sqrt for each benchmark by taking the number of sqrts
|> divided by SML/NJ's rate for sqrts. Here are the numbers for each
|> benchmark, along with the original running time measurements.
|>
|> number of number SML/NJ Total running time
|> calls to of sqrt
|> Real.Math sqrts time OCAML MLton SML/NJ
|> ----------- --------- -------- ----- ----- ----- -----
|> holes 252374 252359 0.6 1.8 3.2 3.9
|> fov 416558 416545 0.9 1.5 1.8 3.2
|> intercyl 789548 735426 1.6 1.6 2.1 4.3
|> snowgoon 470715 362464 0.8 2.9 3.3 5.1
|> dice 1263757 925063 2.0 3.9 4.9 8.8
|> golf 193489 130088 0.3 1.5 2.4 3.1
|> cone-fractal 603638 455931 1.0 3.7 4.3 6.5
|> large 73500 73487 0.6 4.3 3.0 6.7
|> pipe 1087559 983314 2.2 5.4 4.6 7.9
|> chess 1192926 1030668 2.3 16.0 15.5 21.6
|> fractal 2296066 2229266 4.9 12.2 8.5 45.4
|>
|> From this table, I conclude that the performance penalty for using
|> sqrt in SML/NJ is a significant cause for why SML/NJ is slower than
|> OCAML or MLton, but not the only cause.
|>
Thanks for all the experiments. But I would not make any conclusion
this quickly. For example, I am surprised that SML/NJ only spent
10% of its time executing sqrt on chess and fractal if "sqrt" is
indeed a hot spot.
SML/NJ also implements many of its Real primitives in ML itself
(e.g., floor, trunc, round, cell, ...), so they may also contribute
to the slowdown.
|> As to my statement "MLton did not penalize the programming style as
|> much as SML/NJ did", I apologize for not being more clear. I was
|> referring to the table in my post that showed the speedup John
|> Reppy achieved by changing the programming style.
|> ....................
|>
But John only made three modifications: using unsafe array operations,
uncurrying, and replacing loop index reference cells by tail recursive
functions. Since (in your previous message) you didn't think unsafe
array operation is the problem, I assume you were talking about the
other two aspects (in which presumably MLton does a better job
supporting them than SML/NJ).
|> > I guess you were trying to say that MLton does a better uncurrying
|> > optimization and supports "reference cell" more efficiently than SML/NJ.
|>
|> I was not. In fact, MLton does not at present have an explicit
|> uncurrying optimization, although the combination of some of its
|> other optimizations has the effect of some uncurrying. There are
|> some known cases where MLton has a performance penalty for using
|> curried functions. The MLton developers are trying to eliminate
|> them. As to reference cells, I did not mention them in my post.
See above.
|>
|> > But perhaps this has more to do with the fact that MLton does
|> > more aggressive whole-program analysis while SML/NJ emphasize
|> > more on the scalability.
|>
|> My conclusion is that some of the performance difference between
|> SML/NJ and OCAML or MLton is due to sqrt and some is unexplained.
|> As to the unexplained part, it may be due to whole program analysis
|> or may be due to more mundane things like register allocation or
|> peephole optimization. I don't know. If I had to guess, I would
|> say it is not due to whole-program analysis, since OCAML achieves
|> very good performance with separate compilation.
|>
If I had to guess, I still think it is mostly due to the fact that
Ocaml/MLton try to do more work in native C while SML/NJ tries to do
more in ML itself.
I don't believe the unexplained part has anything to do with whole
program analysis, register allocation, or peephole optimization.
The new SML/NJ backend does a pretty good job of all these and a
more sophisticated optimizer can seldom yield more than 10% speedup.
-Zhong Shao
(shao-zhong@cs.yale.edu)