[MLton-user] more optimization questions
Matthew Fluet
fluet@cs.cornell.edu
Sun, 20 Nov 2005 18:29:03 -0500 (EST)
> I coded up a simple but very useful 2D finite difference code. I did it in
> 2D to eliminate my 3D array implementation from the equation. It makes for a
> very nice test case. The code is quite simple. It has a "correct answer" to
> test for correctness of operation. It scales easily, i.e. you can simply
> increase the size of the arrays to make it take longer and the answer remains
> the same (although the iteration count goes up).
>
> The results are somewhat depressing:
>
> gcc -O2
> real 0m4.001s
> user 0m3.908s
> sys 0m0.028s
>
>
> mlton (-cc-opt -O2)
> real 0m14.784s
> user 0m14.664s
> sys 0m0.058s
MLton is probably still doing a lot of overflow checks and bounds checks
(which, by the way, is a good thing). It would be interesting to see the
effect of -detect-overflow false. I know everyone means well, but it
isn't (always) a meaningful comparison to transliterate a C program into
SML and expect the same performance out of MLton as out of GCC. I'd like
to see someone transliterate Henry's count-graphs benchmark, which makes
heavy use of higher-order functions and exceptions, into C and report back
on mlton's vs gcc's performance.
Another thread along these lines starts here:
http://mlton.org/pipermail/mlton/2005-March/026874.html
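
To make the cost of bounds checks concrete, here is a minimal C sketch. It is not the original poster's code, which is not shown in this thread. It assumes a four-neighbour averaging sweep over a row-major n x n grid, and checked_get is a hypothetical helper that does roughly the work an SML Array.sub has to do on every read, with abort() standing in for raising Subscript.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical checked read: compare the index against the length
 * before every access, as a safe array subscript must. */
static inline double checked_get(const double *a, size_t len, size_t i) {
    if (i >= len)
        abort();              /* stands in for raising Subscript */
    return a[i];
}

/* One sweep over the interior points, with a check on every read. */
static inline void sweep_checked(const double *in, double *out, size_t n) {
    assert(in != out);        /* assumed: separate input and output grids */
    size_t len = n * n;
    for (size_t i = 1; i + 1 < n; i++)
        for (size_t j = 1; j + 1 < n; j++)
            out[i * n + j] = 0.25 * (checked_get(in, len, (i - 1) * n + j)
                                   + checked_get(in, len, (i + 1) * n + j)
                                   + checked_get(in, len, i * n + j - 1)
                                   + checked_get(in, len, i * n + j + 1));
}

/* The same sweep as a C programmer would normally write it: no checks. */
static inline void sweep_unchecked(const double *in, double *out, size_t n) {
    assert(in != out);
    for (size_t i = 1; i + 1 < n; i++)
        for (size_t j = 1; j + 1 < n; j++)
            out[i * n + j] = 0.25 * (in[(i - 1) * n + j]
                                   + in[(i + 1) * n + j]
                                   + in[i * n + j - 1]
                                   + in[i * n + j + 1]);
}
```

Both sweeps compute the same values. The checked one adds four compare-and-branch pairs per grid point unless the compiler can prove the indices are in range and remove them.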
>>> As for power-pc optimization, I'm really interested in helping with that.
>>> Although with the mac bonehead decision to go to intel I can't see that
>>> anyone is going to be very motivated to optimize anything for power pc.
>>
>> Well, since a native code power-pc backend is unlikely, any improvement to
>> the C-codegen would benefit other platforms as well.
>
> Given that the C-compiler performance is quite good on the power-pc that
> would probably help a lot. I'm definitely willing to invest in the time to
> help increase the performance. It would save me the effort of writing my own
> compiler for a numerical computation oriented functional language (SISAL
> anyone ?) ;-)
I seem to recall that at one point in time, we had inline assembly for
overflow-checking arithmetic in the (support code for the) C-codegen. When
we had the native x86-codegen, we simplified that away, but it might be
worthwhile to see what inline PowerPC assembly for overflow-checking
arithmetic gives you.
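
For reference, here is a minimal sketch of what overflow-checking addition looks like when written in plain C. It is not MLton's actual support code and not the inline assembly mentioned above. The function names are hypothetical, and the second variant assumes a GCC or Clang release new enough to provide __builtin_add_overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable check: test the operands before adding, so the signed
 * addition itself can never overflow. Returns true on overflow,
 * otherwise stores a + b in *res and returns false. */
static inline bool add_overflows_portable(int32_t a, int32_t b, int32_t *res) {
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b))
        return true;
    *res = a + b;
    return false;
}

/* Compiler-assisted check (assumes the GCC/Clang builtin is available).
 * The compiler can typically lower this to an add followed by a test of
 * the hardware overflow condition. */
static inline bool add_overflows_builtin(int32_t a, int32_t b, int32_t *res) {
    return __builtin_add_overflow(a, b, res);
}

/* Sanity helper: both variants must agree on whether a + b overflows,
 * and on the sum when it does not. */
static inline void check_agreement(int32_t a, int32_t b) {
    int32_t r1 = 0, r2 = 0;
    bool o1 = add_overflows_portable(a, b, &r1);
    bool o2 = add_overflows_builtin(a, b, &r2);
    assert(o1 == o2);
    if (!o1)
        assert(r1 == r2);
}
```

The portable version costs up to two comparisons and branches before every addition. That is the kind of overhead hand-written assembly, which can test the processor's overflow condition directly after the add, is meant to avoid.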
> Also I'm just plain curious as to what is going on. It's not obvious to me
> that any of the optimizations being discussed are worth a factor of 3.5 in
> performance, are they ?
It's hard to say. There is an additional issue that, to GCC, all the
C-code that MLton produces looks as though it is doing a lot of heap reads
and writes, since MLton puts the ML stack on the heap. This means that
GCC is probably being a bit conservative in its alias analysis, and won't
be able to do any of the loop optimizations for us.
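
The aliasing concern can be illustrated with a small C sketch. This is a simplified analogy, not code that MLton generates, and the function names are hypothetical. In the first function the accumulator lives in memory reached through a pointer, loosely the way ML stack slots live in heap memory. In the second it is an ordinary local variable.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulator in memory. Since *acc and a[i] are both doubles, the
 * compiler may have to assume that they can overlap, and so may reload
 * and store *acc on every iteration instead of keeping it in a register. */
static inline void sum_via_memory(const double *a, size_t n, double *acc) {
    assert(acc != NULL);
    *acc = 0.0;
    for (size_t i = 0; i < n; i++)
        *acc += a[i];
}

/* Accumulator as a local whose address is never taken. Nothing can alias
 * it, so the compiler is free to keep it in a register and to apply its
 * usual loop optimizations. */
static inline double sum_via_local(const double *a, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    return acc;
}
```

Both functions return the same result when the accumulator does not overlap the array. The difference is only in what the compiler is allowed to assume while optimizing the loop.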