[MLton-user] more optimization questions

Sun, 20 Nov 2005 10:06:27 -0800

On Nov 19, 2005, at 1:27 PM, Matthew Fluet wrote:

>
> I can't speak for gambit-c, but MLton hasn't put any special effort  
> into loop or floating-point operations.  In fact, we know we're  
> missing some possible loop optimizations.  Also, since the GC  
> doesn't handle pointers to object interiors, array accesses are  
> always computed addresses, rather than striding a pointer.  You  
> might also lose somewhat because MLton won't/can't common-subexp  
> eliminate array reads, since the array is a mutable object.   
> (Again, this is a place where more sophisticated analyses might  
> admit such elimination.)
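
To make the two points quoted above concrete, here is a small C sketch. It is only an illustration, not MLton output and not code from this thread, and all function names are made up. The first pair contrasts a computed address (base plus index on every access) with a pointer that strides through the array. The second pair shows a repeated array read that a compiler may have to reload because a store sits in between, next to the same loop with the read hoisted by hand. An optimizing C compiler will often turn the first form of each pair into the second by itself.

```c
#include <assert.h>
#include <stddef.h>

/* Computed address: each access is base + i * sizeof(double). */
double sum_indexed(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Strided pointer: keep a pointer into the array and bump it each step.
   This needs a pointer to the object's interior. */
double sum_strided(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
    for (const double *p = a, *end = a + n; p != end; p++)
        s += *p;
    return s;
}

/* a[i] is read three times with a store to out[i] in between. Unless the
   compiler can prove that a and out do not overlap, it must reload a[i]
   after the store. */
void square_plus_reload(const double *a, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * a[i];
        out[i] += a[i];
    }
}

/* Same computation with the read eliminated by hand: one load per element.
   Equivalent to the version above only when a and out do not overlap. */
void square_plus_hoisted(const double *a, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double x = a[i];
        out[i] = x * x + x;
    }
}
```

Called on non-overlapping arrays, both functions in each pair produce the same values; only the number of address computations and loads differs.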

I coded up a simple but very useful 2D finite difference code.  I did
it in 2D to eliminate my 3D array implementation from the equation.
It makes for a very nice test case.  The code is quite simple.  It
has a known "correct answer" to check the result against.  It
scales easily, i.e., you can simply increase the size of the arrays to
make it take longer and the answer remains the same (although the
iteration count goes up).
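
For readers without the upload at hand, a benchmark of this general shape can be sketched in C as follows. This is a hypothetical stand-in, not the uploaded code: the function name `fd2d_solve`, the Jacobi scheme, and the boundary condition are all assumptions chosen for illustration. It relaxes Laplace's equation on an n x n grid whose boundary is fixed to u(x,y) = x + y. That function is harmonic, so the converged interior must equal x + y for every grid size, which gives the "correct answer", while larger grids need more iterations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the benchmark described above.
   Returns the number of Jacobi sweeps needed until the largest update
   drops to tol or below, or -1 if allocation fails. On success *max_err
   receives the largest deviation from the exact solution x + y. */
int fd2d_solve(int n, double tol, double *max_err)
{
    assert(n >= 3 && tol > 0.0 && max_err != NULL);
    double *u = malloc((size_t)n * n * sizeof *u);
    double *v = malloc((size_t)n * n * sizeof *v);
    if (!u || !v) { free(u); free(v); return -1; }

    /* Boundary cells hold the exact solution; the interior starts at 0. */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            int edge = (i == 0 || j == 0 || i == n - 1 || j == n - 1);
            double exact = (double)(i + j) / (n - 1);
            u[i * n + j] = v[i * n + j] = edge ? exact : 0.0;
        }

    int iters = 0;
    double delta;
    do {
        delta = 0.0;
        for (int i = 1; i < n - 1; i++)
            for (int j = 1; j < n - 1; j++) {
                /* Five-point stencil: average of the four neighbours. */
                double avg = 0.25 * (u[(i - 1) * n + j] + u[(i + 1) * n + j]
                                   + u[i * n + j - 1] + u[i * n + j + 1]);
                double d = avg - u[i * n + j];
                if (d < 0.0) d = -d;
                if (d > delta) delta = d;
                v[i * n + j] = avg;
            }
        double *t = u; u = v; v = t;   /* newest grid is now u */
        iters++;
    } while (delta > tol);

    /* Compare the result with the known answer. */
    *max_err = 0.0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double e = u[i * n + j] - (double)(i + j) / (n - 1);
            if (e < 0.0) e = -e;
            if (e > *max_err) *max_err = e;
        }

    free(u);
    free(v);
    return iters;
}
```

Calling `fd2d_solve(17, 1e-10, &err)` and then `fd2d_solve(33, 1e-10, &err)` should give the same answer to within a small error, with the second call taking more sweeps. The inner loop is dominated by indexed array reads and floating-point adds, which is exactly the kind of code the quoted paragraph is about.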

The results are somewhat depressing:

gcc -O2
real    0m4.001s
user    0m3.908s
sys     0m0.028s

mlton (-cc-opt -O2)
real    0m14.784s
user    0m14.664s
sys     0m0.058s

ouch!

I'll probably also port this over to Gambit-C and see what happens.

>> As for power-pc optimization, I'm really interested in helping  
>> with that. Although with the mac bonehead decision to go to intel  
>> I can't see that anyone is going to be very motivated to optimize  
>> anything for power pc.
>
> Well, since a native code power-pc backend is unlikely, any  
> improvement to the C-codegen would benefit other platforms as well.

Given that the C compiler's performance is quite good on the power-pc,
that would probably help a lot.  I'm definitely willing to invest
the time to help increase the performance.  It would save me the
effort of writing my own compiler for a functional language oriented
toward numerical computation (SISAL, anyone?) ;-)

Also I'm just plain curious as to what is going on.  It's not obvious
to me that any of the optimizations being discussed are worth a
factor of roughly 3.7 in performance (14.8s vs. 4.0s above), are they?

>
>> I'll code up some simple examples to see if I can understand the  
>> intermediate files a little better.  Also smaller examples might  
>> help me narrow in on what is going on.  There must be something  
>> going on - the difference is just too big.
>
> You're welcome to post code (to http://www.mlton.org/TemporaryUpload).

I've uploaded the 2D FD code.  I think it's very useful for  
investigations in this area - so developer types might find it  
helpful.  I'm also going to turn on the Windows machine and give the
x86 a try to see what happens.

Thanks
Brian