[MLton] Number theory and MLton C FFI

Wesley W. Terpstra terpstra@gkec.tu-darmstadt.de
Fri, 22 Oct 2004 04:09:22 +0200


On Thu, Oct 21, 2004 at 05:03:32PM -0700, Stephen Weeks wrote:
> For fun, I took a look at what MLton generated for your add routine.
> 
> 	movl (xxx+(0*4)),%esp
> 	addl (yyy+(0*4)),%esp
> 	movl %esp,%esi
> 	andl $0x7FFFFFFF,%esi
> 	shrl $0x1F,%esp
> 	addl %esi,%esp

Not bad at all!

> 	call WordU32_toWord64
> 	call WordU32_toWord64
> 	call WordU64_rshift
> 	call WordU64_toWord32
> 	call WordU64_toWord32
> 	shrl $0x1,%eax
> 	addl %ebx,%eax
> 	movl %eax,%esi
> 	andl $0x7FFFFFFF,%esi
> 	shrl $0x1F,%eax
> 	addl %esi,%eax

... where's the multiply?
I agree the last part is not bad.

> For the record, here's the SML code I used.

Thank you; would you mind if I copied it?
(Your implementation is nicer than mine)

> > Real hardware does this in one instruction.
> > Promoting them to Word64s and then multiplying does not do that.
> 
> MLton does not do that right now, since it will do a function call for
> each Word32->Word64 conversion and for the Word64 mul.  But, as Henry
> says, it seems quite plausible that MLton will do the right thing some
> day.  This is a straightforward local optimization.  In any case, my
> main point is that this is not a limitation of SML, just of MLton.

If MLton knew this optimization, it might mean I wouldn't need C.
It wouldn't be too hard to target SML with my code generator.

One of the big advantages of using SML is that I could also make the code
use roots of unity that are computed automatically based on the field
construction. That would be really great as then they code could be reused!

C++ can compute these things at compile time (using templates), but C not.
Actually, how far will the MLton compiler go?

fun first_factor x =
  let
    open Word64
    fun helper i =
      if (x mod i) = 0w0 then i else
      helper (i+0w1)
  in
    helper 0w2
  end ;

val x = first_factor 0w4611686014132420609 ;
print (Word64.toString x ^ "\n");;

Shouldn't x be computed at compile-time? (a bit extreme perhaps ;)
I see that for my MLton compiler it is not.

> I wonder if your C routines are expensive enough that marshalling is a
> viable way to communicate between SML and C.

Possibly.

Another thing that matters for fast number theory functions is: SIMD.
I imagine automatic use of SIMD in MLton is a long ways off, but a good
programmer can make intrinsics go a long way.

Maybe I should try a source dive into MLton.
If there was an SSE2 : SIMD, MMX : SIMD, AltiVec : SIMD and that 64bit 
mul optimization, I could perhaps achieve everything I need in SML.

For now C is the most expedient route, for one since it already works...

Also, so far as I can tell, I can't write a library under MLton and expect
people to be able to use it from C without forcing MLton on them. Are there
any plans for MLton's whole-program optimization to use a different entry
point than main? Being able to list a few method to export as C functions
would be enough to build some useful libraries under MLton.

-- 
Wesley W. Terpstra