[MLton] Number theory and MLton C FFI

Stephen Weeks MLton@mlton.org
Thu, 21 Oct 2004 17:03:32 -0700

> However, I don't think you can get those above methods to compile to nearly
> as nice code as
> ADD:
>         addl    %esi, %edi
>         movl    %edi, %eax
>         shrl    $31, %edi
>         andl    $2147483647, %eax
>         addl    %edi, %eax

For fun, I took a look at what MLton generated for your add routine.

	movl (xxx+(0*4)),%esp
	addl (yyy+(0*4)),%esp
	movl %esp,%esi
	andl $0x7FFFFFFF,%esi
	shrl $0x1F,%esp
	addl %esi,%esp

(xxx and yyy are a couple of _imported constants)

The code is pretty comparable to what gcc gives.  As to MUL, MLton
generates awful code involving function calls, because our native
codegen doesn't support 64 bit entities very well.  But, I don't think
there's any fundamental problem there -- we just need the motivation
to put some more work into the codegen (perhaps this thread will help

> MUL:
>         lea       (%ecx,%ecx), %edx                             #17.34
>         mull      %edx                                          #17.34
>         shrl      $1, %eax                                      #18.41
>         addl      %eax, %edx                                    #18.16
>         movl      %edx, %eax                                    #18.16
>         andl      $2147483647, %eax                             #18.16
>         shrl      $31, %edx                                     #18.16
>         addl      %edx, %eax                                    #18.16

Heres what MLton generates

	call WordU32_toWord64
	call WordU32_toWord64
	call WordU64_rshift
	call WordU64_toWord32
	call WordU64_toWord32
	shrl $0x1,%eax
	addl %ebx,%eax
	movl %eax,%esi
	andl $0x7FFFFFFF,%esi
	shrl $0x1F,%eax
	addl %esi,%eax

So, once MLton gets through the stuff that we're still implementing
via FFI, it does the right thing.

For the record, here's the SML code I used.

type Z = Word32.word
type ZZ = Word64.word

val M_BITS = 0w31
val R_BITS = Word.fromInt Word32.wordSize
val M_MASK: Z = Word.<< (0w1, M_BITS) - 0w1

fun add (x: Z, y: Z) =
      open Word32
      val z = x + y
      andb (z, M_MASK) + >> (z, M_BITS)

fun mul (x: Z, y: Z) =
      val ZZ = Word32.toLarge
      val Z = Word32.fromLarge
      val z: ZZ = ZZ x * ZZ (Word32.<< (y, R_BITS - M_BITS))
      add (Z (Word64.>> (z, R_BITS)), Word32.>> (Z z, R_BITS - M_BITS))

val x = _import "xxx": Z;
val y = _import "yyy": Z;

> However, one _big_ problem I have is the inability of SML to capture a
> product in a larger integer. So Word32*Word32 _SHOULD_ give Word64...
> but it doesn't. This realllly sucks.
> Real hardware does this in one instruction.
> Promoting them to Word64s and then multiplying does not do that.

MLton does not do that right now, since it will do a function call for
each Word32->Word64 conversion and for the Word64 mul.  But, as Henry
says, it seems quite plausible that MLton will do the right thing some
day.  This is a straightforward local optimization.  In any case, my
main point is that this is not a limitation of SML, just of MLton.

> The methods we are talking about here take a LOT longer than a method call.
> They are methods like:
> 	perform any FFT over this polynomial (like 32MB)
> 	Multiply a[i] by c^i for all a[i] in the polynomial
> 	Multiply a[i]*b[i] and return the resulting vector
> 	Compute the polynomial with roots at these 3M points.
> ... you get the idea. =)

I wonder if your C routines are expensive enough that marshalling is a
viable way to communicate between SML and C.

> PS. Your fibonacci.sml file is unfair compared to ocaml:

I took a look and you're right -- in fact, it is the OCaml entry that
is incorrect.  It doesn't follow the test spec, which requires
recursive computation of the fibonacci numbers.  No wonder it is ten
times faster than other languages.  Brent, can you disable that test
until someone fixes it?

> Actually, a lot of your tests compare ocaml and mlton unfairly...

Please let us know of others.