[MLton] Number theory and MLton C FFI
Stephen Weeks
MLton@mlton.org
Thu, 21 Oct 2004 17:03:32 -0700
> However, I don't think you can get those above methods to compile to nearly
> as nice code as
>
> ADD:
> addl %esi, %edi
> movl %edi, %eax
> shrl $31, %edi
> andl $2147483647, %eax
> addl %edi, %eax
For fun, I took a look at what MLton generated for your add routine.
movl (xxx+(0*4)),%esp
addl (yyy+(0*4)),%esp
movl %esp,%esi
andl $0x7FFFFFFF,%esi
shrl $0x1F,%esp
addl %esi,%esp
(xxx and yyy are a couple of _imported constants)
The code is pretty comparable to what gcc gives. As to MUL, MLton
generates awful code involving function calls, because our native
codegen doesn't support 64 bit entities very well. But, I don't think
there's any fundamental problem there -- we just need the motivation
to put some more work into the codegen (perhaps this thread will help
:-).
> MUL:
> lea (%ecx,%ecx), %edx #17.34
> mull %edx #17.34
> shrl $1, %eax #18.41
> addl %eax, %edx #18.16
> movl %edx, %eax #18.16
> andl $2147483647, %eax #18.16
> shrl $31, %edx #18.16
> addl %edx, %eax #18.16
Heres what MLton generates
...
call WordU32_toWord64
...
call WordU32_toWord64
...
call WordU64_rshift
...
call WordU64_toWord32
...
call WordU64_toWord32
...
shrl $0x1,%eax
addl %ebx,%eax
movl %eax,%esi
andl $0x7FFFFFFF,%esi
shrl $0x1F,%eax
addl %esi,%eax
So, once MLton gets through the stuff that we're still implementing
via FFI, it does the right thing.
For the record, here's the SML code I used.
------------------------------------------------------------
type Z = Word32.word
type ZZ = Word64.word
val M_BITS = 0w31
val R_BITS = Word.fromInt Word32.wordSize
val M_MASK: Z = Word.<< (0w1, M_BITS) - 0w1
fun add (x: Z, y: Z) =
let
open Word32
val z = x + y
in
andb (z, M_MASK) + >> (z, M_BITS)
end
fun mul (x: Z, y: Z) =
let
val ZZ = Word32.toLarge
val Z = Word32.fromLarge
val z: ZZ = ZZ x * ZZ (Word32.<< (y, R_BITS - M_BITS))
in
add (Z (Word64.>> (z, R_BITS)), Word32.>> (Z z, R_BITS - M_BITS))
end
val x = _import "xxx": Z;
val y = _import "yyy": Z;
------------------------------------------------------------
> However, one _big_ problem I have is the inability of SML to capture a
> product in a larger integer. So Word32*Word32 _SHOULD_ give Word64...
> but it doesn't. This realllly sucks.
>
> Real hardware does this in one instruction.
> Promoting them to Word64s and then multiplying does not do that.
MLton does not do that right now, since it will do a function call for
each Word32->Word64 conversion and for the Word64 mul. But, as Henry
says, it seems quite plausible that MLton will do the right thing some
day. This is a straightforward local optimization. In any case, my
main point is that this is not a limitation of SML, just of MLton.
> The methods we are talking about here take a LOT longer than a method call.
> They are methods like:
> perform any FFT over this polynomial (like 32MB)
> Multiply a[i] by c^i for all a[i] in the polynomial
> Multiply a[i]*b[i] and return the resulting vector
> Compute the polynomial with roots at these 3M points.
> ... you get the idea. =)
I wonder if your C routines are expensive enough that marshalling is a
viable way to communicate between SML and C.
> PS. Your fibonacci.sml file is unfair compared to ocaml:
...
I took a look and you're right -- in fact, it is the OCaml entry that
is incorrect. It doesn't follow the test spec, which requires
recursive computation of the fibonacci numbers. No wonder it is ten
times faster than other languages. Brent, can you disable that test
until someone fixes it?
> Actually, a lot of your tests compare ocaml and mlton unfairly...
Please let us know of others.