re new SML/NJ
Matthew Fluet
mfluet@intertrust.com
Mon, 13 Aug 2001 12:03:37 -0700 (PDT)
> Speaking of which, I tried the silly floating-point test that appeared in
> comp.lang.functional:
> fun test (n, x) =
>    if n = 0
>       then x
>    else test (n - 1, x + Math.cos x)
> and our code was noticeably faster than gcc's:
>    gcc    278.9 nanoseconds
>    MLton  234.2 nanoseconds
> so we are about 19% faster (278.9 / 234.2 ≈ 1.19). I looked at the code,
> and ours still does one floating-point load and one store per iteration,
> but the C version is doing some really funny stuff.
>
> In the best of all worlds, the back end would figure out that the
> floating-point register only needs to be stored when the loop finishes.
I finally checked this out under the new codegen, which now carries
floating-point values in registers across basic blocks. Here's the hot loop:
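statementLimitCheckLoop_7:
    cmpl (gcState+8),%esi      # GC check: the only memory access in the loop
    jbe skipGC_7
skipGC_7:
    testl %esp,%esp            # n = 0 ?
    jz L_241                   #   yes: leave the loop
    decl %esp                  # n := n - 1
    jo L_242                   #   overflow check on n - 1
    fld %st                    # push a copy of x
    fcos                       # replace the copy with cos x
    faddp %st, %st(1)          # x := x + cos x, popping the temporary
    jmp statementLimitCheckLoop_7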
Pretty good, I think. There is no memory traffic except for the GC check,
which really seems like it should be delayed until the loop exits.
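For anyone who wants to get a per-iteration figure like the ones quoted above, here is a minimal sketch of a timing driver using only the SML Basis Library (`Timer`, `Time`, `Math`). It is not the harness that produced the quoted numbers: the iteration count (`iterations`), the starting value 0.0, and the use of wall-clock time are all assumptions made for illustration.

```sml
(* Hypothetical timing driver for the comp.lang.functional test.
   The iteration count and starting value are assumptions, not the
   parameters behind the numbers quoted in this thread. *)
fun test (n, x) =
   if n = 0
      then x
   else test (n - 1, x + Math.cos x)

val iterations = 10000000

(* Measure wall-clock time around a single call. *)
val timer = Timer.startRealTimer ()
val result = test (iterations, 0.0)
val seconds = Time.toReal (Timer.checkRealTimer timer)

(* Print the result so it is used and the call is not dead code,
   then report the average time per iteration in nanoseconds. *)
val () = print ("result = " ^ Real.toString result ^ "\n")
val () = print ("ns/iteration = "
                ^ Real.toString (seconds * 1.0E9 / Real.fromInt iterations)
                ^ "\n")
```

Starting from 0.0, the iteration x := x + cos x converges very quickly to π/2 (the point where cos x = 0), so `result` is a handy sanity check that the loop actually ran.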