benchmarking Poly/ML & floating point
Matthew Fluet
fluet@CS.Cornell.EDU
Fri, 8 Sep 2000 08:52:48 -0400 (EDT)
> pretty bad on the symbolic and integer benchmarks. It does manage to beat MLton
> on the procedure call benchmarks (fib, tak) and on merge, which is basically an
> allocation/gc benchmark. Hopefully we'll have fib and tak up to snuff with the
> native backend -- IIRC the differences between MLton and Poly/ML are about the
> same differences as between MLton C and MLton native. As to merge, I'd be
The ratios are just about the same, so Poly/ML and MLton native should
come out within about +/-0.1 of each other.
> interested to hear how MLton C compares to MLton native. I don't think merge
> made it into Matthew's latest round of benchmarks.
No, I haven't seen merge. Is it in the latest tar.gz that you posted?

In other news, I have a mostly complete floating point backend set up. It
natively handles all of the prims except cosh, sinh, tanh, exp, pow, and
tan, which are done as ffi calls. It also does ffi calls for copysign,
frexp, and modf, but gcc does not have any inline assembly for those
operations. (On the other hand, for the first group of prims, gcc does
have inline assembly, but I haven't been able to completely grok it all,
particularly something like pow, which has lots of branching and special
cases to consider. In any event, they just resolve to the versions in
the math library, so they work, just not as efficiently as they could.)
I also added
Real_round as a prim, since it's just inline assembly in mlton-lib. I
might do the same with the IEEE rounding mode.

I'm not sure that the semantics of Real_nequal and Real_qequal as given in
mlton-lib.h are correct. For one thing, gcc produces the same assembly
sequence for both functions. Also, the basis spec says that != is
equivalent to not o ==, whereas mlton-lib.h has
  #define Real_nequal(x,y) ((x) != (y))
Won't IEEE floating point return false on any comparison when one argument
is NaN? Should it really be (!((x) == (y)))? If that is the case, then I
think that qequal is o.k.
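
The NaN question above can be checked with a small C test. This is only a
sketch under stated assumptions: the Real_nequal macro is copied from the
line quoted above, while Real_nequal_alt, nequal_macro, nequal_alt,
qequal_sketch, and check_nan_semantics are hypothetical names made up for
illustration. In particular, qequal_sketch is one reading of the basis ?=
operator (true if either argument is NaN or the two compare equal), not
the mlton-lib.h definition. As far as I know, C's == is false whenever an
operand is NaN while != is true, so the two spellings of nequal should
agree; the asserts below are meant to confirm that on a given compiler.

```c
#include <assert.h>
#include <math.h>   /* NAN, isnan */

/* Definition quoted from mlton-lib.h in the message above. */
#define Real_nequal(x,y) ((x) != (y))

/* Hypothetical alternative, spelled as "not o ==". */
#define Real_nequal_alt(x,y) (!((x) == (y)))

/* Hypothetical reading of the basis ?= operator: true if either
   argument is NaN or the arguments compare equal.  This is an
   assumption, not the mlton-lib.h definition. */
int qequal_sketch(double x, double y) {
  return isnan(x) || isnan(y) || x == y;
}

/* Function wrappers so the macros are applied to run-time values. */
int nequal_macro(double x, double y) { return Real_nequal(x, y); }
int nequal_alt(double x, double y)   { return Real_nequal_alt(x, y); }

/* Asserts that the two spellings agree on NaN and ordinary inputs. */
void check_nan_semantics(void) {
  double nanv = NAN;
  assert(!(nanv == nanv));                  /* == is false with a NaN operand */
  assert(!(nanv < 1.0) && !(nanv > 1.0));   /* ordered comparisons are false */
  assert(nequal_macro(nanv, 1.0) == 1);     /* != is true with a NaN operand */
  assert(nequal_alt(nanv, 1.0) == 1);
  assert(nequal_macro(1.0, 1.0) == 0);
  assert(nequal_alt(1.0, 1.0) == 0);
  assert(qequal_sketch(nanv, 1.0) == 1);
  assert(qequal_sketch(1.0, 2.0) == 0);
}
```

Calling check_nan_semantics() from a main function and building without
-ffast-math (which may relax NaN handling) should run silently if the
compiler follows IEEE comparison semantics.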

Lastly, performance. It's all over the place on the benchmarks that have
a floating point component. The worst right now is mandelbrot, at
12.67s for C vs 20.29s for x86 (about 1.6 times slower). On the other
hand, some of the benchmarks are coming in at 0.9 of MLton C; that's
probably due to a large integer component and a minor floating point
component. Of course, I haven't
started writing any of the peephole optimizations for floating point
operations, so that should gain some performance. After that, I need to
look a little more carefully at how best to use the instructions that can
automatically pop the floating point stack; right now the only way items
come off the stack is by explicitly popping them (either to a memory
location or nowhere). But, overall, I think it looks pretty good.