[MLton-user] more optimization questions
Stephen Weeks
MLton-user@mlton.org
Wed, 21 Dec 2005 11:39:47 -0800
> > by a C call to a function that just calls fabs. The win in going from
> > (2) to (3) is in eliminating the C wrapper around fabs. If anyone
> > wants to repeat my experiment, I did (3) by adding a line to
> > lib/mlton/include/c-chunk.h:
> >
> > #define Real64_abs fabs
...
> Just FYI for the list. The above change to the c-chunk.h file requires:
>
> val abs = _import "fabs": real -> real;
>
> to be added to the .sml file.
I don't think that's necessary for the approach I described, which has
the advantage of not needing to modify the input SML program as well
as a performance advantage. To be clearer, here's what I did. I
started with a vanilla install of MLton 20051202. I then did two
things.
* Eliminated the definition of abs from line 91 of
  lib/mlton/sml/basis/real/real.fun
* Added a line to lib/mlton/include/c-chunk.h:
    #define Real64_abs fabs
The first step causes MLton to use its primitive notion of abs, and
for the C codegen to emit a call to Real64_abs, which is a C wrapper
around fabs defined in libmlton.a. The second step replaces the call
to Real64_abs with a call to fabs, for which gcc emits the fabs
instruction (on x86 anyway).
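The effect of the second step can be sketched in plain C. This is only an illustration, not MLton's actual source: Real64_abs_wrapper, abs_via_wrapper, abs_via_macro, and check_abs_routes are hypothetical names invented here, and the wrapper body is an assumption based on the description of it as a function that just calls fabs.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for the out-of-line wrapper in libmlton.a.
 * Assumed here to do nothing but forward to fabs. */
double Real64_abs_wrapper(double x) {
    return fabs(x);
}

/* Without the macro: generated code reaches fabs through an
 * extra function call into the runtime library. */
double abs_via_wrapper(double x) {
    return Real64_abs_wrapper(x);
}

/* With the macro: the preprocessor rewrites every later use of
 * Real64_abs into fabs, so the compiler sees a direct fabs call
 * and is free to expand it inline. */
#define Real64_abs fabs

double abs_via_macro(double x) {
    return Real64_abs(x); /* expands to fabs(x) */
}

/* Small self-check: both routes must compute the same value. */
void check_abs_routes(void) {
    assert(abs_via_wrapper(-2.5) == 2.5);
    assert(abs_via_macro(-2.5) == 2.5);
}
```

Both routes return the same result; the only difference is whether the compiler sees the call to fabs directly or has to go through a separately compiled function first.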
If you add the line
  val abs = _import "fabs": real -> real;
this tells MLton to treat abs as an FFI call, not a primitive. It
therefore does not know as much about it and will not generate as good
code. In particular, you will see the following in the generated C.
        S(Word32, 72) = 142;
        Push (76);
        FlushFrontier();
        FlushStackTop();
        CReturnR64 = fabs (R64_19);
        CacheFrontier();
        CacheStackTop();
L_1363:
        Push (-76);
        R64_0 = CReturnR64;
This is not as good as what you will get with the approach I
described, namely the following two lines of generated C.
        CReturnR64 = Real64_abs (R64_21);
        R64_1 = CReturnR64;
This difference may explain why I saw a better speedup than you
did.
> Also, it seems like in 2 years there is a relatively good (?) chance
> that the fabs behavior has changed. Maybe the "proper" abs code is
> no longer required in the compiler.
We run on too many platforms with too many versions to check this, and
like to support older platforms too. Also, I suspect users would
spend more time tracking down correctness bugs if we made the change
than they would performance bugs if we didn't. So, I don't think it's
a good idea to eliminate the wrapper. I do think we will add
something like
  structure FastReal: REAL
that will allow users to get at primitive versions of the Real
functions without the correctness wrappers. So they can get C-style
speed and C-style correctness :-) if they want. But they can do it
selectively and lazily, only after profiling shows it would help a
particular hot loop in their programs.