[MLton] Constant Folding of FP Operations
Matthew Fluet
fluet at tti-c.org
Mon Jun 2 10:09:32 PDT 2008
On Mon, 2 Jun 2008, Vesa Karvonen wrote:
> On Sun, Jun 1, 2008 at 10:29 PM, Matthew Fluet <fluet at tti-c.org> wrote:
>> On Fri, 23 May 2008, Vesa Karvonen wrote:
> [...]
>> Seems like a good optimization, and a sound one. Did you run the
>> benchmarks and observe any speedups?
>
> No, I don't think I ran the benchmarks with this optimization, but I see
> that you did. However, I did try the optimization with a slightly
> modified version (note the parentheses in "store + (...)") of Sean
> McLaughlin's example:
>
> http://mlton.org/pipermail/mlton-user/2007-December/001294.html
>
> After changing the SML expression to match the C expression (as shown
> above), MLton performed constant folding on the right-hand side expression
> and performance was much closer to C.
Good to see that the optimization kicked in, even if it is an artificial
example.
> Probably. As we know, gcc already performs constant folding of FP, so I
> would not be surprised if the optimization would not necessarily improve
> performance with the C backend (as seems to be the case with ray).
Except that after considering Sean McLaughlin's example, we don't allow
gcc to constant fold any floating-point operations (in the C-codegen):
http://mlton.org/cgi-bin/viewsvn.cgi?view=rev&rev=5799
This is accomplished by making all real constants globals that are
accessed via an indirect load, so gcc can't see the constant values.
So, since the C-codegen didn't show any improvement with FP
constant-folding on the ray benchmark, I suspect that the improvement in
the native codegen was just system noise.
>> I'd really like to know a good way of cutting down the noise in the
>> benchmarks; consider that fib and tailfib (which use no FP operations,
>> and so yield identical assembly code) show a 1.04 and 1.06 slowdown,
>> respectively.
>
> How are the times measured? Looking very briefly at benchmark/main.sml
> (line 79) it would seem to use an average of times over a 60 second
> sampling period. Using the average probably does not give the best
> results. For a deterministic benchmark it should be best to run the
> benchmark multiple times and only take the lowest time. Any time above
> the lowest possible time for a deterministic benchmark should be noise
> from the system (switches to other processes thrashing caches, etc.). So,
> by taking an average, you are actually letting the noise enter the
> measurement. (This was mentioned in the paper on dominators
>
> http://www.cs.rice.edu/~keith/EMBED/dom.pdf
>
> based on which I implemented the new dominators algorithm for MLton a
> while back.)
That's an interesting view: take the minimum time rather than the
average time.
>>> About the patch. The compiler changes do not compile with SML/NJ,
>>> which does not support 32-bit floating point (Real32), [...]
>>
>> You might consider dynamically checking the value of Real32.precision,
>> which would allow you to distinguish a "real" Real32 from a "dummy"
>> Real32 that is bound to Real64.
>
> Hmm... That approach would probably make it reasonably easy to make it
> compile with SML/NJ. The approaches I had in mind were somewhat more
> complicated. I'll have to try this.
In the long term, it would be good to avoid reliance on Real32 for any
purposes other than optimization. For example, we could make
   datatype RealX.t = T of {size: RealSize.t,
                            value: IEEEReal.decimal_approx}
which mimics the treatment of WordX.t.
We currently require Real32.{fromString,isFinite} for elaborating real
constants, Real32.{==,signBit} for equality on real constants, and
Real32.format for emitting real constants. If Real32 is bound to Real64,
then we might accept Real32.real floating-point constants that are finite
in Real64 but would not be finite in Real32.[*] If Real32 is bound to
Real64, then we might judge two floating-point constants to be unequal
when they are equal at Real32.real; this is benign, since we only use
equality of floating-point constants to combine equal constants, so a
conservative equality is fine. If Real32 is bound to Real64, then we
might emit more digits than necessary to represent the value; again, this
is benign.
[*] We currently use Real<N>.isFinite to give a compile-time error for
floating-point constants that do not denote finite values. The argument
for this is that The Definition states:
"it is a compile-time error if the constant does not make sense or does
not denote a value within the machine representation chosen for the
type."
Of course, one might argue that "1.0e123" denotes a specific 32-bit IEEE
754 value (namely, +inf) just as much as "1.0e~1" does -- for neither
does the mathematical value of the IEEE value equal the mathematical
value of the string.
>> There also seems to be a little-endian assumption for constant-folding
>> castToWord and castFromWord. [...]
>
> Not exactly. The assumption is that the endianness of reals and words
> agree in the sense that if you convert a real/word to little endian bytes
> and then convert the little endian bytes to a word/real, you'll get the
> desired cast. The operations are (in pseudo code):
>
> castFromWord = RealFromLittle o WordToLittle
> castToWord = WordFromLittle o RealToLittle
>
> That should be identical to:
>
> castFromWord = RealFromBig o WordToBig
> castToWord = WordFromBig o RealToBig
Ah, I see.