[MLton] Performance of Real.toInt

Tue Oct 28 09:02:24 PST 2008

Matthew, thanks for the comprehensive response.

I feel a little bad, because when I looked at the numeric ranges
involved I was able to fix this application to use integer ops only
;).  It doesn't fix the performance divot, but avoids it.

By the way, doesn't the FIST instruction raise a numeric exception on
overflow?  For the x86 backend, could the hardware do some of the work
to avoid some of the range checking in real.sml?

On Mon, Oct 27, 2008 at 12:48 PM, Matthew Fluet <fluet at tti-c.org> wrote:
> On Sun, 26 Oct 2008, Vesa Karvonen wrote:
>>
>> On Fri, Oct 24, 2008 at 10:50 PM, Ryan Newton <rrnewton at gmail.com> wrote:
>>>
>>> Under MLton I generate code like this:
>>>
>>>  (Real64.toInt IEEEReal.TO_ZERO (var_tmpsmp_77))
>>>
>>> But it performs very poorly.  I haven't researched this, but if I had
>>> to guess, I'd bet this is because mlton is implementing some more
>>> semantically meaningful notion than C casts.
>>
>> An excellent guess!
>>
>>> Nevertheless, is there
>>> any inexpensive way to ape the behavior one gets from (int)x in C?
>>
>> Have you peeked into the real/real.sml source file in MLton's basis
>> library implementation?  The implementation of Real.toInt uses a
>> family of toInt<N>Unsafe functions, that do not set the rounding mode
>> or check that the floating point number is in the range of the integer
>> type.  One could perhaps extend the MLton.Real structure
>> (http://mlton.org/MLtonReal) to expose those functions.  You could
>> then implement the conversion in terms of the unsafe functions.
>
> As Vesa noted, SML's Real.toInt function does a lot more range checking than
> C's (int)d cast.  In SML, there are at least two floating-point comparisons
> (performing the range check), a rounding mode set, a floating-point round, a
> rounding mode (re)set, and a floating-point to int coercion (the
> toInt<N>Unsafe).
>
> If you are using the C codegen, then toInt<N>Unsafe is implemented by a C
> cast; the semantics of a C cast is to convert with truncation (TO_ZERO)
> semantics.  If you are using the x86 codegen, then toInt<N>Unsafe is
> implemented by the 'fist' instruction; the semantics of the 'fist'
> instruction is to convert with the current rounding mode.  If you are using
> the amd64 codegen, then toInt<N>Unsafe is implemented by the
> 'cvt{s,d}2si{l,q}' instruction; the semantics of the 'cvt{s,d}2si{l,q}'
> instruction is to convert with truncation (TO_ZERO) semantics.  Since the
> implementations of toInt<N>Unsafe do not always obey the current rounding
> mode, the SML implementation first does a floating-point round (under an
> appropriate rounding mode); thus, all of the toInt<N>Unsafe implementations
> behave the same.  But, it also means that the toInt<N>Unsafe primitives are
> only well defined when the floating-point value is an integer; on
> non-integral floating-point values, the different codegens could return
> different results.
>
> Note: on x86 with the C-codegen, the C cast actually generates another
> set/reset of the rounding mode, because gcc wants to use the 'fist'
> instruction, but with truncation (TO_ZERO) semantics (rather than the
> current rounding mode).  This may also be the case on other architectures.
>
> If you are exclusively using the C-codegen, then exposing the toInt<N>Unsafe
> functions in the MLton.Real structure would have the behavior of a C-cast.
>  (It will still be a little slower, because the cast will occur in a
> non-inlined function; we don't inline some of the floating-point operations,
> because gcc will constant fold without obeying possible changes in the
> rounding mode.  Though, given the explanation above, since C's cast always
> ignores the current rounding mode and uses truncation semantics, then it may
> be acceptable to inline.)
>
> If you wanted something a little more well-defined, you could expose in
> MLton.Real the composition of Primitive.Real<N>.round with
> Primitive.Real<N>.toInt<M>Unsafe.  That would first do a floating-point
> round to integer (under the current rounding mode), followed by a coercion
> to int (which, because the input will be an integral floating-point, will be
> well-defined for all implementations).  However, this would be slightly
> different from a C-cast, since the default floating-point rounding mode is
> TO_NEAREST (at least on x86 and amd64, and possibly specified by C99 and/or
> IEEE754), not TO_ZERO.
>
> So, lots of choices, but nothing jumps out as a clear winner.
>
>