[MLton] The bool array/vector performance bug
Matthew Fluet
fluet at tti-c.org
Tue Nov 4 19:35:09 PST 2008
On Tue, 4 Nov 2008, Vesa Karvonen wrote:
> On Mon, Jun 2, 2008 at 1:30 AM, Matthew Fluet <fluet at tti-c.org> wrote:
>> On Sat, 31 May 2008, Vesa Karvonen wrote:
>
> Indeed. I would bet that accessing 32-bit elements on a modern 32-bit
> processor is only faster when it hits the L1 cache and even then the
> difference is likely to be minimal (if any). I would not be surprised
> to see that it would actually improve overall performance to
> essentially always pack booleans as 1 bit elements in memory
> representations (not just in arrays). It doesn't cost many extra
> instructions to test/set a bit (when properly optimized) rather than a
> byte/word and the savings in cache space (and even in time spent doing
> GC) could be significant.
I'm not convinced that 1 bit booleans in non-array objects would save that
much on GC time (by saving object space). All objects need to be at least
4byte aligned, so one would need to have an object with more than 4
booleans to be more space efficient with 1 bit booleans than with 8bit
booleans.
>>> - don't allow ML bool in the FFI, but do
>>> - allow C Bool ("C_Bool.t") in the FFI.
>>
>> The C_Bool structure is in the Basis Library implementation, but not
>> currently exported by c-types.mlb. The C_Bool structure is equal to (the
>> application of the WordToBool functor to) a WordN structure, where N
>> corresponds to the target platform's size of the C-type _Bool.
>
> I think that for the purpose of writing portable code it would make
> sense to export C_Bool regardless of whether the ML bool
> representation is changed.
Sure, though I don't believe that the C _Bool type is used much in
standard headers.
> Given a set of alternatives ways to improve the ML bool representation
> (the goal), it makes sense to pick an alternative that also allows (or
> encourages) one to write portable code. Clearly divorcing ML bool
> from C bool does that. The choice of ML bool representation could
> then be done freely. It could theoretically even be a compiler switch
> (-default-type {bool1, bool8, bool16, bool32, bool64}).
Agreed, that it would be good to allow/encourage portable code.
>> One other item is the handling of primitives that return boolean values.
>> Some of these primitives are not implemented by the codegens (e.g., the x86
>> codegen doesn't implement Word64_{equal,lt}), and so are turned into C
>> functions (in <src>/mlton/codegen/ssa-to-rssa.fun). (This is why they are
>> declared in <src>/runtime/gen/basis-ffi.def.) If the C functions do not
>> return values in the same representation as that of an ML bool, then you
>> will need to insert some coercion, either a word-size coercion or an
>> explicit test against 0.
>
> Hmm... What would be the best way to fix this? These are somewhat
> special in that they are really a part of the runtime/compiler and can
> be changed at will if/when the bool representation is changed and I
> wouldn't want to add extra overhead to them.
I don't think there is any way to "fix" this other than to make sure that
the C functions that implement conditional primitives agree with the
compiler on the representation of booleans, or insert coercions.
Here are some other thoughts. There is a long-standing tension/debate on
whether conditionals should be value-producing or control-flow-producing.
MLton currently takes a value-producing view of (most) conditionals (the
exception being the checked-arithmetic primitives); that is, the Word32
equality primitive is used like:
x_5: bool = Word32_equal (x_2, global_1)
Under the control-flow-producing view, it would be used like:
if Word32_equal (x_2, global_1) then L1 () else L2 ()
where the primitive is a transfer (the "if-then-else" being
pretty-printing sugar). Note that the two views are equivalent, in the
sense that one can always convert a value to control-flow:
x_5: bool = Word32_equal (x_2, global_1)
case x_5 of true => L1 () | false => L2 ()
and control-flow to a value:
datatype boolz = truez | falsez
if Word32_equal (x_2, global_1) then L1 () else L2 ()
L1 ()
x_1: boolz = truez
L3 (x_1)
L2 ()
x_2: boolz = falsez
L3 (x_2)
L3 (x_5: boolz)
...
Note that the boolz type is not special in any way; indeed, the compiler
is free to use any representation it sees fit -- currently, MLton would
represent it as 1bit (padded to 8bits when used in an array).
I point this out because one could introduce an pass that separates the
bool type (as returned by conditional primitives) from the type of the
conditional value stored in data structures. That is, introduce the boolz
datatype and transform:
x_5: bool = Word32_equal (x_2, global_1)
...
to
z: bool = Word32_equal (x_2, global_1)
case z of true => L1 () | false => L2 ()
L1 ()
zt: boolz = truez
L3 (zt: boolz)
L2 ()
zf: boolz = falsez
L3 (zf: boolz)
L3 (x_5: boolz)
...
for fresh z, L1, zt, L2, zf, L3 and transform
case y of true => Lt () | false => Lf ()
to
case y of truez => Lt () | falsez => Lf ()
and otherwise transform the "bool" type to "boolz" type. One could
optimize the transformation; it doesn't make much sense to transform
x_5: bool = Word32_equal (x_2, global_1)
case x_5 of true => L1 () | false => L2 ()
when x_5 is otherwise not used (though, the knownCase optimization should
clean up after the un-optimized transformation in this case). Note, that
you could apply a similar transformation to foreign functions with bool
arguments or return; one would require a much more complicated
transformation to handle foreign functions with "bool array", "bool
vector", or "bool ref" arguments or results. (One would need to do a
dataflow analysis to determine which objects need to maintain a
representation compatible with C.)
So, there is a way of changing the space usage of booleans without needing
to change the implementation of the primitives. There is a little
overhead here when a 32-bit integer is returned from a C function
(implementing a primitive) and then explicitly tested before doing an
explicit assignment of 0 or 1 (as an 8bit integer).
More information about the MLton
mailing list