[MLton] PackWord to/from nonsense
Matthew Fluet
fluet at tti-c.org
Wed Jul 8 08:26:37 PDT 2009
On Tue, 7 Jul 2009, Wesley W. Terpstra wrote:
> In our networking code, I worked around this by using _prim
> "Word8Array_subWordX" if MLton is used. This avoids the two C calls casting
> in and out of a 64-bit word for every word written into the data stream.
A number of 64-bit operations can (and should) be implemented by the
native x86 codegen, to avoid the C calls. This should help even in the
presence of conversion optimizations.
> I
> recently ran into trouble on a 64-bit machine because SeqIndex.int is not
> int, and I got a PrimApp error. As a stop-gap measure, I'm open to
> suggestions of an Int/Word type that must match SeqIndex.
You can use the same technique that the Basis Library uses. There is an
(undocumented) MLB path variable SEQINDEX_INT which expands to either
"int32" or "int64", depending on the size of indices of the target
platform. You can nicely package it up in a .mlb file as follows:
** seqindex.mlb
local
$(SML_LIB)/basis/basis.mlb
in
seqindex-$(SEQINDEX_INT).sml
end
** seqindex-int32.sml
structure SeqIndex = Int32
** seqindex-int64.sml
structure SeqIndex = Int64
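With that in place, client code can be written against SeqIndex and compile
unchanged on both 32- and 64-bit targets. A minimal sketch (the Stream
structure and byteAt function are illustrative, not part of the thread):

```sml
(* Sketch: SeqIndex.int is Int32.int or Int64.int depending on the
   target, so both Int32 and Int64 provide toInt/fromInt. *)
structure Stream =
struct
  (* Index a byte array with a target-width sequence index. *)
  fun byteAt (a : Word8Array.array, i : SeqIndex.int) : Word8.word =
      Word8Array.sub (a, SeqIndex.toInt i)
end
```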
> It would be nice to have 'unsafe' versions without the LargeWord baggage
> available somewhere, so _prim isn't needed. Armed with 'unsafe' PackWord, it
> would be easy to implement faster string/Word8Array copies, as discussed
> before.
I'm not sure why you call them "unsafe" versions. Your proposed PACK_WORD
signature (with the "type word" specification) wouldn't be unsafe in any
way.
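For concreteness, the proposed signature presumably amounts to something like
the following: the Basis Library's PACK_WORD signature with an abstract "type
word" specification in place of LargeWord.word (a sketch, not the actual
proposal text):

```sml
(* Sketch: as in the Basis PACK_WORD, but with a monomorphic word type,
   so e.g. a PackWord32Big matching would set word = Word32.word and no
   toLarge/fromLarge conversions are needed at the use site. *)
signature PACK_WORD =
sig
  type word
  val bytesPerElem : int
  val isBigEndian : bool
  val subVec  : Word8Vector.vector * int -> word
  val subVecX : Word8Vector.vector * int -> word
  val subArr  : Word8Array.array * int -> word
  val subArrX : Word8Array.array * int -> word
  val update  : Word8Array.array * int * word -> unit
end
```

Nothing here is unsafe: the bounds checks of the Basis versions are unchanged;
only the conversion through LargeWord is dropped.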
> I'll also note that PackWord represents yet another case where the basis
> library expects MLton to optimize fromLarge o toLarge to nothing.
> ...
> If that conversion optimization were placed before commonArg and knownCase I
> think Int8.fromFixed o Int8.toFixed would even become a no-op with overflow
> checking:
>
> x_1 = ...
> x_2 = WordU8_sextdToWord64 x_1
> x_3 = WordU64_sextdToWord8 x_2
> (* from iwconv0 bounds checking: *)
> x_4 = WordU8_sextdToWord64 x_3
> x_5 = Word64_eq (x_2, x_4)
> raise Overflow exception if x_5 is false
>
> First, comes the new optimization:
> x_3 = x_1
> Then comes commonArg/commSubexp
> x_4 and x_3 are replaced by x_2 and x_1 respectively
> Then comes knownCase:
> Word64_eq (x_2, x_2) is never false -> exception never raised
>
> Am I correct in this assessment?
In general, yes, conversion optimization should be a win. However, the
"clean-up" optimizations aren't commonArg and knownCase. The SSA shrinker
(ssa/shrink.fun) will perform the necessary simplifications:
* copy propagation of x_3 = x_1 (replace all uses of x_3 by x_1 and
eliminate the x_3 variable)
* prim-app folding of Word64_eq (x_2, x_2) to true
* case simplification of a manifest discriminant
knownCase handles case simplification when the discriminant is only
manifest on some of the incoming edges. That is, the SSA shrinker will
get:
L_1:
x_10 = true
case x_10 of true => L_11 | false => L_12
while knownCase will get:
L_1():
x_10 = true
L_4(x_10)
L_2():
x_20 = false
L_4(x_20)
L_3():
x_30 = Word64_eq (x_1, x_2)
L_4(x_30)
L_4(x_40):
case x_40 of true => L_11 | false => L_12
transforming it to:
L_1():
x_10 = true
L_11()
L_2():
x_20 = false
L_12()
L_3():
x_30 = Word64_eq (x_1, x_2)
L_4(x_30)
L_4(x_40):
case x_40 of true => L_11 | false => L_12
It is likely that then the SSA shrinker will be able to eliminate the use
of x_10 and x_20 as unused variables, perform the jump chaining to replace
transfers to L_1 by L_11 and L_2 by L_12, and combine the L_3 and L_4
blocks (assuming that now L_3 is the only predecessor of L_4).
> If so, that's a pretty serious speed-up: 5
> C calls and a potential branch turned into a no-op. Compared to 4 conversions
> in/out of an IntInf, things look even better!