[MLton] PackWord to/from nonsense
Wesley W. Terpstra
wesley at terpstra.ca
Tue Jul 7 04:08:12 PDT 2009
As I'm sure everyone has run into at some time or another, the PackWordX API
is flawed:
  val bytesPerElem : int
  val isBigEndian : bool
  val subVec : Word8Vector.vector * int -> LargeWord.word
  val subVecX : Word8Vector.vector * int -> LargeWord.word
  val subArr : Word8Array.array * int -> LargeWord.word
  val subArrX : Word8Array.array * int -> LargeWord.word
  val update : Word8Array.array * int * LargeWord.word -> unit

where instead it should read something like:
  type word
  val bytesPerElem : int
  val isBigEndian : bool
  val subVec : Word8Vector.vector * int -> word
  val subArr : Word8Array.array * int -> word
  val update : Word8Array.array * int * word -> unit

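For illustration, a signature of this shape can be approximated today by wrapping an existing PackWord structure in a functor. This is only a sketch: TYPED_PACK_WORD, TypedPackWord and PackWord32BigT are names made up here, and the wrapper still goes through LargeWord internally, so it fixes the types but not the conversion cost.

```sml
(* Hypothetical signature: PACK_WORD with a concrete element type. *)
signature TYPED_PACK_WORD =
sig
  type word
  val bytesPerElem : int
  val isBigEndian : bool
  val subVec : Word8Vector.vector * int -> word
  val subArr : Word8Array.array * int -> word
  val update : Word8Array.array * int * word -> unit
end

(* Hypothetical functor: pairs a PACK_WORD structure with the WORD
   structure of matching width and hides the LargeWord conversions. *)
functor TypedPackWord (structure P : PACK_WORD
                       structure W : WORD) : TYPED_PACK_WORD =
struct
  type word = W.word
  val bytesPerElem = P.bytesPerElem
  val isBigEndian = P.isBigEndian
  fun subVec (v, i) = W.fromLarge (P.subVec (v, i))
  fun subArr (a, i) = W.fromLarge (P.subArr (a, i))
  fun update (a, i, w) = P.update (a, i, W.toLarge w)
end

(* Assumes the optional structure PackWord32Big is available, as in MLton. *)
structure PackWord32BigT =
  TypedPackWord (structure P = PackWord32Big
                 structure W = Word32)
```

With this, PackWord32BigT.subArr returns a Word32.word directly, and the caller no longer writes any fromLarge/toLarge.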
In our networking code, I worked around this by using _prim
"Word8Array_subWordX" if MLton is used. This avoids the two C calls casting
in and out of a 64-bit word for every word written into the data stream. I
recently ran into trouble on a 64-bit machine because SeqIndex.int is not
int, and I got a PrimApp error. As a stop-gap measure, I'm open to
suggestions of an Int/Word type that must match SeqIndex.
It would be nice to have 'unsafe' versions without the LargeWord baggage
available somewhere, so _prim isn't needed. Armed with 'unsafe' PackWord, it
would be easy to implement faster string/Word8Array copies, as discussed
before.
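To make the copy idea concrete, here is a rough sketch of a word-at-a-time copy written against the existing (safe) PackWord64Little, so it still pays for bounds checks and LargeWord conversions; an 'unsafe' variant would have the same shape. copyVec is a hypothetical name, PackWord64Little is assumed to be available (as in MLton), and dst is assumed to be at least as long as src.

```sml
(* Copy src into the front of dst, eight bytes at a time where possible. *)
fun copyVec (src : Word8Vector.vector, dst : Word8Array.array) : unit =
  let
    val n = Word8Vector.length src
    val words = n div 8
    (* Whole 64-bit elements: element i covers bytes 8*i .. 8*i+7. *)
    fun loopW i =
      if i < words
      then (PackWord64Little.update (dst, i, PackWord64Little.subVec (src, i));
            loopW (i + 1))
      else ()
    (* Remaining tail bytes, one at a time. *)
    fun loopB i =
      if i < n
      then (Word8Array.update (dst, i, Word8Vector.sub (src, i));
            loopB (i + 1))
      else ()
  in
    loopW 0;
    loopB (words * 8)
  end
```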
I'll also note that PackWord represents yet another case where the basis
library expects MLton to optimize fromLarge o toLarge to nothing. I've been
getting increasingly annoyed by the costs I pay to convert between types. I
really liked Vesa's suggestion of {to/from}Fixed for the INTEGER signature.
Combining that with the optimization to turn
  x_1227: word32 = Word8Vector_subWord32 (x_1072, x_1074)
  x_1226: word64 = WordU32_extdToWord64 (x_1227)
  x_1225: word32 = WordU64_extdToWord32 (x_1226)
into
  x_1225: word32 = x_1227
I think we would be able to achieve 0-cost conversions in almost all the
cases where it is safe.
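The reason this rewrite is safe can be checked from ordinary SML: zero-extending a 32-bit word to LargeWord and truncating back loses nothing. A small sketch, assuming LargeWord is at least 32 bits wide (it is Word64 in MLton); roundTrip32 is a hypothetical name.

```sml
(* Source-level equivalent of the extend-then-truncate pair above. *)
fun roundTrip32 (w : Word32.word) : Word32.word =
  Word32.fromLarge (Word32.toLarge w)

(* A few boundary values; the round trip is the identity on each. *)
val samples : Word32.word list =
  [0w0, 0w1, 0wx7FFFFFFF, 0wx80000000, 0wxFFFFFFFF]

val allIdentity = List.all (fn w => roundTrip32 w = w) samples
```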
If that conversion optimization were placed before commonArg and knownCase I
think Int8.fromFixed o Int8.toFixed would even become a no-op with overflow
checking:
  x_1 = ...
  x_2 = WordU8_sextdToWord64 x_1
  x_3 = WordU64_sextdToWord8 x_2
  (* from iwconv0 bounds checking: *)
  x_4 = WordU8_sextdToWord64 x_3
  x_5 = Word64_eq (x_2, x_4)
  raise Overflow exception if x_5 is false
First comes the new optimization:
  x_3 = x_1
Then comes commonArg/commonSubexp:
  x_4 and x_3 are replaced by x_2 and x_1 respectively
Then comes knownCase:
  Word64_eq (x_2, x_2) is never false -> exception never raised
Am I correct in this assessment? If so, that's a pretty serious speed-up: 5
C calls and a potential branch turned into a no-op. Compared to 4 conversions
in/out of an IntInf, things look even better!
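As a sanity check on the reasoning, the same sequence can be written out in plain SML. Since toFixed/fromFixed are only a proposal and not in the Basis, this sketch models the 8-bit value with Word8 and uses sign-extension into LargeWord (assumed to be wider than 8 bits, as in MLton); roundTrip8 is a hypothetical name.

```sml
(* Mirrors x_1 .. x_5 above: widen, narrow, then the overflow check. *)
fun roundTrip8 (x1 : Word8.word) : Word8.word =
  let
    val x2 = Word8.toLargeX x1    (* sign-extend to LargeWord *)
    val x3 = Word8.fromLarge x2   (* truncate back to 8 bits *)
    val x4 = Word8.toLargeX x3    (* widen again for the check *)
  in
    if x2 = x4 then x3 else raise Overflow
  end

(* Exhaustive over all 256 inputs: the check never fires and the
   result is always the input, i.e. the whole thing is a no-op. *)
val neverOverflows =
  List.all (fn i => let val w = Word8.fromInt i in roundTrip8 w = w end)
           (List.tabulate (256, fn i => i))
```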