As I&#39;m sure everyone has run into at some time or another, the PackWordX API is flawed:<br><br><blockquote style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;" class="gmail_quote">


<code><b>val</b> <a name="12254a6764f055fe_SIG:PACK_WORD.bytesPerElem:VAL:SPEC" href="http://mlton.org/basis/pack-word.html#SIG:PACK_WORD.bytesPerElem:VAL" target="_blank">bytesPerElem</a> <b>:</b> int</code><br><code>

<b>val</b> <a name="12254a6764f055fe_SIG:PACK_WORD.isBigEndian:VAL:SPEC" href="http://mlton.org/basis/pack-word.html#SIG:PACK_WORD.isBigEndian:VAL" target="_blank">isBigEndian</a> <b>:</b> bool</code><br><code>

<b>val</b> <a name="12254a6764f055fe_SIG:PACK_WORD.subVec:VAL:SPEC" href="http://mlton.org/basis/pack-word.html#SIG:PACK_WORD.subVec:VAL" target="_blank">subVec</a>  <b>:</b> Word8Vector.vector <b>*</b> int <b>-&gt;</b> LargeWord.word</code><br>

<code>

<b>val</b> <a name="12254a6764f055fe_SIG:PACK_WORD.subVecX:VAL:SPEC" href="http://mlton.org/basis/pack-word.html#SIG:PACK_WORD.subVecX:VAL" target="_blank">subVecX</a> <b>:</b> Word8Vector.vector <b>*</b> int <b>-&gt;</b> LargeWord.word</code><br>

<code>

<b>val</b> <a name="12254a6764f055fe_SIG:PACK_WORD.subArr:VAL:SPEC" href="http://mlton.org/basis/pack-word.html#SIG:PACK_WORD.subArr:VAL" target="_blank">subArr</a>  <b>:</b> Word8Array.array <b>*</b> int <b>-&gt;</b> LargeWord.word</code><br>

<code>

<b>val</b> <a name="12254a6764f055fe_SIG:PACK_WORD.subArrX:VAL:SPEC" href="http://mlton.org/basis/pack-word.html#SIG:PACK_WORD.subArrX:VAL" target="_blank">subArrX</a> <b>:</b> Word8Array.array <b>*</b> int <b>-&gt;</b> LargeWord.word</code><br>

<code>

<b>val</b> <a name="12254a6764f055fe_SIG:PACK_WORD.update:VAL:SPEC" href="http://mlton.org/basis/pack-word.html#SIG:PACK_WORD.update:VAL" target="_blank">update</a> <b>:</b> Word8Array.array <b>*</b> int <b>*</b> LargeWord.word</code><br>

<code>

               <b>-&gt;</b> unit</code><br><code></code></blockquote><code><br></code>where instead it should read something like:<code><br><br></code><blockquote style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;" class="gmail_quote">


<code><b>type</b> <a name="12254a6764f055fe_SIG:PACK_WORD.bytesPerElem:VAL:SPEC" href="http://mlton.org/basis/pack-word.html#SIG:PACK_WORD.bytesPerElem:VAL" target="_blank">word</a></code><br><code><b>val</b> <a name="12254a6764f055fe_SIG:PACK_WORD.bytesPerElem:VAL:SPEC" href="http://mlton.org/basis/pack-word.html#SIG:PACK_WORD.bytesPerElem:VAL" target="_blank">bytesPerElem</a> <b>:</b> int</code><br>


<code>

</code><code><b>val</b> <a name="12254a6764f055fe_SIG:PACK_WORD.isBigEndian:VAL:SPEC" href="http://mlton.org/basis/pack-word.html#SIG:PACK_WORD.isBigEndian:VAL" target="_blank">isBigEndian</a> <b>:</b> bool</code><br><code>

<b>val</b> <a name="12254a6764f055fe_SIG:PACK_WORD.subVec:VAL:SPEC" href="http://mlton.org/basis/pack-word.html#SIG:PACK_WORD.subVec:VAL" target="_blank">subVec</a>  <b>:</b> Word8Vector.vector <b>*</b> int <b>-&gt;</b> word</code><br>

<code>

<b></b></code><code><b>val</b> <a name="12254a6764f055fe_SIG:PACK_WORD.subArr:VAL:SPEC" href="http://mlton.org/basis/pack-word.html#SIG:PACK_WORD.subArr:VAL" target="_blank">subArr</a>  <b>:</b> Word8Array.array <b>*</b> int <b>-&gt;</b> word</code><br>


<code>

<b></b></code><code><b>val</b> <a name="12254a6764f055fe_SIG:PACK_WORD.update:VAL:SPEC" href="http://mlton.org/basis/pack-word.html#SIG:PACK_WORD.update:VAL" target="_blank">update</a> <b>:</b> Word8Array.array <b>*</b> int <b>*</b> word</code><code> <b>-&gt;</b> unit</code></blockquote>
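<div><br>For concreteness, here is a minimal sketch of how such an interface could be layered over an existing structure. The names PACK_WORD_TYPED and PackWord32LittleTyped are made up for illustration, and the sketch assumes the optional PackWord32Little structure is available. It still goes through LargeWord internally, so it only shows the shape of the API, not the speed-up:<br>
<pre><code>(* Hypothetical signature: PACK_WORD with a concrete element type
   and without the sign-extending X variants. *)
signature PACK_WORD_TYPED =
sig
  type word
  val bytesPerElem : int
  val isBigEndian : bool
  val subVec : Word8Vector.vector * int -&gt; word
  val subArr : Word8Array.array * int -&gt; word
  val update : Word8Array.array * int * word -&gt; unit
end

(* Hypothetical wrapper over the Basis structure (assumed present).
   Each call still converts through LargeWord.word. *)
structure PackWord32LittleTyped : PACK_WORD_TYPED where type word = Word32.word =
struct
  type word = Word32.word
  val bytesPerElem = PackWord32Little.bytesPerElem
  val isBigEndian = PackWord32Little.isBigEndian
  fun subVec (v, i) = Word32.fromLarge (PackWord32Little.subVec (v, i))
  fun subArr (a, i) = Word32.fromLarge (PackWord32Little.subArr (a, i))
  fun update (a, i, w) = PackWord32Little.update (a, i, Word32.toLarge w)
end
</code></pre>
As in PACK_WORD, the index counts elements rather than bytes, so element i starts at byte offset bytesPerElem * i.<br></div>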


<div><br>In our networking code, I worked around this by using _prim &quot;Word8Array_subWordX&quot; when compiling with MLton. This avoids the two C calls that cast in and out of a 64-bit word for every word written into the data stream. I recently ran into trouble on a 64-bit machine because SeqIndex.int is not the same type as int there, and I got a PrimApp error. As a stop-gap measure, I&#39;m open to suggestions for an Int/Word type that is guaranteed to match SeqIndex.<br>


<br>It would be nice to have &#39;unsafe&#39; versions without the LargeWord baggage available somewhere, so that _prim isn&#39;t needed. Armed with an &#39;unsafe&#39; PackWord, it would be easy to implement faster string/Word8Array copies, as discussed before.<br>

<br>I&#39;ll also note that PackWord represents yet another case where the basis library expects MLton to optimize fromLarge o toLarge to nothing. I&#39;ve been getting increasingly annoyed by the costs I pay to convert between types. I really liked Vesa&#39;s suggestion of {to/from}Fixed for the INTEGER signature. Combining that with the optimization to turn<br>

  x_1227: word32 = Word8Vector_subWord32 (x_1072, x_1074)<br>

   x_1226: word64 = WordU32_extdToWord64 (x_1227)<br>

   x_1225: word32 = WordU64_extdToWord32 (x_1226)<br>into<br>  x_1225: word32 = x_1227<br>I think we would be able to achieve zero-cost conversions in almost all the cases where it is safe.<br><br>If that conversion optimization were placed before commonArg and knownCase, I think Int8.fromFixed o Int8.toFixed would become a no-op even with overflow checking:<br>

<br>x_1 = ...<br>x_2 = WordU8_sextdToWord64 x_1<br>x_3 = WordU64_sextdToWord8 x_2<br>(* from iwconv0 bounds checking: *)<br>x_4 = WordU8_sextdToWord64 x_3<br>x_5 = Word64_eq (x_2, x_4)<br>raise Overflow exception if x_5 is false<br>

<br>First comes the new optimization:<br>x_3 = x_1<br>Then comes commonArg/commonSubexp:<br>x_4 and x_3 are replaced by x_2 and x_1 respectively<br>Then comes knownCase:<br>Word64_eq (x_2, x_2) is never false -&gt; exception never raised<br>
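<br>The first step, the round-trip rewrite, can be sketched on a toy expression type. This is not MLton&#39;s SSA representation: exp, Extend, Truncate and simplify are hypothetical names, and the sketch only handles a truncation applied directly to an extension:<br>
<pre><code>(* Toy IR for illustration only; widths are in bits. *)
datatype exp =
    Var of string
  | Extend of {src : int, dst : int, signed : bool, arg : exp}
  | Truncate of {src : int, dst : int, arg : exp}

(* Widening from s to d bits and then narrowing back to s bits
   returns the original value, for both sign and zero extension,
   so the pair can be dropped. *)
fun simplify (Var x) = Var x
  | simplify (Extend {src, dst, signed, arg}) =
      Extend {src = src, dst = dst, signed = signed, arg = simplify arg}
  | simplify (Truncate {src, dst, arg}) =
      (case simplify arg of
          inner as Extend {src = s, dst = d, arg = x, ...} =&gt;
            if s = dst andalso d = src andalso s &lt;= d
              then x
            else Truncate {src = src, dst = dst, arg = inner}
        | inner =&gt; Truncate {src = src, dst = dst, arg = inner})

(* The word32 -&gt; word64 -&gt; word32 chain from the first example. *)
val roundTrip =
  Truncate {src = 64, dst = 32,
            arg = Extend {src = 32, dst = 64, signed = false,
                          arg = Var "x_1227"}}

val simplified = simplify roundTrip  (* Var "x_1227" *)
</code></pre>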

<br>Am I correct in this assessment? If so, that&#39;s a pretty serious speed-up: 5 C calls and a potential branch turned into a no-op. Compared to 4 conversions in/out of an IntInf, things look even better!<br><br></div>