Wait: bool arrays use 32 bits per entry, Word8Arrays use 8 bits per entry, and yet the latter are 6 times slower? How could that be? I just tried the version of MLton I have (2001-08-06), and with it the bool version is epsilon faster (just under 1% faster). For the test he is running (8K sieve), using a byte per entry means the array fits in the L1 cache, while using a word per entry means it doesn't.
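
To put rough numbers on the cache argument, here is a back-of-the-envelope sketch in Python. It assumes "8K sieve" means 8192 flag entries and, purely for illustration, a 16 KB L1 data cache; neither figure is stated above, and the names `footprint_bytes` and `fits_in_l1` are made up for this example.

```python
# Rough working-set estimate for the sieve's flag array.
# Assumption: "8K sieve" means 8 * 1024 = 8192 flag entries.
ENTRIES = 8 * 1024

# Assumption: a 16 KB L1 data cache, chosen only for illustration.
L1_DATA_CACHE_BYTES = 16 * 1024


def footprint_bytes(entries: int, bits_per_entry: int) -> int:
    """Size in bytes of an array with the given entry count and entry width."""
    return entries * bits_per_entry // 8


def fits_in_l1(entries: int, bits_per_entry: int,
               cache_bytes: int = L1_DATA_CACHE_BYTES) -> bool:
    """True if the array alone is no larger than the assumed L1 data cache."""
    return footprint_bytes(entries, bits_per_entry) <= cache_bytes


bool_array_bytes = footprint_bytes(ENTRIES, 32)   # one 32-bit word per flag
word8_array_bytes = footprint_bytes(ENTRIES, 8)   # one byte per flag

print("bool array: ", bool_array_bytes, "bytes, fits:", fits_in_l1(ENTRIES, 32))
print("Word8Array: ", word8_array_bytes, "bytes, fits:", fits_in_l1(ENTRIES, 8))
```

Under those assumptions the word-per-entry array is 32 KB and the byte-per-entry array is 8 KB, so only the latter fits. With a larger L1 the conclusion could differ, so treat the cache size as a parameter to adjust for the machine in question.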