Any idea why the change from bool to Word8Array is so incredibly costly? I realize that sieve is a pretty idiosyncratic benchmark, but I would have thought that it wouldn't make that much of a difference. ... Ah, I get it: it must be that the factor of 8 in space made us too big to fit in some L1 cache, right?