[MLton] hash-set.sml low level points

Tue, 13 Jul 2004 13:40:57 -0500

I  was  looking  at  hash-set.sml  in mlton/lib/mlton/basic and noticed a few
strange things:

In newOfSize, it isn't correct to have the multiplication by 4 done on  words
because  then  an overflow will not be detected.  I would think that the line
should change from

    { hash = hash,
      numBuckets = 0w4 * Word.roundUpToPowerOfTwo (Word.fromInt size) }

to

    { hash = hash,
      numBuckets = Word.roundUpToPowerOfTwo (Word.fromInt (4 * size)) }

In fromList, the function body passed in the second arg to List.foreach is of
the form

    (ignore (...); ())

Clearly  this  is  redundant.   (I  would say just use ignore to get the unit
value.)

For the stats', if you want the avg  to  be  the  expected  number  of  times
equality  is  checked  on  lookups,  then  that  is not the average size of a
bucket, but, assuming that all keys are equally likely, the avergae size^2/2.
Isn't this what people really care about?

Finally,  I'm curious that you make the table size so big (4 times the number
of elements.  Obviously the  optimal  value  all  depends  on  how  expensive
testing  for key equality is, but I would guess that it is almost always very
quick.  Even in cases where general equality can be expensive, it  tends  not
to  be in the hash table because you only compare it against other items with
the same hash (mod the table size), so often the test is  very  fast.   E.g.,
for strings, the sizes or first few characters are often different.  I wonder
what happens if you replace the 4 by 1 (which is what I typically use)  in  a
self compile.

It  is a bit also a bit strange that tables are never shrunk, although it all
depends on how much you remove elements.