[MLton] hash-set.sml low level points
Henry Cejtin
henry@sourcelight.com
Tue, 13 Jul 2004 13:40:57 -0500
I was looking at hash-set.sml in mlton/lib/mlton/basic and noticed a few
strange things:
In newOfSize, it isn't correct to have the multiplication by 4 done on words
because then an overflow will not be detected. I would think that the line
should change from
{ hash = hash,
numBuckets = 0w4 * Word.roundUpToPowerOfTwo (Word.fromInt size) }
to
{ hash = hash,
numBuckets = Word.roundUpToPowerOfTwo (Word.fromInt (4 * size)) }
In fromList, the function body passed in the second arg to List.foreach is of
the form
(ignore (...); ())
Clearly this is redundant. (I would say just use ignore to get the unit
value.)
For the stats', if you want the avg to be the expected number of times
equality is checked on lookups, then that is not the average size of a
bucket, but, assuming that all keys are equally likely, the avergae size^2/2.
Isn't this what people really care about?
Finally, I'm curious that you make the table size so big (4 times the number
of elements. Obviously the optimal value all depends on how expensive
testing for key equality is, but I would guess that it is almost always very
quick. Even in cases where general equality can be expensive, it tends not
to be in the hash table because you only compare it against other items with
the same hash (mod the table size), so often the test is very fast. E.g.,
for strings, the sizes or first few characters are often different. I wonder
what happens if you replace the 4 by 1 (which is what I typically use) in a
self compile.
It is a bit also a bit strange that tables are never shrunk, although it all
depends on how much you remove elements.