[MLton] hash-set.sml low level points
Stephen Weeks
MLton@mlton.org
Tue, 13 Jul 2004 17:30:26 -0700
> I was looking at hash-set.sml in mlton/lib/mlton/basic and noticed a few
> strange things:
>
> In newOfSize, it isn't correct to have the multiplication by 4 done on words
> because then an overflow will not be detected.
True.
> I would think that the line should change from
>
> { hash = hash,
> numBuckets = 0w4 * Word.roundUpToPowerOfTwo (Word.fromInt size) }
>
> to
>
> { hash = hash,
> numBuckets = Word.roundUpToPowerOfTwo (Word.fromInt (4 * size)) }
This doesn't quite fix it, because Word.roundUpToPowerOfTwo may miss
an overflow as well. I fixed things by making all the computations on
integers.
> In fromList, the function body passed in the second arg to List.foreach is of
> the form
>
> (ignore (...); ())
>
> Clearly this is redundant. (I would say just use ignore to get the unit
> value.)
Yep. Fixed.
> For the stats', if you want the avg to be the expected number of times
> equality is checked on lookups, then that is not the average size of a
> bucket, but, assuming that all keys are equally likely, the avergae size^2/2.
> Isn't this what people really care about?
Dunno. If I were to publicize this interface, I would make sure
to expose enough so that people could compute whatever stats they
want. I have no objection to you adding another field to the stats
with what you want.
> Finally, I'm curious that you make the table size so big (4 times the number
> of elements.
No good reason.
> Obviously the optimal value all depends on how expensive testing for
> key equality is, but I would guess that it is almost always very
> quick. Even in cases where general equality can be expensive, it
> tends not to be in the hash table because you only compare it
> against other items with the same hash (mod the table size), so
> often the test is very fast. E.g., for strings, the sizes or first
> few characters are often different.
Agreed.
> I wonder what happens if you replace the 4 by 1 (which is what I
> typically use) in a self compile.
I await the results of your experiment.
> It is a bit also a bit strange that tables are never shrunk,
> although it all depends on how much you remove elements.
Yeah. HashSet.remove is used only once in MLton, so I've never been
motivated to fix it. I would be happy to see it fixed.