wordfreq slowness explained
Stephen Weeks
MLton@sourcelight.com
Tue, 12 Jun 2001 23:08:02 -0700
> Silly bug: the reason that wordfreq is SO slow is that the hash that you use
> is completely broken. It produces exactly 26 different hash values. The
> reason for that is that in the `loop' function inside it, you do not loop.
Whoops. That got translated from a String.fold when extricating the code from
my basis into the insane standard basis that doesn't have a String.fold.
> I fixed this, and the CPU time went down from .76 CPU seconds to .13 CPU
> seconds. Assuming that this ratio holds on Doug's machine, we will be
> exactly the same speed as Perl, but still worse than half the speed of OCaml.
With the MLton Doug has (20000906-2), I see slighlt less than 50% running time
after the fix. With the latest MLton, I see jut under 30%.
> I'm going to hack a different implementation that is less strange and see
> what can be done.
Please do. I'm sure there's a lot of room there for improvement. I'm going to
try probe hashing and double hashing.