word freq
Stephen Weeks
MLton@sourcelight.com
Wed, 13 Jun 2001 14:31:04 -0700
> I tried an alternative implementation of the word frequency thing: no hash
> table. The idea is that one pass of the merge sort can merge duplicates,
> adjusting the counts. Now how do you do an incremental merge sort: you keep
> a list (of sorted lists) where successive elements are twice as long. I.e.,
> the first element is empty or of length 1, the second is empty or of length
> 2, etc. Now each time you get a new word, you look at the first slot. If it
> is empty, stick it in. If it isn't, merge it, stick the empty list in and
> proceed to the next element.
Interesting. There is a similar implementation of merge sort in MLton's library
(src/lib/mlton/basic/merge-sort.sml).
> It won't cost you more than n log n time, and there isn't more than log n
> duplication. Sadly, it isn't horrible, but it is definitely several times
> slower.
Too bad. Oh well, we are the fourth fastest now on wordfreq anyways. But
there's probably still room for improvement.