word freq
Henry Cejtin
henry@sourcelight.com
Wed, 13 Jun 2001 03:12:12 -0500
I tried an alternative implementation of the word frequency thing: no hash
table. The idea is that one pass of the merge sort can merge duplicates,
adjusting the counts. Now how do you do an incremental merge sort: you keep
a list (of sorted lists) where successive elements are twice as long. I.e.,
the first element is empty or of length 1, the second is empty or of length
2, etc. Now each time you get a new word, you look at the first slot. If it
is empty, stick it in. If it isn't, merge it, stick the empty list in and
proceed to the next element.
It won't cost you more than n log n time, and there isn't more than log n
duplication. Sadly, it isn't horrible, but it is definitely several times
slower.