[MLton] property-list vs. hashtable
Matthew Fluet
fluet@cs.cornell.edu
Thu, 2 Dec 2004 09:12:20 -0500 (EST)
[Greg asked this question, and I didn't have a good answer:]
What is the advantage of using property lists vs using a hashtable?
It does seem to me that one could implement:
signature PROPERTY_LIST =
sig
type t
val new: unit -> t
val newProperty: unit -> {add: t * 'a -> unit,
peek: t -> 'a option}
end
as
functor PropertyList(H: HASH_TABLE): PROPERTY_LIST =
struct
datatype t = T of Word.word * unit ref
local
fun pseudo-rand () = ...
in
fun new () = T (pseudo-rand(), ref ())
end
fun newProperty () =
let
val ht = HashTable.new {hash = fn T (w, _) => w}
...
end
end
Both are trying to maintain the illusion of constant-time lookups, by
different methods. PropertyLists via keeping the list length short (i.e.,
one needs to call destroy and clear). HashTables via having a good hash
that keeps the table balanced.
HashTables seem to have the advantage that all of the space held by the
properties are only kept live by the hash-table itself (which is caught up
in the add/peek function closures); hence, when the add/peek functions
are collected, the properties go away.
On the other hand, PropertyLists have the advantage that we can reclaim
the space of properties as soon as we are done with a piece of the AST.
What I'm trying to say is that may of our optimizations look like:
val {labelInfo : Label.t -> ???, ...} = Property.new (Label.plist, ...)
val shrink = shrinkFunction globals
val functions =
List.revMap
(functions, fn f =>
let
val f = Function.make {...}
val f = shrink f
in
f
end)
The shrink of the function automatically calls Label.clear on each label
within this function; hence, we reclaim all of the space held by
properties associated with labels, because they are not needed while
processing the next function. We could acheive this with the hash-tables
by resetting the table. However, a varInfo : Var.t -> ??? is a bit
trickier, because we may have varInfo properties associated with globals
that we do not want to reset, because they are still needed.
So, I guess it really is just a question of usage. I can see situations
where each implementation would have its advantages.