[MLton] property-list vs. hashtable

Thu, 2 Dec 2004 09:41:48 -0800

> What is the advantage of using property lists vs using a hashtable?

In short, I agree with your analysis.  Here's my take.

Both property lists and hash tables can be used to implement the type

 	X -> Y1 x ... x Yn

Such functions occur quite commonly in the compiler, where X is Var.t
and the Yi are the various bits of information we would like to
associate with variables.

We can represent an element of type (X -> Y1 x ... x Yn) as a
two-dimensional table, where each row corresponds to an element of X
and there is one column for each Yi, with the entry in row x column i
holding some yi in Yi.  Property lists group the table by row (one
property list for each variable), while hash tables group it by
column).

At some point during a compiler pass, we have in our hands an element
of X (a variable) and know which Yi we want.  To find it with a
property list takes time proportional to n.  Hash tables take constant
time (hopefully), but worst case time proportional to the number of
rows.  In MLton, n is typically very small (< 10), and can in practice
be made smaller by compressing several Bi a record type (which we
frequently do).  After this kind of transformation, many passes have n
= 1.  So, property lists give an easy to reason about time guarantee,
and it is difficult to imagine hash table lookup performing as well.

Space performance is a mixed bag.  In MLton, the lifetime of the
element of (X -> Y1 x ... x Yn) is a single pass.  Hence, with either
approach, we can simply wait until the end of the pass to do the
cleanup, at the cost of having extra live data during the pass.  By
holding on to rows, one may keep extra columns around.  By holding
onto columns, one may keep extra rows around.  In a pass where code is
shrinking, property lists should win.  In a pass where the Yi are
transient, hash tables should win.  It's not clear if this really
matters within a pass.

With both approaches, it is possible to do extra cleanup to get better
space usage.  With property lists, when a column is deleted, we can
walk all the property lists and delete the corresponding Yi.  With
hash tables, when a row is deleted, we can walk all the hash tables
and delete the corresponding X.

It is also possible to use finalizers to do the extra cleanup
automatically.  But with MLton's implementation, finalizers bring in
threads, which unfortunately hurts our other analyses and causes a
significant slowdown.

There might be an advantage to hash tables in data representation,
since they don't require a universal type.

It would be nice to do an experiment to use hash tables instead of
property lists.  Unfortunately, I don't see how to do it without
rewriting all the uses of property lists.  I don't think your proposal
works.

> functor PropertyList(H: HASH_TABLE): PROPERTY_LIST =
>   struct
>      datatype t = T of Word.word * unit ref
> 
>      local
>        fun pseudo-rand () = ...
>      in
>        fun new () = T (pseudo-rand(), ref ())
>      end
> 
>      fun newProperty () =
>        let
>          val ht = HashTable.new {hash = fn T (w, _) => w}
>          ...
>        end
>   end

For the hash table to work, you need to hash each element of X,
i.e. each variable.  What you've written here only has a hash for each
property.  I don't see how it can be made to work with the current
interface.  We need something more like

signature PROPERTY_LIST =
   sig
      type t

      val new: unit -> t
      val newProperty: {hash: 'a -> word} -> {add: t * 'a -> unit,
                                              peek: t -> 'a option}
   end

It could be fun to generalize property lists in MLton so that one
could dynamically choose which approach to use.  But I am not
optimistic that it will lead to any performance improvement.

Historical note: 

Property lists appear in the earliest snapshot of the compiler that I
have, dated October 1997.  They've always worked well once we figured
out how to keep the number of properties small, so we haven't really
tried other approaches.