[MLton] Catching redundant pattern matches

Mon, 13 Dec 2004 14:04:12 -0800

> Moscow ML uses the following pattern-match compiler:
> 
> http://citeseer.ist.psu.edu/sestoft96ml.html
> 
> Which I thought about implementing in MLton for fun. It is a quite
> simple algorithm and the paper is rather interesting (IMO). 

Jesper, thanks for the link.  I'd not read that before.  I added it to
our References page.  MLton's algorithm is fairly similar (top-down,
left-to-right) and our match compiler generates similar decision
trees.  The main difference is that MLton's match compiler is
structured around partitioning the rules by doing tests, instead of
testing values.  So, in MLton, positive/negative information is
implicitly represented in the context of the various recursive calls,
and so MLton never has to explicitly check "do I already know this?" 
to avoid repeated tests.  Also, MLton destructures the value as it
does the tests and partitions the rules -- so no common-subexpression
elimination postpass is needed to avoid repeated selects.  That all
seems like a plus.

On the downside, MLton doesn't postprocess to equate identical trees.
So, while both MLton and Moscow ML will take exponential time on
pathological cases like the one Sestoft mentions in 7.3, MLton will
generate an exponential amount of code, while Moscow ML will generate
a linear amount.  All the SML implementations, except for Hamlet
(which doesn't do match compilation?), suffer from the exponential
time problem, and I suspect most suffer from the exponential code
problem.

I'd like to see a solution to this problem in MLton's match compiler,
although instead of using a postprocessor, I'd like to see memoization
during match compilation.  That would hopefully solve the exponential
time problem as well as exponential space.  It's not trivial though,
as the approach has to work for all permutations of the pathological
tuple patterns.  And there may be even more devious examples.  It
would be nice to see a provably linear-time match compiler.

> The interesting part of the paper is section 7.4 in which he detects
> such redundant cases.

MLton does pretty much the same thing.  There is code in
defunctorize.fun that counts the number of occurrences of each rule in
the match in some branch of the decision tree.  Zero occurrences means
that the rule is redundant.  Nonzero occurrences of the default
handler means a nonexhaustive match.  MLton also keeps prototype
values as it does the match compilation so it can display them in the
nonexhaustive warning message.