[MLton-devel] CPS vs SSA

Stephen Weeks MLton@mlton.org
Fri, 10 Jan 2003 17:35:38 -0800

Lal George is currently developing Nova, a compiler for the Intel IXP.
Nova is written in SML and uses MLRISC.  I mentioned to Lal that MLton
had switched from using a CPS IL to using an SSA IL.  He had the
following response.

> Re: SSA -- very interesting!
> Matthias and I had a lengthy discussion on
> this, and came to the well known conclusion that
> that both forms (CPS and SSA) were equally convenient
> from an optimization point of view ... but, the
> compactness of CPS was just so much easier to process
> and manipulate symbolically. It seemed much easier
> to splice trees and move variables around in a tree based
> CPS form than it was in the traditional CFG based SSA form.
> Since you've gone both ways, what do you think?

In short, I see no advantages to using CPS instead of SSA, and I see
several disadvantages.  Note, this is in the context of a first-order
IL, which is the case in MLton, and I assume in Nova.  It's also in
the context of MLton's SSA IL, which uses a more functional notation
instead of traditional phi-functions.

What is the difference between CPS and SSA?  In both, a function
consists of a collection of basic blocks (called continuations in
CPS).  The difference is that in CPS, there is a syntax tree in which
the blocks are nested, while in SSA, the blocks are in a flat vector
(and describe a control-flow graph CFG).  So, what does the syntax
tree in CPS do?

1. All of the blocks appear exactly once in the tree.  So a recursive
walk over the tree visits all blocks and variables once.

2. It satisfies the lexical scoping condition for variables: a
preorder walk over the tree is guaranteed to visit variable
definitions before uses.

3. It satisfies the lexical scoping condition for blocks: a preorder
walk over the tree visits block definitions before uses.

4. It provides an approximation to the dominator relation.  If block A
is nested within block B, then B dominates A.  It is only
approximation, however.  It may be that B dominates A but A is not
nested within B.

So, transformations on CPS can use recursive tree walks to accomplish
their tasks.  Let's see how we accomplish the same things in SSA.

1. All of the blocks appear exactly once in the vector.  So, a loop
over the vector visits all blocks and variables once.

2. The variables satisfy the dominator condition: a definition of a
variable dominates all of its uses.  So, a preorder DFS over the CFG
visits variable definitions before uses.

3. A postorder DFS over the CFG visits block definitions before uses.

4. Once can compute the dominator tree on the CFG in almost linear

So, transformations on SSA use either a loop over all the blocks, a
DFS of the CFG, or a tree walk of the dominator tree, depending on
what's appropriate.

Since DFS and dominator tree computation are pretty easy to write, and
can be in libraries, I consider CPS and SSA of equivalent complexity
from the perspective of an input to some transformation.  The same
information is easily accessible in either.

However, in terms of a transformation producing output, I find that
producing CPS is more complex than SSA.  This is because CPS imposes
additional constraints that must be *explicitly* in the IL that SSA
does not impose: namely that the syntax tree must provide a sufficient
approximation to the dominator tree that a simple tree walk be able
to prove variable definitions dominate uses and block uses nest within
block uses.

Thinking of it another way, both CPS and SSA require that variable
definitions dominate uses.  The difference is that using CPS as an IL
requires that all transformations provide a proof of dominance in the
form of the nesting, while SSA doesn't.  Now, if a CPS transformation
doesn't do too much rewriting, then the partial dominance information
that it had from the input tree is sufficient for the output tree.
Hence tree splicing works fine.  However, sometimes it is not

As a concrete example, consider common-subexpression elimination.
Suppose we have a common subexpression x = e that dominates an
expression y = e in a function.  In CPS, if y = e happens to be within
the scope of x = e, then we are fine and can rewrite it to y = x.  If
however, y = e is not within the scope of x, then either we have to do
massive tree rewriting (essentially making the syntax tree closer to
the dominator tree) or skip the optimization.  Another way out is to
simply use the syntax tree as an approximation to the dominator tree
for common-subexpression elimination, but then you miss some
optimization opportunities.  On the other hand, with SSA, you simply
compute the dominator tree, and can always replace y = e with y = x,
without having to worry about providing a proof in the output that x
dominates y (i.e. without putting y in the scope of x)

As another example, when we used to use CPS in MLton, we ran into
problems with our contification optimization in trying to satisfy the
CPS condition that block uses occur within the scope of their
definitions.  We were forced to do some pretty messy tree rewriting,
and still had to disable some cases.  When we switched to SSA, we
didn't have to worry about scoping -- we simply put all the blocks
together and left it to later passes to do a DFS or compute dominators
if necessary.

>From the perspective of typed ILs, I view CPS as pushing some of the
work of the IL typechecker into each pass, by requiring each
transformation to produce the scoping information necessary to make it
easy for the type checker that variable definitions dominate uses.  On
the other hand, with SSA, the transformations don't have to do any
extra work, while the type checker has to do more, in that it has to
do a dominator computation.  So, with CPS, we complicate many
transformations to make the type checker simple, while with SSA, we
complicate one type checker to make many passes simpler.

Oh well, that was a long explanation.  I hope it made some sense.  We
have used CPS in MLton for over three years, from fall 1998 to fall
2001, and have used SSA for over a year now.  We are quite happy that
we switched.

I've added a little more detail below.  Feel free to skip it.

To be extremely clear on what I mean by CPS and SSA, here are
simplified grammars for first-order CPS and SSA ILs similar to what
was/is in MLton.  In the grammars, I'll use the following

        c in Const
        l in Label
        x in Var
        f in Func
        p in Prim
Here is the CPS grammar, with comments on the right.

<program>  ::= <fun>*
<fun>      ::= fun <f> (<x>*) = <exp>           toplevel function declaration
<exp>      ::= let <dec>* in <transfer> end             
<dec>      ::= <bind>
             | <block>
<block>    ::= block <l> (<x>*) = <exp>         continuation declaration
<bind>     ::= val <x> = <simple>               simple expression
<simple>   ::= <c>                              constant
             | <x>                              variable
             | <p> (<x>*)                       primitive application
<transfer> ::= <f> (<x>*)                       tail call
             | <l> (<f> (<x>*))                 nontail call
             | <l> (<x>*)                       jump
             | <x>*                             return

So, a program is a collection of mutually recursive functions.
Functions take multiple arguments and return multiple values.  The
body expression of a function is in CPS form (it may be a bit hard to
see that at first).  Continuations (labels) are declared with a
"block" keyword, so chosen because of the similarity with SSA basic
blocks.  The usual lexical scoping rules are in place -- a variable
must be bound before it is used, and a continuation must be defined
before it is jumped to or used in a nontail call.

I've tweaked the CPS grammar so that it is very easy to compare with
the SSA grammar.  All we have to do is to replace

<fun>      ::= fun <f> (<x>*) = <exp>
<exp>      ::= let <dec>* in <transfer> end             
<dec>      ::= <bind>
             | <block>


<fun>      ::= fun <f> (<x>*) = let <block>* in <l> () end
<exp>      ::= let <bind> in <transfer> end

and voila, now we have SSA, with the usual dominance condition.

Hopefully this makes very precise what I mean when I say that the
difference between CPS and SSA is the nesting of blocks.

This SF.NET email is sponsored by:
SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
MLton-devel mailing list