cps & contification
Matthew Fluet
fluet@CS.Cornell.EDU
Thu, 25 Jan 2001 11:21:38 -0500 (EST)
> Matthew, could you please send out the latest summary of the entire analysis so
> that Henry could read through that instead of reading through all of our
> mistakes? Thanks.
Here is is. There are a bunch of "tricky" examples down at the very end.
Also, I've been working on a proof that
If B is a safe analysis and B(f) = h in Fun, then A(f) in {Uncalled} U Fun.
The idea is the following: if B(f) = h and A'(f) <> Uncalled, then h must
dominate f in G. But it keeps slipping away.
Abstractly, a program consists of a main function and a set of tail and nontail
calls. A tail call consists of a caller and a callee. A nontail call consists
of a caller, callee, and jump.
j in Jump
f in Fun
TailCall = Fun x Fun
NonTailCall = Fun x Fun x Jump
p in Program = Fun x P(TailCall) x P (NonTailCall)
An analysis is a map from functions to the continuation that they return to.
RCont = {Uncalled, Unknown} U Jump U Fun
A in Analysis = Fun -> RCont
Here is the intended meaning of A.
A(f) meaning
-------- --------------------------------------------------
Uncalled f is never called during evaluation of the program
j in Jump f always returns to continuation j
g in Fun f always returns to function g
Unknown f returns to many places
An analysis A is safe for program p = (fm, T, N) if
* A(fm) = Unknown
** For all nontail calls (f, g, j) in N,
A(f) = Uncalled
or A(g) in {j, Unknown}
*** For all tail calls (f, g) in T,
A(f) = Uncalled
or A(g) in {f, A(f), Unknown}
Claim: Let A be safe.
If there exists a path of calls from fm to f,
then A(f) <> Uncalled.
Proof:
Suppose there exists a path of calls from fm to f.
Proceed by induction on the length of the path.
n = 0: Then f = fm.
A(f) = Unknown (because A is safe).
Hence A(f) <> Uncalled.
n > 0: Then there exists g
such that there exists a path of n-1 calls from fm to g
and (g, f) in T or (g, f, j) in N.
By induction hypothesis, A(g) <> Uncalled.
Suppose (g, f) in T.
A(f) in {g, A(g), Unknown} because A is safe.
Hence A(f) <> Uncalled.
Suppose (g, f, j) in N.
A(f) in {j, Unknown} because A is safe.
Hence A(f) <> Uncalled.
Turn RCont into a lattice with the following order
Uncalled <= x
x <= x
x <= Unknown
where x is an arbitrary element of Fun U Jump.
Extend <= pointwise to make A into a lattice.
Define the function C: Analysis -> Analysis by
C(A)(g) =
if g = fm
then Unknown
else lub ({j | (f, g, j) in N and A(f) <> Uncalled}
U {A(f) | (f, g) in T})
Claim: C is monotone and continuous.
Proof: Clear.
Claim: If A = C(A) then A is safe.
Proof:
* A(fm) = C(A)(fm)
= if fm = fm
then Unknown
else ...
= Unknown.
** Let (F, G, J) be an arbitrary element of N
Suppose A(F) = Uncalled.
Then done.
Suppose A(F) <> Uncalled.
Then A(G) = C(A)(G)
= if G = fm
then Unknown
else lub ({j | (f, g, j) in N and A(f) <> Uncalled}
U {A(f) | (f, g) in T})
= if G = fm
then Unknown
else J \/ lub ({j | (f, g, j) in N and A(f) <> Uncalled}
U {A(f) | (f, g) in T})
in {J, Unknown}
subset {J, Unknown}.
*** Let (F, G) be an arbitrary element of T
Suppose A(F) = Uncalled.
Then done.
Suppose A(F) <> Uncalled.
Then A(G) = C(A)(G)
= if G = fm
then Unknown
else lub ({j | (f, g, j) in N and A(f) <> Uncalled}
U {A(f) | (f, g) in T})
= if G = fm
then Unknown
else A(F) \/ lub ({j | (f, g, j) in N and A(f) <> Uncalled}
U {A(f) | (f, g) in T})
in {A(F), Unknown}
subset {F, A(F), Unknown}.
Corollary: If A = C(A) then A satisfies:
***' for all tail calls (f, g) in T,
A(f) = Uncalled
or A(g) in {A(f), Unknown}.
Proof: By examination of case *** of previous proof.
Let A' = lfp(C).
Fact: A' is safe.
Claim: A'(f) = Uncalled iff
there does not exist a path of calls from fm to f.
Proof:
==> (prove contrapositive: If there exists a path of calls from fm to f,
then A'(f) <> Uncalled.)
By previous claim (because A' is safe).
<==
Let P = {g | there exists a path of calls from fm to g}.
Define B(g) = Unknown if g in P
= Uncalled otherwise
Suppose g in P.
Then C(B)(g) = if g = fm
then Unknown
else lub ({j | (h, g, j) in N and B(h) <> Uncalled}
U {B(h) | (h, g) in T})
<= Unknown
= B(g).
Suppose g notin P.
Then C(B)(g) = if g = fm
then Unknown
else lub ({j | (h, g, j) in N and B(h) <> Uncalled}
U {B(h) | (h, g) in T})
= Uncalled
= B(g).
Hence, C(B) <= B.
Therefore, A' = lfp(C) <= B (by the Fixed-Point Theorem).
Suppose there does not exist a path of calls from fm to f.
Then A'(f) <= B(f) = Uncalled.
Hence, A'(f) = Uncalled.
Claim: A'(f) in {Uncalled, Unknown} U Jump.
Proof:
Suppose A'(f) = Uncalled.
Then done.
Suppose A'(f) <> Uncalled.
Then there exists a path of calls from fm to f (by previous claim).
Proceed by induction on the length of the path.
n = 0: Then f = fm.
A'(f) = Unknown (by *).
Hence A'(f) in {Unknown} U Jump.
n > 0: Then there exists g
such that there exists a path of n-1 calls from fm to g
and (g, f) in T or (g, f, j) in N.
By induction hypothesis, A'(g) in {Unknown} U Jump.
Suppose (g, f) in T.
A'(f) in {A'(g), Unknown} (by ***').
Hence A(f) in {Unknown} U Jump.
Suppose (g, f, j) in N.
A'(f) in {j, Unknown} (by **).
Hence A(f) in {Unknown} U Jump.
Hence, A'(f) in {Unknown} U Jump.
Fact: A' is essentially the "continuation based contification"
analysis. Note that the previous claim ensures that A'(f) in
{Uncalled, Unknown} U Jump, so the analysis only determines that a
function has a single continuation, as opposed to a single return.
Claim: Let B be a safe analysis.
If B(f) = Uncalled, then A'(f) = Uncalled.
If B(f) = j in Jump, then A'(f) <= j.
If B(f) = h in Fun, then A'(f) in {Uncalled, Unknown}
Proof.
Let B be a safe analysis.
Define B'(f) = Uncalled if B(f) = Uncalled
= j if B(f) = j in Jump
= Unknown otherwise.
Suppose B'(f) = Uncalled.
Then B(f) = Uncalled.
Further, forall (g, f, j) in N. B(g) = Uncalled
and forall (g, f) in T. B(g) = Uncalled, because B is safe.
Hence, forall (g, f, j) in N. B'(g) = Uncalled
and forall (g, f) in T. B'(g) = Uncalled.
Then C(B')(f) = if f = fm
then Unknown
else lub ({j | (h, g, j) in N and B(h) <> Uncalled}
U {B(h) | (h, g) in T})
= Uncalled
= B'(f).
Suppose B'(f) = j in Jump.
Then B(f) = j.
Further, forall (g, f, j') in N. B(g) = Uncalled or j = j'
and forall (g, f) in T. B(g) = Uncalled or B(g) = j, because B is safe.
Hence, forall (g, f, j') in N. B'(g) = Uncalled or j = j'
and forall (g, f) in T. B'(g) = Uncalled or B'(g) = j'.
Then C(B')(f) = if f = fm
then Unknown
else lub ({j | (h, g, j) in N and B(h) <> Uncalled}
U {B(h) | (h, g) in T})
<= Uncalled
= B'(f).
Suppose B'(f) = Unknown.
Then C(B')(f) <= Unknown = B'(f).
Hence, C(B') <= B'.
Therefore, A' = lfp(C) <= B (by the Fixed-Point Theorem).
Suppose B(f) = Uncalled.
Then A'(f) <= B'(f) = Uncalled.
Hence, A'(f) = Uncalled.
Suppose B(f) = j in Jump.
Then A'(f) <= B'(f) = j.
Hence, A'(f) <= j.
Suppose B(f) = h in Fun.
Then A'(f) in {Uncalled, Unknown} U Jump (by previous claim).
Suppose A'(f) = j in Jump.
Then there exists (g, f, j) in N such that A'(g) <> Uncalled.
Hence, there exists a path of calls from fm to g.
Then B(g) <> Uncalled (by previous claim).
Hence, B(f) in {j, Unknown} (because B is safe).
But B(f) = h in Fun. ==><==
Hence, A'(f) in {Uncalled, Unknown}.
Now define a directed graph G = (Node, Edge) where
Node = Fun U {Root}
Edge = {(Root, fm)}
U {(Root, f) | A'(f) <> Unknown}
U {(f, g) | (f, g) in T and A'(f) <> Uncalled
and A'(g) = Unknown}
U {(Root, g) | (f, g, j) in N and A'(f) <> Uncalled
and A'(g) = Unknown}
Claim: For all f in Fun, there is a path from Root to f in G.
Proof:
If A'(f) = <> Unknown, then (Root, f) in Edge (by second clause).
If A'(f) = Unknown,
then there is a path of calls from fm to f
(because A' = lfp(C) and A'(f) <> Uncalled).
If the last call on this path is a nontail call (g, f, j),
then (Root, f) in Edge (by fourth clause).
Otherwise, consider the longest suffix of this path that consists
of all tail calls: (f0, f1) (f1, f2) ... (fn-1, f)
If forall i, A'(fi) = Unknown,
then there is a path from f0 to f in G (by third clause).
If f0 = fm,
then (Root, f0) in Edge (by first clause)
and there is a path from Root to f in G.
If f0 <> fm,
then there exists (g, f0, j) in N
and (Root, f0) in Edge (by fourth clause)
and there is a path from Root to f in G.
If exists k, A'(fk) <> Unknown and forall i > k, A'(fi) = Unknown
then there is a path from fk+1 to f in G (by third clause)
and (Root, fk) in Edge (by second clause)
and there is a path from Root to f in G.
Let D be the dominator tree of G rooted at Root.
For f in Fun, let parent(f) be the parent of f in D.
Define an analysis, A, based on A' and D as follows:
A(f) =
if A'(f) <> Unknown
then A'(f)
else if parent(f) = Root
then Unknown
else the ancestor g of f in D such that parent(g) = Root
Claim: A is safe.
Proof:
* A(fm) = if A'(fm) <> Unknown
then A'(fm)
else if parent(fm) = Root
then Unknown
else the ancestor g of f in D such that parent(g) = Root
= if parent(fm) = Root
then Unknown
else the ancestor g of f in D such that parent(g) = Root
= Unknown
** Let (f, g, j) be an arbitrary element of N
if A'(g) = Unknown
then if A'(f) = Uncalled
then A(f) = Uncalled
else A(g) = Unknown because (Root, g) in Edge (by fourth clause)
hence A(g) in {j, Unknown}
else A(g) = A'(g)
if A'(f) = Uncalled
then A(f) = Uncalled
else A'(g) = j (because A' is safe)
hence A(g) = A'(g)
hence A(g) in {j, Unknown}
*** Let (f, g) be an arbitrary element of T
if A'(g) = Unknown
then if A'(f) = Uncalled
then A(f) = Uncalled
else (f, g) in Edge (by third clause)
if parent(g) = Root
then A(g) = Unknown
hence A(g) in {f, A(f), Unknown}
else A(g) = h, the ancestor of g in D with parent(h) = Root.
if h = f
then A(g) = h = f
hence A(g) in {f, A(f), Unknown}
else A'(f) = Unknown (because h dominates f)
hence A(f) = h
hence A(g) = h = A(f)
hence A(g) in {f, A(f), Unknown}
else A'(g) <> Unknown
hence A'(g) in {A'(f)} (by corollary)
hence A(g) = A'(g) = A'(f) = A(f)
hence A(g) in {f, A(f), Unknown}
Alternatively, define an analysis, A, based on A' and D as follows:
A(f) =
if parent(f) = Root
then A'(f)
else the ancestor g of f in D such that parent(g) = Root
Claim: A is safe.
Proof:
* A(fm) = if parent(fm) = Root
then A'(fm)
else the ancestor g of f in D such that parent(g) = Root
= A'(fm)
(because (Root, fm) in Edge (by first clause))
= Unknown
(because A' is safe)
** Let (F, G, J) be an arbitrary element of N
Suppose A(F) = Uncalled.
Then done.
Suppose A(F) <> Uncalled.
Suppose A'(G) = Unknown.
Then A(g) = Unknown because (Root, g) in Edge (by fourth clause).
Hence A(g) in {J, Unknown}.
Suppose A'(G) <> Unknown.
Then A(G) = A'(G) because (Root, g) in Edge (by second clause).
Then A'(G) = J because A' is safe.
Hence A(G) in {J, Unknown}
*** Let (F, G) be an arbitrary element of T
Suppose A(F) = Uncalled.
Then done.
Suppose A(F) <> Uncalled.
Suppose A'(G) = Unknown.
Then (F, G) in Edge (by third clause).
Suppose parent(G) = Root.
Then A(G) = A'(G) = Unknown.
Hence A(G) in {F, A(F), Unknown}.
Suppose parent(G) <> Root.
Then A(G) = H, the ancestor of G in D with parent(H) = Root.
Suppose H = F.
Then A(G) = H = F.
Hence A(G) in {F, A(F), Unknown}.
Suppose H <> F.
Then A'(F) = Unknown because H dominates F.
Then A(F) = H.
Then A(G) = H = A(F).
Hence A(G) in {F, A(F), Unknown}.
Suppose A'(G) <> Unknown.
Then A(G) = A'(G) because (Root, G) in Edge (by second clause).
Then A'(G) in {A'(F)} (by corollary).
Then A'(F) = A'(G) <> Unknown.
Then A(F) = A'(F) because (Root, F) in Edge (by second clause).
Then A(G) = A(F).
Hence A(G) in {F, A(F), Unknown}.
Claim: Let B be a safe analysis.
If B(f) = Uncalled, then A(f) = Uncalled.
If B(f) = j in Jump, then A(f) <= j.
Proof:
Let B be a safe analysis.
Suppose B(f) = Uncalled.
Then A'(f) = Uncalled (by previous claim).
Then A(f) = A'(f) because (Root, f) in Edge (by second clause).
Hence A(f) = Uncalled.
Suppose B(f) = j in Jump.
Then A'(f) <= j (by previous claim).
Then A(f) = A'(f) because (Root, f) in Edge (by second clause).
Hence A(f) <= j.
Claim: Let B be a safe analysis.
If B(f) = h in Fun, then A(f) in {Uncalled} U Fun.
Proof:
Let B be a safe analysis.
Suppose A'(f) = Uncalled.
Then A(f) = A'(f) because (Root, f) in Edge (by second clause).
Hence A(f) = Uncalled.
Suppose A'(f) <> Uncalled.
...
Todo: a (definition of and a) proof of minimality.
I was thinking about trying to prove that the analysis A is the least
safe analysis under the RCont ordering extended pointwise to analyses.
But this won't work, because of nesting of functions:
fun h () = ...
fun g () = ... h () ...
fun f () = ... g () ...
fun fm () = ... L1 (f ()) ... L2 (f ()) ...
Then A(fm) = Unknown
A(f) = Unknown
A(g) = f
A(h) = f
But, B(fm) = Unknown
B(f) = Unknown
B(g) = f
B(h) = g
is also a safe analysis, but is incomparable to A.
It seems like for tail calls (f, g) in T, we want to favor A(g) = A(f)
over A(g) = f.
On the other hand, does it make more sense to really contify h in g?
This would result in deeper lexical nesting, which may make other
values available and avoid repeated computation. Does this correspond
to the following definition of B:
B(f) =
if parent(f) = Root
then A'(f)
else parent(f)
Intuitively, this seems like a safe analysis, but it violates the
current definition of safety:
fun fm () = ... f () ...
fun f () = ... g () ... i () ...
fun g () = ... h () ...
fun h () = ... i () ...
fun i () = ...
Then A(fm) = Unknown
A(f) = fm
A(g) = fm
A(h) = fm
A(i) = fm
And B(fm) = Unknown
B(f) = fm
B(g) = f
B(h) = g
B(i) = f
We have (h, i) in T,
but B(h) <> Uncalled and B(i) notin {h, B(h), Unknown}.
We would need something like
*** For all tail calls (f, g) in T
A(f) = Uncalled
or A(g) in {f, A(f), A(A(f)), ..., Unknown}
Are these orthogonal issues? The current definition of the analysis
"works" in the sense that we always get useful and correct
information; i.e., if A(f) = g in Fun, then every call to f always
returns to g. Then there are the other two issues: 1) can mutual
recursion be acheived by nesting? and 2) is deeper lexical nesting
desired? As Steve pointed out, the second of these issues could
probably be dealt with by a separate simplification pass (possibly
yielding benefits for other passes which don't produce the best local
code). The first issue is for the most part independent of the
analysis -- the analysis answers "what would we like to do?" and the
mutual recursion deals with the question "what can we actually do
given the constraints of the IL?" But, logic (the benchmark) seems to
say that sometimes all of these issues do come together.
*****************************************************************************
Examples
*****************************************************************************
fun f () = let ... in loop () end
fun loop () = let ... in loop () end
fun g () = ... K1 (f ()) ... K2 (f ()) ...
(g,
{(g, f, K1), (g, f, K2)},
{(f, loop), (loop, loop)})
A'(f) = Unknown
A'(loop) = Unknown
A'(g) = Unknown
Edge =
{(f, loop), (loop, loop)}
U {}
U {(Root, f)}
U {(Root, g)}
A(f) = Unknown
A(loop) = f
A(g) = Unknown
*****************************************************************************
fun f () = ... g () ... g () ...
fun g () = ... f () ...
fun h () = ... K (f ()) ...
(h,
{(h, f, K)},
{(f, g), (g, f)})
A'(f) = K
A'(g) = K
A'(h) = Unknown
Edge =
{}
U {(Root, f), (Root, g)}
U {}
U {(Root, h)}
A(f) = K
A(g) = K
A(h) = Unknown
*****************************************************************************
(Note: this is essentially the concat_1 example that is always missed
by the present call-based and continuation-based analyses.)
fun f () = ... g () ... g () ...
fun g () = ... f () ....
fun h () = ... K1 (f ()) ... K2 (f ()) ...
(h,
{(h, K1, f), (h, K2, f)},
{(f, g), (g, f)})
A'(f) = Unknown
A'(g) = Unknown
A'(h) = Unknown
Edge =
{(f, g), (g, f)}
U {}
U {(Root, f)}
U {(Root, h)}
A(f) = Unknown
A(g) = f
A(h) = Unknown
*****************************************************************************
fun f () = ... h () ... g () ...
fun g () = ... h () ... f () ...
fun h () = ...
fun i () = ... K (f ()) ...
(i,
{(i, f, K)},
{(f, h), (f, g), (g, h), (g, f)})
A'(f) = K
A'(g) = K
A'(h) = K
A'(i) = Unknown
Edge =
{}
U {(Root, f), (Root, g), (Root, h)}
U {}
U {(Root, i)}
A(f) = K
A(g) = K
A(h) = K
A(i) = Unknown
*****************************************************************************
fun h () = ... K (i ()) ... L (f ()) ...
fun f () = ... g () ...
fun g () = ...
fun i () = ... f () ...
(h,
{(h, i, K), (h, f, L)},
{(f, g), (i, f)})
A'(h) = Unknown
A'(f) = Unknown
A'(g) = Unknown
A'(i) = K
Edge =
{(f, g), (i, f)}
U {(Root, i)}
U {(Root, f)}
U {(Root, h)}
A(h) = Unknown
A(f) = Unknown
A(g) = f
A(i) = K
*****************************************************************************
fun u () = ... K (g ()) ...
fun g () = ...
fun f () = ... g() ...
fun fm () = ... L1 (f ()) ... L2 (f ()) ...
(fm,
{(u, g, K), (fm, f, L1), (fm, f, L2)},
{(f, g)})
A'(fm) = Unknown
A'(f) = Unknown
A'(g) = Unknown
A'(u) = Uncalled
Edge =
{(Root, fm)}
U {(Root, u)}
U {(f,g)}
U {(Root, f)}
A(fm) = Unknown
A(f) = Unknown
A(g) = f
A(u) = Uncalled
*****************************************************************************
fun u () = ... g () ...
fun g () = ...
fun f () = ... g () ...
fun fm () = ... L1 (f ()) ... L2 (f ()) ...
(fm,
{(fm, f, L1), (fm, f, L2)},
{(u, g), (f, g)})
A'(fm) = Unknown
A'(f) = Unknown
A'(g) = Unknown
A'(u) = Uncalled
Edge =
{(Root, fm)}
U {(Root, u)}
U {(f, g)}
U {(Root, f)}
A(fm) = Unknown
A(f) = Unknown
A(g) = f
A(u) = Uncalled
*****************************************************************************
fun fm () = ... h () ...
fun h () = ... f () ... g () ...
fun f () = ... g () ...
fun g () = ... f () ...
(fm,
{},
{(fm, h), (h, f), (h, g), (f, g), (g, f)})
A'(fm) = Unknown
A'(h) = Unknown
A'(f) = Unknown
A'(g) = Unknown
Edge =
{(Root, fm),
U {}
U {(fm, h), (h, f), (h, g), (f, g), (g, f)}
U {}
A(fm) = Unknown
A(h) = fm
A(f) = fm
A(g) = fm
But no way of nesting h, f, and g to mimic mutual recursion.