cps & contification
Matthew Fluet
fluet@CS.Cornell.EDU
Fri, 19 Jan 2001 11:49:26 -0500 (EST)
> Claim: there does not exist a path of calls from fm to f
> ==> A'(f) = Uncalled
> Proof:
> Define A'' (f) = Unknown if there exists a path of calls from fm to f
> Uncalled otherwise
> Observe that C (A'') = A''.
> Hence A' <= A'' (since A' is lfp).
> Hence, if there does not exist a path of calls from fm to f then
> A'(f) <= A''(f) = Uncalled
One minor modification: C(A'') <= A'',
because a function g which only appears once as a nontail callee will have
C(A'')(g) = j <= A''(g) = Unknown.
Result still holds because A'' is a prefixed point (rather than a fixed
point).
> Matthew, are you interested in doing the analysis implementation? It shouldn't
> be too bad given that the dominators algorithm is already there. If you do, it
> might be nice for you to catch up with the latest snapshot, which contains a new
> CPS IL, in which many of the lists have been changed to vectors.
The implementation was fairly easy. The distribution of functions caught
by each analysis is pretty much the same as I showed earlier for my broken
analysis, so I won't waste space with them. Suffice to say that there are
one or two more functions contifiable in most benchmarks, and in a few
there are a significant number.
Now, there were some anomalies:
testing logic
mlton -contify both
functions: 72 call_cont_new: 8 call_cont: 0 call_new: 7 cont_new: 0
call: 0 cont: 0 new: 47
functions: 57 call_cont_new: 0 call_cont: 0 call_new: 0 cont_new: 0
call: 0 cont: 0 new: 47
functions: 31 call_cont_new: 0 call_cont: 0 call_new: 0 cont_new: 0
call: 0 cont: 0 new: 22
mlton -contify new
functions: 72 call_cont_new: 8 call_cont: 0 call_new: 7 cont_new: 0
call: 0 cont: 0 new: 47
functions: 59 call_cont_new: 0 call_cont: 0 call_new: 2 cont_new: 0
call: 0 cont: 0 new: 47
functions: 31 call_cont_new: 0 call_cont: 0 call_new: 0 cont_new: 0
call: 0 cont: 0 new: 22
What's interesting here is that in -contify new, none of the functions
determined contifiable by the new analysis were actually contified. I
investigated this a little bit, and it seems that we've finally run up
against a case where the mutual recursion at the top-level can't be
mimiced by nesting.
testing zern
mlton -contify both
functions: 48 call_cont_new: 33 call_cont: 0 call_new: 0 cont_new: 2
call: 0 cont: 0 new: 1
functions: 13 call_cont_new: 1 call_cont: 0 call_new: 0 cont_new: 0
call: 0 cont: 0 new: 1
functions: 2 call_cont_new: 0 call_cont: 0 call_new: 0 cont_new: 0
call: 0 cont: 0 new: 0
0:07.96 real, 7.96 user, 0.00 sys
0:07.94 real, 7.94 user, 0.00 sys
0:07.95 real, 7.95 user, 0.00 sys
mlton -contify new
functions: 48 call_cont_new: 33 call_cont: 0 call_new: 0 cont_new: 2
call: 0 cont: 0 new: 1
functions: 12 call_cont_new: 1 call_cont: 0 call_new: 0 cont_new: 0
call: 0 cont: 0 new: 0
functions: 1 call_cont_new: 0 call_cont: 0 call_new: 0 cont_new: 0
call: 0 cont: 0 new: 0
0:07.08 real, 7.04 user, 0.03 sys
0:07.12 real, 7.11 user, 0.00 sys
0:07.08 real, 7.09 user, 0.00 sys
Times are for the runtime of zern. This might explain the big speed up
with the new analysis for zern: although there is only one extra function
determined contifiable, it changes the final cps IL from a program with
two top-level functions to a program with one top-level function. This
seems to suggest that there is a penalty for transfers between top-level
functions and/or chunks. I guess that would be expected. Also, recall
that the new function contifiable is just the function contified into
concat_0, which appears in many of the benchmarks -- so it doesn't seem
likely that any other cps related optimizations were done on zern that
weren't done on other benchmarks.
testing barnes-hut
mlton -contify none
0:31.35 real, 31.01 user, 0.35 sys
text data bss dec hex filename
62941 2528 2172 67641 10839 barnes-hut
0:06.29 real, 6.14 user, 0.15 sys
0:06.31 real, 6.14 user, 0.17 sys
0:06.28 real, 6.14 user, 0.14 sys
mlton -contify both
functions: 117 call_cont_new: 61 call_cont: 0 call_new: 5 cont_new: 2
call: 0 cont: 0 new: 3
functions: 47 call_cont_new: 5 call_cont: 0 call_new: 0 cont_new: 0
call: 0 cont: 0 new: 3
functions: 7 call_cont_new: 0 call_cont: 0 call_new: 0 cont_new: 0
call: 0 cont: 0 new: 0
0:04.20 real, 4.00 user, 0.20 sys
0:04.22 real, 3.93 user, 0.29 sys
0:04.21 real, 4.04 user, 0.17 sys
mlton -contify new
functions: 117 call_cont_new: 61 call_cont: 0 call_new: 5 cont_new: 2
call: 0 cont: 0 new: 3
functions: 44 call_cont_new: 5 call_cont: 0 call_new: 0 cont_new: 0
call: 0 cont: 0 new: 0
functions: 9 call_cont_new: 0 call_cont: 0 call_new: 0 cont_new: 0
call: 0 cont: 0 new: 0
0:04.97 real, 4.76 user, 0.22 sys
0:05.00 real, 4.88 user, 0.13 sys
0:04.97 real, 4.74 user, 0.24 sys
With barnes-hut, note that -contify both resulted in a program with 7
top-level functions, while -contify new resulted in a program with 9
top-level functions. This seems surprising to me -- the extra
contification seems to have inhibited some other simplification from
removing functions. Whatever else went on, the running times were hurt by
-contify new.
Finally, with for a self-compile with -contify new, I get:
functions: 9179 call_cont_new: 3116 call_cont: 254 call_new: 793
cont_new: 364 call: 1 cont: 0 new: 265
functions: 3671 call_cont_new: 116 call_cont: 0 call_new: 13 cont_new:
7 call: 0 cont: 0 new: 3
functions: 971 call_cont_new: 1 call_cont: 0 call_new: 1 cont_new: 0
call: 0 cont: 0 new: 2
What's strange here is that the call and continuation based analyses found
254 functions contifiable that the new analyis missed (?); also, one
function found by the call based analysis only. I'm rerunning the
self-compile with diagnostics to count the number of functions found
uncalled by the cont and new analyses. Hopefully, that will make up the
discrepency. I noticed that the existing continuation based analysis
would determine that f has continuation j even if f was only called in a
nontail call (g, f, j) with g uncalled; examples like that would be
considered contifiable by call and cont, but would be determined uncalled
by new.