cps & contification

Fri, 19 Jan 2001 11:49:26 -0500 (EST)

> Claim: there does not exist a path of calls from fm to f
> 	==> A'(f) = Uncalled
> Proof:
> Define A'' (f) = Unknown	if there exists a path of calls from fm to f
> 	         Uncalled	otherwise
> Observe that C (A'') = A''.
> Hence A' <= A'' (since A' is lfp).
> Hence, if there does not exist a path of calls from fm to f then
> 	A'(f) <= A''(f) = Uncalled

One minor modification: C(A'') <= A'',
because a function g which only appears once as a nontail callee will have
C(A'')(g) = j <= A''(g) = Unknown.
Result still holds because A'' is a prefixed point (rather than a fixed
point).

> Matthew, are you interested in doing the analysis implementation?  It shouldn't
> be too bad given that the dominators algorithm is already there.  If you do, it
> might be nice for you to catch up with the latest snapshot, which contains a new
> CPS IL, in which many of the lists have been changed to vectors.

The implementation was fairly easy.  The distribution of functions caught
by each analysis is pretty much the same as I showed earlier for my broken
analysis, so I won't waste space with them.  Suffice to say that there are
one or two more functions contifiable in most benchmarks, and in a few
there are a significant number.

Now, there were some anomalies:

testing logic
mlton -contify both
functions: 72  call_cont_new: 8  call_cont: 0  call_new: 7  cont_new: 0
call: 0  cont: 0  new: 47
functions: 57  call_cont_new: 0  call_cont: 0  call_new: 0  cont_new: 0
call: 0  cont: 0  new: 47
functions: 31  call_cont_new: 0  call_cont: 0  call_new: 0  cont_new: 0
call: 0  cont: 0  new: 22
mlton -contify new
functions: 72  call_cont_new: 8  call_cont: 0  call_new: 7  cont_new: 0
call: 0  cont: 0  new: 47
functions: 59  call_cont_new: 0  call_cont: 0  call_new: 2  cont_new: 0
call: 0  cont: 0  new: 47
functions: 31  call_cont_new: 0  call_cont: 0  call_new: 0  cont_new: 0
call: 0  cont: 0  new: 22

What's interesting here is that in -contify new, none of the functions
determined contifiable by the new analysis were actually contified.  I
investigated this a little bit, and it seems that we've finally run up
against a case where the mutual recursion at the top-level can't be
mimiced by nesting.

testing zern
mlton -contify both
functions: 48  call_cont_new: 33  call_cont: 0  call_new: 0  cont_new: 2
call: 0  cont: 0  new: 1
functions: 13  call_cont_new: 1  call_cont: 0  call_new: 0  cont_new: 0
call: 0  cont: 0  new: 1
functions: 2  call_cont_new: 0  call_cont: 0  call_new: 0  cont_new: 0
call: 0  cont: 0  new: 0
	0:07.96 real,	7.96 user,	0.00 sys
	0:07.94 real,	7.94 user,	0.00 sys
	0:07.95 real,	7.95 user,	0.00 sys
mlton -contify new
functions: 48  call_cont_new: 33  call_cont: 0  call_new: 0  cont_new: 2
call: 0  cont: 0  new: 1
functions: 12  call_cont_new: 1  call_cont: 0  call_new: 0  cont_new: 0
call: 0  cont: 0  new: 0
functions: 1  call_cont_new: 0  call_cont: 0  call_new: 0  cont_new: 0
call: 0  cont: 0  new: 0
	0:07.08 real,	7.04 user,	0.03 sys
	0:07.12 real,	7.11 user,	0.00 sys
	0:07.08 real,	7.09 user,	0.00 sys

Times are for the runtime of zern.  This might explain the big speed up
with the new analysis for zern: although there is only one extra function
determined contifiable, it changes the final cps IL from a program with
two top-level functions to a program with one top-level function.  This
seems to suggest that there is a penalty for transfers between top-level
functions and/or chunks.  I guess that would be expected.  Also, recall
that the new function contifiable is just the function contified into
concat_0, which appears in many of the benchmarks -- so it doesn't seem
likely that any other cps related optimizations were done on zern that
weren't done on other benchmarks.

testing barnes-hut
mlton -contify none
	0:31.35 real,	31.01 user,	0.35 sys
   text	   data	    bss	    dec	    hex	filename
  62941	   2528	   2172	  67641	  10839	barnes-hut
	0:06.29 real,	6.14 user,	0.15 sys
	0:06.31 real,	6.14 user,	0.17 sys
	0:06.28 real,	6.14 user,	0.14 sys
mlton -contify both
functions: 117  call_cont_new: 61  call_cont: 0  call_new: 5  cont_new: 2
call: 0  cont: 0  new: 3
functions: 47  call_cont_new: 5  call_cont: 0  call_new: 0  cont_new: 0
call: 0  cont: 0  new: 3
functions: 7  call_cont_new: 0  call_cont: 0  call_new: 0  cont_new: 0
call: 0  cont: 0  new: 0
	0:04.20 real,	4.00 user,	0.20 sys
	0:04.22 real,	3.93 user,	0.29 sys
	0:04.21 real,	4.04 user,	0.17 sys
mlton -contify new
functions: 117  call_cont_new: 61  call_cont: 0  call_new: 5  cont_new: 2
call: 0  cont: 0  new: 3
functions: 44  call_cont_new: 5  call_cont: 0  call_new: 0  cont_new: 0
call: 0  cont: 0  new: 0
functions: 9  call_cont_new: 0  call_cont: 0  call_new: 0  cont_new: 0
call: 0  cont: 0  new: 0
	0:04.97 real,	4.76 user,	0.22 sys
	0:05.00 real,	4.88 user,	0.13 sys
	0:04.97 real,	4.74 user,	0.24 sys

With barnes-hut, note that -contify both resulted in a program with 7
top-level functions, while -contify new resulted in a program with 9
top-level functions.  This seems surprising to me -- the extra
contification seems to have inhibited some other simplification from
removing functions.  Whatever else went on, the running times were hurt by
-contify new.

Finally, with for a self-compile with -contify new, I get:

functions: 9179  call_cont_new: 3116  call_cont: 254  call_new: 793
cont_new: 364  call: 1  cont: 0  new: 265
functions: 3671  call_cont_new: 116  call_cont: 0  call_new: 13  cont_new:
7  call: 0  cont: 0  new: 3
functions: 971  call_cont_new: 1  call_cont: 0  call_new: 1  cont_new: 0
call: 0  cont: 0  new: 2

What's strange here is that the call and continuation based analyses found
254 functions contifiable that the new analyis missed (?); also, one
function found by the call based analysis only.  I'm rerunning the
self-compile with diagnostics to count the number of functions found
uncalled by the cont and new analyses.  Hopefully, that will make up the
discrepency.  I noticed that the existing continuation based analysis
would determine that f has continuation j even if f was only called in a
nontail call (g, f, j) with g uncalled; examples like that would be
considered contifiable by call and cont, but would be determined uncalled
by new.