Representational CPS and TAL

Mon, 7 Aug 2000 13:02:22 -0400 (EDT)

Last week, Steve and I talked with Neal Glew about the possibility of
carrying types in MLton beyond the CPS IL.  Doing so would have a
number of benefits, including the potential to output Typed Assembly
Language (TAL).  Here are some notes from that discussion.

The first step appears to be the creation of a Representational-CPS
(RCPS) IL.  The RCPS IL would make representation of datatypes
explicit, much as they are in the current Machine-IL.  However,
calling-conventions and stack-layout could still be left abstract.

Looking at the big picture, two main issues seem to stand out.  First,
dealing with arrays may be problematic.  The issue is related to the
fact that the basis-library is written in such a way that
array-allocation and array initialization can potentially be separated
by a limit-check (and probably are, when the initialization is a
loop).  This requires the primitive array-allocation to initialize the
elements of an array of pointers with a non-zero mod 4 value.
Clearly, this is inconsistent with array-allocation of an array of a
type without small-integer variants.  The initial conclusion was that
array-allocation will need to be a primitive operation (at the TAL
level and at the RCPS level).

Second, interfacing with the existing GC may also be problematic.  We
discussed looking into other GC's (e.g., Fred Smith's mostly-copying
GC), which could make use of having explicit types at lower levels of
the compiler.  In addition, it might be possible to do compile-time
generation of a GC, again making use of explicit types at the lower
levels.

Interfacing with the remainder of the run-time system does not appear
to have any serious issues.

A first pass at writing down a type system for RCPS yielded the following:

(* Kinds *)
K ::= T  (* type *) 
    | M  (* memory *)

(* Memory access *)
p ::= r  (* read only *)
    | w  (* write only *)
    | rw (* read and write *)

(* Types for datatype representations *)
tau ::= int | ... : T 
      (* primitive types *)
      | t : T         
      (* datatypes *)
      | ^[i_1,...,i_n][tau : M] : T
      (* "pointer type"
       * value is either an integer in [i_1,...,i_n] 
       *  or a pointer to tau
       *)
      | *(tau_1 : M,...,tau_n : M) : M
      (* "product type"
       * value is a tuple of values of types tau_1,...,tau_n
       *)
      | +[(i_1, tau_1 : M),...,(i_n, tau_n : M)] : M
      (* "sum type"
       * value is an integer tag followed by carried value
       *)
      | (tau : T)p : M
      (* cast tau as a read/write-able memory location *)
      | array(tau : M) : M
      (* primitive array of type tau *)

This system appears to capture all of the representations in the
current system (and is a little more general).  The translation from
CPS-types to RCPS-types goes along the following lines:

T{} : CPS-types (phi) -> RCPS-types (tau)

T{int} = int
T{t}   = t
T{phi_1 x ... x phi_n} = ^[][*[(T{phi_1})r,...,(T{phi_n})r]]
T{phi array} = array((T{phi})rw)
T{phi vector} = array((T{phi})r)
T{phi ref} = ^[]*[(T{phi})rw]

Sum types would arise in the translation of the CPS switch transfer.

Following this discussion, and in light of the work Steve and I had
been engaged in regarding liveness analysis, we discussed the
possibility of integrating some sort of liveness information into the
(R)CPS IL in a manner that would allow type-checking/infering to
verify liveness information and insert the correct refinements between
blocks.

We outlined the following "plan of attack," from short-term to
long-term goals:

Add in a translation to and from RCPS in the existing MLton
framework (CPS -> RCPS -> Machine).  Essentially, this would break the
single CPS -> Machine translation into two steps, one for the
representation and one for the remainder.  But, it would let us type
check the representation decisions and flush out any problems with the
RCPS IL.  It also preserves the existing back-ends without modification.