[MLton-user] ascription in MLton source

Sat Feb 23 01:33:28 PST 2008

On Sat, 23 Feb 2008, Sean McLaughlin wrote:
> I've been browsing the MLton source, and am just curious about something:
> why is there almost no opaque ascription?  It seems the only time it's used
> is in substructures where you define the signature inline.
>  I learned to always prefer opaque ascription
> except in certain circumstances where having to declare lots of type
> equalities
> makes it unwieldy.  Is there a totally different philosophy
> in the MLton architecture?  Is there
> something particular about the compiler that required, or was made
> simpler/cleaner
> using transparent ascription?

Stephen Weeks wrote nearly all of the initial compiler code, and established
various programming policies.  He might have some additional thoughts, but I
think the short answer is that various other aspects of the programming
style used in MLton obviate the need for opaque ascription.

First, we make extensive use of closed signatures and functors.  (See
<src>/doc/style-guide/.)  Nearly every module is implemented as a functor 
of the form:
   signature CLOSURE_CONVERT_STRUCTS =
     sig
       structure Sxml: SXML
       structure Ssa: SSA
       sharing Sxml.Atoms = Ssa.Atoms
     end
   signature CLOSURE_CONVERT =
     sig
       include CLOSURE_CONVERT_STRUCTS
       val closureConvert: Sxml.Program.t -> Ssa.Program.t
     end
   functor ClosureConvert(S: CLOSURE_CONVERT_STRUCTS): CLOSURE_CONVERT =
     struct
       ...
     end
So, the closure conversion module/function can't exploit any data
implementation details of the Sxml or Ssa modules other than those 
revealed by the SXML and SSA signatures.  Furthermore, ClosureConvert 
doesn't define/export any new types, so there is no need for an opaque 
ascription on the functor result.  In fact, if we were to do so, we would 
actually need a number of sharing constraints to relate the Sxml and Ssa 
structures in the result CLOSURE_CONVERT to the Sxml and Ssa structures in 
the source CLOSURE_CONVERT_STRUCTS.

Even for functors that define types (e.g., 
<src>/mlton/ssa/ssa-tree.{sig,fun}), there is no need for an opaque 
ascription, as all the clients of the types are themselves functors that 
are parameterized by 'structure Ssa: SSA' -- so any implementation details 
that would be revealed by the transparent signature ascription can't be 
exploited by the clients.  It is only at the top level 
(<src>/mlton/main/compile.fun) all the IL functors are instantiated and 
all of the compiler pass functors are instantiated.  While it would be 
possible to exploit leaked implmenentation details within compile.fun, it 
is effectively just the linker, and just dispatches to functions from 
other (now instantited) modules.

Second, nearly every declared type is defined as a datatype, which is 
generative and creates a new type.  If the datatype constructor(s) are not 
revealed by the signature, then the type is effectively opaque. 
Consider:
   structure S1 : sig type t end = struct datatype t = T of int end
   structure S2 :> sig type t end = struct type t = int end
Although S1 uses a transparent signature ascription, no client of S1 can 
exploit the data implementation details -- there is no way to create or 
destruct values of type S1.t without the S1.T data constructor.

Taken together, these programming conventions ensure the same data
abstractions that are ensured by opaque ascription.

Indeed, the latter convention (using a datatype without exporting 
constructors) would seem to cover the few places where we do use an opaque 
ascription on a sub-structure with an inline signature.  And it also 
covers the standard motivating examples of opaque ascription:
   signature QUEUE =
     sig
       type 'a queue
       exception Empty
       val empty : 'a queue
       val insert : 'a * 'a queue -> 'a queue
       val remove : 'a queue -> 'a * 'a queue
     end
   structure Queue2 :> QUEUE =
     struct
       type 'a queue = 'a list * 'a list
       val empty = (nil, nil)
       fun insert (x, (bs, fs)) = (x::bs, fs)
       exception Empty
       fun remove (nil, nil) = raise Empty
         | remove (bs, f::fs) = (f, (bs, fs))
         | remove (bs, nil) = remove (nil, rev bs)
     end
   structure Queue1 : QUEUE =
     struct
       datatype 'a queue = T of 'a list * 'a list
       val empty = T (nil, nil)
       fun insert (x, T (bs, fs)) = T (x::bs, fs)
       exception Empty
       fun remove (T (nil, nil)) = raise Empty
         | remove (T (bs, f::fs)) = (f, T (bs, fs))
         | remove (T (bs, nil)) = remove (T (nil, rev bs))
     end
The same (practical) data abstraction is enforced.  There is a subtle 
technical difference in the types of Queue2.queue and Queue1.queue, 
but not one that can be exploited.

Now, there might be compilation reasons why you would prefer opaque 
ascription to transparent ascription.  For example, with a cutoff 
recompilation system, an opaque ascription is a stronger cutoff point than 
a transparent ascription.  If I were to change Queue1 to
   structure Queue1 : QUEUE =
     struct
       datatype 'a queue = T of {back: 'a list, front: 'a list}
       val empty = T {back = nil, front = nil}
       fun insert (x, T {back=bs,front=fs}) = T {back =x::bs, front = fs}
       exception Empty
       fun remove (T {back=nil, front=nil}) = raise Empty
         | remove (T {back=bs, front=f::fs}) = (f, T {back=bs, front=fs})
         | remove (T {back=bs, front=nil}) = remove (T (back=nil, front=rev bs})
     end
then (technically) Queue1.queue has a new type identity, which means that 
the augmented signature of Queue1 is different, which means that it may be 
necessary to recompile the clients of Queue1 -- despite the fact that they 
can't actually observe the change in representation.
If I were to change Queue2 in the same manner, then the opaque ascription 
means that the (non-augmented) signature of Queue2 is exactly the same, 
and it is not necessary to recompile the clients of Queue2.  (Of course, 
if you expect your compiler to perform any cross-module inlining, then it 
still needs to recompile clients, in order to propagate the code changes.)

Similarly, it may be the case that a compiler represents
   datatype t = T of int
with an extra level of indirection, which may be deemed a performance 
penalty.  So, one would avoid using datatypes, and instead use opaque 
ascription to enforce the data abstraction.

Of course, in a whole-program compiler like MLton, neither of these issues 
are of concern.