MLton 20241230

Standard ML does not support ad hoc polymorphism, which presents a challenge to programmers: at first glance there seems to be no practical way to implement something like a function for converting a value of any type to a string or a function for computing a hash value for a value of any type. Fortunately, there are ways to implement type-indexed values in SML, as discussed in Yang98. Various articles such as Danvy98, Ramsey11, Elsman04, Kennedy04, and Benton05 also contain examples of type-indexed values.
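To make the idea concrete before the full example, here is a minimal sketch of a type-indexed value. It assumes the simplest possible representation, in which the type-index for a type 'a is just a function of type 'a -> string. The names int, bool, list, pair, and show are chosen for illustration only, and this is not the representation used by the Show structure later on this page.

```sml
(* A type-index for 'a is represented here simply as a function
   'a -> string.  This representation is an assumption made for
   illustration only. *)
type 'a t = 'a -> string

(* Type-indices for two base types. *)
val int : int t = Int.toString
val bool : bool t = Bool.toString

(* Type-index combinators for two type constructors. *)
fun list (elem : 'a t) : 'a list t =
    fn xs => "[" ^ String.concatWith ", " (map elem xs) ^ "]"

fun pair (a : 'a t, b : 'b t) : ('a * 'b) t =
    fn (x, y) => "(" ^ a x ^ ", " ^ b y ^ ")"

(* The type-indexed function: given a type-index, it yields the
   function for that type. *)
fun show (t : 'a t) : 'a -> string = t

val s1 = show (list int) [3, 1, 4]
val s2 = show (list (pair (int, bool))) [(1, true), (2, false)]
```

The type-index expression mirrors the structure of the type: list (pair (int, bool)) is the index for (int * bool) list.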

NOTE: The following example uses an early (and somewhat broken) variation of the basic technique used in an experimental generic programming library (see README) in the MLton repository. That library also includes a more advanced generic pretty printing function (see pretty.sig).

Example: Converting any SML value to (roughly) SML syntax

Consider the problem of converting any SML value to a textual presentation that matches the syntax of SML as closely as possible. One solution is a type-indexed function that maps a given type to a function that maps any value (of the type) to its textual presentation. A type-indexed function like this can be useful for a variety of purposes. For example, one could use it to show debugging information. We’ll call this function show.

We’ll do a fairly complete implementation of show. We do not distinguish infix and nonfix constructors, but that is not an intrinsic property of SML datatypes. We also don’t reconstruct a type name for the value, although it would be particularly useful for functional values. To reconstruct type names, some changes would be needed and the reader is encouraged to consider how to do that. A more realistic implementation would use some pretty printing combinators to compute a layout for the result. This should be a relatively easy change (given a suitable pretty printing library). Cyclic values (through references and arrays) do not have a standard textual presentation and it is impossible to convert arbitrary functional values (within SML) to a meaningful textual presentation. Finally, it would also make sense to show sharing of references and arrays. We’ll leave these improvements to an actual library implementation.

The following code uses the fixpoint framework and other utilities from an Extended Basis library (see README).

Signature

Let’s consider the design of the SHOW signature:

infixr -->

signature SHOW = sig
   type 'a t       (* complete type-index *)
   type 'a s       (* incomplete sum *)
   type ('a, 'k) p (* incomplete product *)
   type u          (* tuple or unlabelled product *)
   type l          (* record or labelled product *)

   val show : 'a t -> 'a -> string

   (* user-defined types *)
   val inj : ('a -> 'b) -> 'b t -> 'a t

   (* tuples and records *)
   val * : ('a, 'k) p * ('b, 'k) p -> (('a, 'b) product, 'k) p

   val U :           'a t -> ('a, u) p
   val L : string -> 'a t -> ('a, l) p

   val tuple  : ('a, u) p -> 'a t
   val record : ('a, l) p -> 'a t

   (* datatypes *)
   val + : 'a s * 'b s -> (('a, 'b) sum) s

   val C0 : string -> unit s
   val C1 : string -> 'a t -> 'a s

   val data : 'a s -> 'a t

   val Y : 'a t Tie.t

   (* exceptions *)
   val exn : exn t
   val regExn : (exn -> ('a * 'a s) option) -> unit

   (* some built-in type constructors *)
   val refc : 'a t -> 'a ref t
   val array : 'a t -> 'a array t
   val list : 'a t -> 'a list t
   val vector : 'a t -> 'a vector t
   val --> : 'a t * 'b t -> ('a -> 'b) t

   (* some built-in base types *)
   val string : string t
   val unit : unit t
   val bool : bool t
   val char : char t
   val int : int t
   val word : word t
   val real : real t
end

While some details are shaped by the specific requirements of show, there are a number of (design) patterns that translate to other type-indexed values. The former kind of details are mostly shaped by the syntax of SML values that show is designed to produce. To this end, abstract types and phantom types are used to distinguish incomplete record, tuple, and datatype type-indices from each other and from complete type-indices. Also, names of record labels and datatype constructors need to be provided by the user.

Arbitrary user-defined datatypes

Perhaps the most important pattern is how the design supports arbitrary user-defined datatypes. A number of combinators together conspire to provide the functionality. First of all, to support new user-defined types, a combinator taking a conversion function to a previously supported type is provided:

val inj : ('a -> 'b) -> 'b t -> 'a t

An injection function is sufficient in this case, but in the general case, an embedding with injection and projection functions may be needed.
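As a sketch of why the general case may need both directions, consider a hypothetical type-index that both consumes and produces values of the indexed type. The record representation, the default field, and the iso combinator below are assumptions made for illustration and are not part of the SHOW signature.

```sml
(* Hypothetical type-index carrying both a consumer ('a -> string)
   and a producer (a default value) of 'a. *)
type 'a t = {show : 'a -> string, default : 'a}

val int : int t = {show = Int.toString, default = 0}

fun pair (a : 'a t, b : 'b t) : ('a * 'b) t =
    {show = fn (x, y) => "(" ^ #show a x ^ ", " ^ #show b y ^ ")",
     default = (#default a, #default b)}

(* iso (inj, prj) t: inj maps the new type to a supported one (needed
   by show), prj maps back (needed by default). *)
fun iso (inj : 'a -> 'b, prj : 'b -> 'a) (t : 'b t) : 'a t =
    {show = #show t o inj, default = prj (#default t)}

datatype point = POINT of int * int

val point : point t =
    iso (fn POINT p => p, POINT) (pair (int, int))

val shown = #show point (POINT (1, 2))
val POINT (dx, dy) = #default point
```

With only an injection, the show field could still be built, but there would be no way to turn the default pair back into a point.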

To support products (tuples and records) a product combinator is provided:

val * : ('a, 'k) p * ('b, 'k) p -> (('a, 'b) product, 'k) p

The second (phantom) type variable 'k is there to distinguish between labelled and unlabelled products, and the type p distinguishes incomplete products from complete type-indices of type t. Most type-indexed values do not need to make such distinctions.
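The effect of the phantom type variable can be sketched with a cut-down, self-contained version of the product combinators. The signature PROD and structure Prod below are hypothetical, type-indices are assumed to be plain functions to string, and the product type is defined locally as a stand-in for the one from the Extended Basis library.

```sml
(* Stand-in for the Extended Basis product type. *)
datatype ('a, 'b) product = & of 'a * 'b
infix &

signature PROD = sig
   type 'a t
   type ('a, 'k) p   (* incomplete product, tagged by 'k *)
   type u            (* tag: unlabelled *)
   type l            (* tag: labelled *)
   val int : int t
   val bool : bool t
   val * : ('a, 'k) p * ('b, 'k) p -> (('a, 'b) product, 'k) p
   val U : 'a t -> ('a, u) p
   val L : string -> 'a t -> ('a, l) p
   val tuple : ('a, u) p -> 'a t
   val record : ('a, l) p -> 'a t
   val show : 'a t -> 'a -> string
end

structure Prod :> PROD = struct
   type 'a t = 'a -> string
   type ('a, 'k) p = 'a t
   type u = unit
   type l = unit
   val int = Int.toString
   val bool = Bool.toString
   fun op * (l, r) = fn a & b => l a ^ ", " ^ r b
   fun U t = t
   fun L label t = fn x => label ^ " = " ^ t x
   fun tuple t = fn x => "(" ^ t x ^ ")"
   fun record t = fn x => "{" ^ t x ^ "}"
   fun show t = t
end

local
   open Prod
in
   val shownTuple = show (tuple (U int * U bool)) (1 & true)
   val shownRecord = show (record (L "x" int * L "ok" bool)) (2 & false)
   (* Mixing U and L components in one product, or passing an
      unlabelled product to record, is rejected by the type checker. *)
end
```

Inside the structure u and l are both unit, but the opaque signature ascription makes them distinct abstract types, so the tags cost nothing at run-time.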

To support sums (datatypes) a sum combinator is provided:

val + : 'a s * 'b s -> (('a, 'b) sum) s

Again, the purpose of the type s is to distinguish incomplete sums from complete type-indices of type t, which usually isn’t necessary.
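The following self-contained sketch shows how nested sums can encode a datatype with more than two constructors. It assumes type-indices are plain functions to string, defines the sum type locally as a stand-in for the one from the Extended Basis library, and names the combinator ++ (an illustrative name) to avoid shadowing integer addition. Unlike the real C1, this simplified version does not parenthesize arguments.

```sml
(* Stand-in for the Extended Basis sum type. *)
datatype ('a, 'b) sum = INL of 'a | INR of 'b

type 'a t = 'a -> string

val int : int t = Int.toString

fun inj (f : 'a -> 'b) (t : 'b t) : 'a t = t o f

(* Sum combinator: dispatch on the injection tag. *)
infix 6 ++
fun op ++ (l, r) = fn INL a => l a | INR b => r b

(* Nullary and unary constructors. *)
fun C0 name : unit t = fn () => name
fun C1 name (t : 'a t) : 'a t = fn x => name ^ " " ^ t x

datatype shape = POINT | CIRCLE of int | SQUARE of int

(* ++ associates to the left, so the three constructors are encoded
   as ((unit, int) sum, int) sum. *)
val shape : shape t =
    inj (fn POINT => INL (INL ())
          | CIRCLE r => INL (INR r)
          | SQUARE s => INR s)
        (C0 "POINT" ++ C1 "CIRCLE" int ++ C1 "SQUARE" int)

val s1 = shape POINT
val s2 = shape (CIRCLE 3)
val s3 = shape (SQUARE 2)
```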

Finally, to support recursive datatypes, including sets of mutually recursive datatypes, a fixpoint tier is provided:

val Y : 'a t Tie.t

Together these combinators (with the more domain specific combinators U, L, tuple, record, C0, C1, and data) enable one to encode a type-index for any user-defined datatype.
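The role of the fixpoint can be illustrated without the Tie module. The sketch below assumes type-indices are plain functions to string and ties the knot through a reference cell. The function fix is a hypothetical stand-in that handles only a single recursive type, not sets of mutually recursive ones.

```sml
type 'a t = 'a -> string

(* fix ties the knot through a reference cell: the type-index passed
   to f just forwards to whatever is stored in the cell, which is
   filled in once f has returned. *)
fun fix (f : 'a t -> 'a t) : 'a t =
    let
       val cell = ref (fn _ => raise Fail "fix: used before tied")
       val t = f (fn x => !cell x)
    in
       cell := t ; t
    end

val int : int t = Int.toString

fun list (elem : 'a t) : 'a list t =
    fn xs => "[" ^ String.concatWith ", " (map elem xs) ^ "]"

(* A recursive datatype whose type-index must refer to itself. *)
datatype rose = ROSE of int * rose list

val rose : rose t =
    fix (fn rose =>
            fn ROSE (x, kids) =>
               "ROSE (" ^ int x ^ ", " ^ list rose kids ^ ")")

val shown = rose (ROSE (1, [ROSE (2, []), ROSE (3, [])]))
```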

Exceptions

The exn type in SML is a universal type into which all types can be embedded. SML also allows a program to generate new exception variants at run-time. Thus a mechanism is required to register handlers for particular variants:

val exn : exn t
val regExn : (exn -> ('a * 'a s) option) -> unit

The universal exn type-index then makes use of the registered handlers. This particular form of handler, which converts an exception value to a value of some type together with a type-index for that type (essentially an existential type), is designed to make handlers convenient to write. To write a handler, one can reuse existing type-indices:

exception Int of int

local
   open Show
in
   val () = regExn (fn Int v => SOME (v, C1"Int" int)
                     | _     => NONE)
end

Note that a single handler may actually handle an arbitrary number of different exceptions.
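The registration mechanism itself can be sketched in a few self-contained lines. The version below is a simplification: handlers return the final string directly instead of a value paired with a type-index, and the names handlers and showExn are illustrative assumptions.

```sml
(* Registry of handlers; each maps an exception to its text if it
   recognizes the exception. *)
val handlers : (exn -> string option) list ref = ref []

fun regExn h = handlers := h :: !handlers

(* Try the handlers, most recently registered first, and fall back
   to a placeholder built from the exception name. *)
fun showExn e =
    let
       fun lp [] = "<exn:" ^ General.exnName e ^ ">"
         | lp (h :: hs) =
           case h e of
              NONE => lp hs
            | SOME s => s
    in
       lp (!handlers)
    end

exception Int of int
exception Pair of int * int

(* One handler covering two different exceptions. *)
val () =
    regExn (fn Int v => SOME ("Int " ^ Int.toString v)
             | Pair (a, b) =>
               SOME ("Pair (" ^ Int.toString a ^ ", "
                     ^ Int.toString b ^ ")")
             | _ => NONE)

val s1 = showExn (Int 5)
val s2 = showExn (Pair (1, 2))
val s3 = showExn Div
```

Exceptions that no handler recognizes, such as Div here, fall through to the placeholder.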

Other types

Some built-in and standard types typically require special treatment due to their special nature. The most important of these are arrays and references, because cyclic data (ignoring closures) and observable sharing can only be constructed through them.

When arrow types are really supported, unlike in this case, they usually need special treatment due to the contravariance of arguments.

Lists and vectors require special treatment in the case of show, because of their special syntax. This isn’t usually the case.

The set of base types to support also needs to be considered unless one exports an interface for constructing type-indices for entirely new base types.

Usage

Before going to the implementation, let’s look at some examples. For the following examples, we’ll assume a structure binding Show :> SHOW. If you want to try the examples immediately, just skip forward to the implementation.

To use show, one first needs a type-index, which is then given to show. To show a list of integers, one would use the type-index list int, which has the type int list Show.t:

val "[3, 1, 4]" =
    let open Show in show (list int) end
       [3, 1, 4]

Likewise, to show a list of lists of characters, one would use the type-index list (list char), which has the type char list list Show.t:

val "[[#\"a\", #\"b\", #\"c\"], []]" =
    let open Show in show (list (list char)) end
       [[#"a", #"b", #"c"], []]

Handling standard types is not particularly interesting. It is more interesting to see how user-defined types can be handled. Although the option datatype is a standard type, it requires no special support, so we can treat it as a user-defined type. Options can be encoded easily using a sum:

fun option t = let
   open Show
in
   inj (fn NONE => INL ()
         | SOME v => INR v)
       (data (C0"NONE" + C1"SOME" t))
end

val "SOME 5" =
    let open Show in show (option int) end
       (SOME 5)

Readers new to type-indexed values might want to type annotate each subexpression of the above example as an exercise. (Use a compiler to check your annotations.)

Using a product, user-specified records can also be encoded easily:

val abc = let
   open Show
in
   inj (fn {a, b, c} => a & b & c)
       (record (L"a" (option int) *
                L"b" real *
                L"c" bool))
end

val "{a = SOME 1, b = 3.0, c = false}" =
    let open Show in show abc end
       {a = SOME 1, b = 3.0, c = false}

As you can see, both of the above use inj to inject user-defined types to the general purpose sum and product types.

Of particular interest is whether recursive datatypes and cyclic data can be handled. For example, how does one write a type-index for a recursive datatype such as a cyclic graph?

datatype 'a graph = VTX of 'a * 'a graph list ref
fun arcs (VTX (_, r)) = r

Using the Show combinators, we could first write a new type-index combinator for graph:

fun graph a = let
   open Tie Show
in
   fix Y (fn graph_a =>
             inj (fn VTX (x, y) => x & y)
                 (data (C1"VTX"
                          (tuple (U a *
                                  U (refc (list graph_a)))))))
end

To show a graph with integer labels

val a_graph = let
   val a = VTX (1, ref [])
   val b = VTX (2, ref [])
   val c = VTX (3, ref [])
   val d = VTX (4, ref [])
   val e = VTX (5, ref [])
   val f = VTX (6, ref [])
in
   arcs a := [b, d]
 ; arcs b := [c, e]
 ; arcs c := [a, f]
 ; arcs d := [f]
 ; arcs e := [d]
 ; arcs f := [e]
 ; a
end

we could then simply write

val "VTX (1, ref [VTX (2, ref [VTX (3, ref [VTX (1, %0), \
    \VTX (6, ref [VTX (5, ref [VTX (4, ref [VTX (6, %3)])])] as %3)]), \
    \VTX (5, ref [VTX (4, ref [VTX (6, ref [VTX (5, %2)])])] as %2)]), \
    \VTX (4, ref [VTX (6, ref [VTX (5, ref [VTX (4, %1)])])] as %1)] as %0)" =
    let open Show in show (graph int) end
       a_graph

There is a subtle gotcha with cyclic data. Consider the following code:

exception ExnArray of exn array

val () = let
   open Show
in
   regExn (fn ExnArray a =>
              SOME (a, C1"ExnArray" (array exn))
            | _ => NONE)
end

val a_cycle = let
   val a = Array.fromList [Empty]
in
   Array.update (a, 0, ExnArray a) ; a
end

Although the above looks innocent enough, the evaluation of

val "[|ExnArray %0|] as %0" =
    let open Show in show (array exn) end
       a_cycle

goes into an infinite loop. To avoid this problem, the type-index array exn must be evaluated only once, as in the following:

val array_exn = let open Show in array exn end

exception ExnArray of exn array

val () = let
   open Show
in
   regExn (fn ExnArray a =>
              SOME (a, C1"ExnArray" array_exn)
            | _ => NONE)
end

val a_cycle = let
   val a = Array.fromList [Empty]
in
   Array.update (a, 0, ExnArray a) ; a
end

val "[|ExnArray %0|] as %0" =
    let open Show in show array_exn end
       a_cycle

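Cyclic data (excluding closures) in Standard ML can only be constructed imperatively through arrays and references (combined with exceptions or recursive datatypes). Before recursing to a reference or an array, one needs to check whether that reference or array has already been seen before. When refc or array is called with a type-index, a new cyclicity checker is instantiated. This is why the first version loops: its handler evaluates array exn anew on every invocation, so each level of the traversal gets a fresh checker that never recognizes the array as already seen.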
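The check described above can be sketched for one fixed datatype. The code below is a simplified, monomorphic illustration that relies on equality of references. The node type and the function showNode are hypothetical, and the numbering mimics the %n notation used in the examples above.

```sml
datatype node = NODE of int * node list ref

fun showNode root =
    let
       (* seen holds the references currently being traversed,
          innermost first, each with a flag recording whether a
          back-reference to it was emitted. *)
       fun lookup (_, []) = NONE
         | lookup (r, (r', used) :: rest) =
           if r = r'
           then SOME (used, length rest)
           else lookup (r, rest)

       fun node seen (NODE (x, r)) =
           "NODE (" ^ Int.toString x ^ ", " ^ reference seen r ^ ")"

       and reference seen r =
           case lookup (r, seen) of
              SOME (used, i) =>
              (* Already being traversed: emit a back-reference. *)
              (used := true ; "%" ^ Int.toString i)
            | NONE =>
              let
                 val used = ref false
                 val body =
                     String.concatWith
                        ", "
                        (map (node ((r, used) :: seen)) (!r))
                 val s = "ref [" ^ body ^ "]"
              in
                 if !used
                 then s ^ " as %" ^ Int.toString (length seen)
                 else s
              end
    in
       node [] root
    end

(* A node whose only arc points back to itself. *)
val a = NODE (1, ref [])
val NODE (_, arcsOfA) = a
val () = arcsOfA := [a]

val cyclic = showNode a
val acyclic = showNode (NODE (2, ref []))
```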

Implementation

structure SmlSyntax = struct
   local
      structure CV = CharVector and C = Char
   in
      val isSym = Char.contains "!%&$#+-/:<=>?@\\~`^|*"

      fun isSymId s = 0 < size s andalso CV.all isSym s

      fun isAlphaNumId s =
          0 < size s
          andalso C.isAlpha (CV.sub (s, 0))
          andalso CV.all (fn c => C.isAlphaNum c
                                  orelse #"'" = c
                                  orelse #"_" = c) s

      fun isNumLabel s =
          0 < size s
          andalso #"0" <> CV.sub (s, 0)
          andalso CV.all C.isDigit s

      fun isId s = isAlphaNumId s orelse isSymId s

      fun isLongId s = List.all isId (String.fields (#"." <\ op =) s)

      fun isLabel s = isId s orelse isNumLabel s
   end
end

structure Show :> SHOW = struct
   datatype 'a t = IN of exn list * 'a -> bool * string
   type 'a s = 'a t
   type ('a, 'k) p = 'a t
   type u = unit
   type l = unit

   fun show (IN t) x = #2 (t ([], x))

   (* user-defined types *)
   fun inj inj (IN b) = IN (b o Pair.map (id, inj))

   local
      fun surround pre suf (_, s) = (false, concat [pre, s, suf])
      fun parenthesize x = if #1 x then surround "(" ")" x else x
      fun construct tag =
          (fn (_, s) => (true, concat [tag, " ", s])) o parenthesize
      fun check p m s = if p s then () else raise Fail (m^s)
   in
      (* tuples and records *)
      fun (IN l) * (IN r) =
          IN (fn (rs, a & b) =>
                 (false, concat [#2 (l (rs, a)),
                                 ", ",
                                 #2 (r (rs, b))]))

      val U = id
      fun L l = (check SmlSyntax.isLabel "Invalid label: " l
               ; fn IN t => IN (surround (l^" = ") "" o t))

      fun tuple (IN t) = IN (surround "(" ")" o t)
      fun record (IN t) = IN (surround "{" "}" o t)

      (* datatypes *)
      fun (IN l) + (IN r) = IN (fn (rs, INL a) => l (rs, a)
                                 | (rs, INR b) => r (rs, b))

      fun C0 c = (check SmlSyntax.isId "Invalid constructor: " c
                ; IN (const (false, c)))
      fun C1 c (IN t) = (check SmlSyntax.isId "Invalid constructor: " c
                       ; IN (construct c o t))

      val data = id

      fun Y ? = Tie.iso Tie.function (fn IN x => x, IN) ?

      (* exceptions *)
      local
         val handlers = ref ([] : (exn -> unit t option) list)
      in
         val exn = IN (fn (rs, e) => let
                             fun lp [] =
                                 (* Not C0: "<exn:...>" would fail
                                    its isId check and raise Fail *)
                                 IN (const (false,
                                            concat ["<exn:",
                                                    General.exnName e,
                                                    ">"]))
                               | lp (f::fs) =
                                 case f e
                                  of NONE => lp fs
                                   | SOME t => t
                             val IN f = lp (!handlers)
                          in
                             f (rs, ())
                          end)
         fun regExn f =
             handlers := (Option.map
                             (fn (x, IN f) =>
                                 IN (fn (rs, ()) =>
                                        f (rs, x))) o f)
                         :: !handlers
      end

      (* some built-in type constructors *)
      local
         fun cyclic (IN t) = let
            exception E of ''a * bool ref
         in
            IN (fn (rs, v : ''a) => let
                      val idx = Int.toString o length
                      fun lp (E (v', c)::rs) =
                          if v' <> v then lp rs
                          else (c := false ; (false, "%"^idx rs))
                        | lp (_::rs) = lp rs
                        | lp [] = let
                             val c = ref true
                             val r = t (E (v, c)::rs, v)
                          in
                             if !c then r
                             else surround "" (" as %"^idx rs) r
                          end
                   in
                      lp rs
                   end)
         end

         fun aggregate pre suf toList (IN t) =
             IN (surround pre suf o
                 (fn (rs, a) =>
                     (false,
                      String.concatWith
                         ", "
                         (map (#2 o curry t rs)
                              (toList a)))))
      in
         fun refc ? = (cyclic o inj ! o C1"ref") ?
         fun array ? = (cyclic o aggregate "[|" "|]" (Array.foldr op:: [])) ?
         fun list ? = aggregate "[" "]" id ?
         fun vector ? = aggregate "#[" "]" (Vector.foldr op:: []) ?
      end

      fun (IN _) --> (IN _) = IN (const (false, "<fn>"))

      (* some built-in base types *)
      local
         fun mk toS = (fn x => (false, x)) o toS o (fn (_, x) => x)
      in
         val string =
             IN (surround "\"" "\"" o mk (String.translate Char.toString))
         val unit = IN (mk (fn () => "()"))
         val bool = IN (mk Bool.toString)
         val char = IN (surround "#\"" "\"" o mk Char.toString)
         val int = IN (mk Int.toString)
         val word = IN (surround "0wx" "" o mk Word.toString)
         val real = IN (mk Real.toString)
      end
   end
end

(* Handlers for standard top-level exceptions *)
val () = let
   open Show
   fun E0 name = SOME ((), C0 name)
in
   regExn (fn Bind => E0"Bind"
            | Chr => E0"Chr"
            | Div => E0"Div"
            | Domain => E0"Domain"
            | Empty => E0"Empty"
            | Match => E0"Match"
            | Option => E0"Option"
            | Overflow  => E0"Overflow"
            | Size => E0"Size"
            | Span => E0"Span"
            | Subscript => E0"Subscript"
            | _ => NONE)
 ; regExn (fn Fail s => SOME (s, C1"Fail" string)
            | _ => NONE)
end

Also see

There are a number of related techniques. Here are some of them.