[MLton-user] Pickling typed programs written in HOAS
Vesa Karvonen
vesa.a.j.k at gmail.com
Thu Oct 11 08:16:00 PDT 2007
The desire to send programs (or just closures) over a wire seems to pop up
every now and then. The problem is, of course, that such a thing is not
supported directly in Standard ML. Now, I'm not claiming that the
approach described in this post would be the ultimate solution to the
problem, but it might actually serve some people's needs and some other
people might find this otherwise interesting (or even amusing).
Here is what I'll briefly describe in this post. I'll first describe a
simple toy interpreter for a simple untyped language using a simple
first-order AST. The significance of using a first-order AST is that it
makes it possible to pickle programs. Then I'll show how to implement a
simple HOAS [HOAS] "skin" on top of the first-order AST. The significance
of the HOAS skin is that it gives bindings that are easier to use
correctly as well as static typing. A program written using the HOAS
combinators can then be "reified" to the first-order representation, which
can be pickled. So, you can write your program in SML (well, using
combinators), have it type checked, send it over the wire, and even
evaluate it.
[HOAS] http://en.wikipedia.org/wiki/Higher-order_abstract_syntax
I'll make use of a number of libraries from mltonlib. Here is a SML/NJ
"script" for getting the used library stuff:
val mltonlib = "PATH-TO-MLTONLIB-ROOT" ; (* <-- You'll have to fix this *)
use (mltonlib^"/com/ssh/extended-basis/unstable/public/export/infixes.sml") ;
CM.make (mltonlib^"/com/ssh/extended-basis/unstable/basis.cm") andalso
CM.make (mltonlib^"/com/ssh/generic/unstable/lib.cm") ;
val w = mltonlib^"/com/ssh/generic/unstable/with" ;
use (w^"/generic.sml") ;
use (w^"/type-info.sml") ;
use (w^"/type-hash.sml") ;
use (w^"/hash.sml") ;
use (w^"/eq.sml") ;
use (w^"/some.sml") ;
use (w^"/data-rec-info.sml") ;
use (w^"/pickle.sml") ;
use (w^"/close.sml") ;
use (w^"/extra.sml") ;
open TopLevel ;
open Generic ;
Here is the signature of the simple toy interpreter:
signature TOY_CORE = sig
structure Id : sig
eqtype t
val t : t Rep.t
val new : t Thunk.t
end
structure Lit : sig
datatype t = BOOL of Bool.t
| INT of IntInf.t
val t : t Rep.t
end
structure Val : sig
datatype t = BOOL of Bool.t
| FUN of t UnOp.t
| INT of IntInf.t
end
structure Prim : sig
type t = String.t
val t : t Rep.t
type f = Val.t List.t -> Val.t
val add : (t * f) Effect.t
val find : t -> f
end
structure Term : sig
datatype 't f = APP of 't * 't
| ERR of String.t
| FUN of Id.t * 't
| IF of 't * 't * 't
| IND of 't Ref.t
| LIT of Lit.t
| PAP of Prim.t * 't List.t
| VAR of Id.t
datatype t = IN of t f
val t : t Rep.t
end
structure Env : sig
type t
val empty : t
val add : (Id.t * Val.t) * t -> t
end
val eval : Env.t -> Term.t -> Val.t
end
It is fairly straightforward except for one unusual thing, which is the
IND constructor. We'll use it for fixpoints. One issue worth pointing
out is that the representation of terms has been carefully made
first-order. In particular, primitive functions are stored in a primitive
environment. Another detail worth pointing out is that the signature
specifies type representation constructors (the Rep.t values) for terms.
We'll later use those for pickling terms.
Below is a simple implementation of the toy interpreter:
structure ToyCore :> TOY_CORE = struct
fun assoc i = #2 o valOf o List.find (fn (i', _) => i' = i)
structure Id = struct
type t = Unit.t Ref.t
val t = refc unit
val new = ref
end
structure Lit = struct
datatype t = BOOL of Bool.t
| INT of IntInf.t
val t = iso (data (C1' "BOOL" bool +` C1' "INT" largeInt))
(fn BOOL ? => INL ? | INT ? => INR ?,
fn INL ? => BOOL ? | INR ? => INT ?)
end
structure Val = struct
datatype t = BOOL of Bool.t
| FUN of t UnOp.t
| INT of IntInf.t
end
structure Prim = struct
type t = String.t
val t = string
type f = Val.t List.t -> Val.t
local
val prims : (t * f) List.t Ref.t = ref []
in
val add = List.push prims
fun find x = assoc x (!prims)
end
end
structure Term = struct
datatype 't f = APP of 't * 't
| ERR of String.t
| FUN of Id.t * 't
| IND of 't Ref.t
| IF of 't * 't * 't
| LIT of Lit.t
| PAP of Prim.t * 't List.t
| VAR of Id.t
fun f t =
iso (data (C1' "APP" (sq t)
+` C1' "ERR" string
+` C1' "FUN" (tuple2 (Id.t, t))
+` C1' "IND" (refc t)
+` C1' "IF" (tuple3 (t, t, t))
+` C1' "LIT" Lit.t
+` C1' "PAP" (tuple2 (Prim.t, list t))
+` C1' "VAR" Id.t))
(fn APP ? => INL (INL (INL (INL (INL (INL (INL ?))))))
| ERR ? => INL (INL (INL (INL (INL (INL (INR ?))))))
| FUN ? => INL (INL (INL (INL (INL (INR ?)))))
| IND ? => INL (INL (INL (INL (INR ?))))
| IF ? => INL (INL (INL (INR ?)))
| LIT ? => INL (INL (INR ?))
| PAP ? => INL (INR ?)
| VAR ? => INR ?,
fn INL (INL (INL (INL (INL (INL (INL ?)))))) => APP ?
| INL (INL (INL (INL (INL (INL (INR ?)))))) => ERR ?
| INL (INL (INL (INL (INL (INR ?))))) => FUN ?
| INL (INL (INL (INL (INR ?)))) => IND ?
| INL (INL (INL (INR ?))) => IF ?
| INL (INL (INR ?)) => LIT ?
| INL (INR ?) => PAP ?
| INR ? => VAR ?)
datatype t = IN of t f
val t = Tie.fix Y (fn t =>
iso (data (C1' "IN" (f t)))
(fn IN ? => ?, IN))
end
structure Env = struct
type t = (Id.t * Val.t) List.t
val empty = []
val add = op ::
val find = assoc
end
local
open Term
in
fun eval e (IN t) =
case t
of APP (f, x) => (case eval e f
of Val.FUN f => f (eval e x)
| _ =>
fail "Not a function")
| ERR m => fail m
| FUN (x, b) => Val.FUN (fn v => eval (Env.add ((x, v), e)) b)
| IF (b, c, a) => (case eval e b
of Val.BOOL true => eval e c
| Val.BOOL false => eval e a
| _ => fail "Condition not bool")
| IND r => eval e (!r)
| LIT x => (case x
of Lit.INT i => Val.INT i
| Lit.BOOL b => Val.BOOL b)
| PAP (p, a) => Prim.find p (map (eval e) a)
| VAR x => Env.find x e
end
end
Now, one could write programs as first-order AST terms, but that would be
quite error prone. One would have to make sure that bindings are used
properly and one wouldn't have the benefit of static typing. Let's fix
that by implementing a HOAS skin on top of the interpreter. First the
signature of the HOAS skin:
signature TOY = sig
type 'a t
val F : ('a t -> 'b t) -> ('a -> 'b) t
val \> : ('a -> 'b) t * 'a t -> 'b t
val I : IntInf.t -> IntInf.t t
val B : Bool.t -> Bool.t t
val reify : 'a t -> ToyCore.Term.t
val iff : Bool.t t * 'a t * 'a t -> 'a t
val Y : ('a -> 'b) t Tie.t
val function : String.t
-> ('dom -> 'cod)
-> (ToyCore.Val.t -> 'dom)
-> ('cod -> ToyCore.Val.t)
-> 'dom t -> 'cod t
val binOp : String.t
-> 'a BinOp.t
-> ('a, ToyCore.Val.t) Iso.t
-> 'a t BinOp.t
end
A few words about the signature are perhaps in order. First of all, the
type constructor t is for HOAS terms. The type parameter is a phantom
type variable and serves to make HOAS terms statically typed. The F
combinator is for introducing anonymous functions. The \> combinator is
the function application operator. The I and B combinators are used for
introducing constant integers and booleans to programs. The iff
combinator is for conditionals. Y is for computing fixpoints. You can
use it to define mutually recursive functions. The reify function
converts a HOAS value into a first-order term. The function and binOp
procedures are for registering primitive operations.
Here is an implementation of the HOAS skin:
structure Toy :> TOY = struct
open ToyCore
open Term
type 'a t = Term.t Thunk.t
fun reify t = t ()
fun Y ? = Tie.pure (fn () => let
val r = ref (IN (ERR "fix"))
val t = IN (IND r)
in
(const t,
fn t' => let val t' = reify t' in r := t' ; const t end)
end) ?
fun F f () =
let val i = Id.new () in IN (FUN (i, f (const (IN (VAR i))) ())) end
fun f \> x = const (IN (APP (f (), x ())))
fun I i = const (IN (LIT (Lit.INT i)))
fun B b = const (IN (LIT (Lit.BOOL b)))
fun iff (b, c, a) = const (IN (IF (reify b, reify c, reify a)))
fun function name f from to =
(Prim.add (name, fn [x] => to (f (from x)) | _ => fail "Impossible")
; fn x => const (IN (PAP (name, [reify x]))))
fun binOp name binOp (to, from) =
(Prim.add (name, fn [x, y] => to (binOp (from x, from y))
| _ => fail "Impossible")
; fn (x, y) => const (IN (PAP (name, [reify x, reify y]))))
end
The implementation isn't actually very complicated. An interesting detail
is the conversion of binders to first-order terms in the F combinator.
This is done by allocating a new identifier and then instantiating the
body of the HOAS term with the identifier. Another interesting detail is
the fixpoint witness Y, which makes use of the previously mentioned IND
constructor. Admittedly this is a somewhat unusual technique, but it is easy
to implement and gives what we want.
Now, let's register a few primitives:
structure ToyPrims = struct
local
open ToyCore Toy
val int = (Val.INT, fn Val.INT i => i)
in
val op + = Toy.binOp "+" op + int
val op - = Toy.binOp "-" op - int
val op * = Toy.binOp "*" op * int
val op div = Toy.binOp "div" op div int
val op mod = Toy.binOp "mod" op mod int
end
local
open ToyCore Toy
in
val isZero = Toy.function "isZero" (op = /> 0) (fn Val.INT i =>
i) Val.BOOL
end
end
And then write a simple program:
local
open ToyCore Toy ToyPrims
in
val term = let
val fact =
(Tie.fix Y)
(fn fact =>
F (fn n =>
iff (isZero n,
I 1,
n * (fact \> (n - I 1)))))
in
fact \> I 4
end
end
The above term, when evaluated, computes the factorial of 4.
In order to pickle the above term, we'll need to convert it to the
first-order representation:
val firstOrderTerm = Toy.reify term
Let's then pickle it:
val pickledTerm = pickle ToyCore.Term.t firstOrderTerm
Unpickle it:
val unpickledTerm = unpickle ToyCore.Term.t pickledTerm
And evaluate it:
val result = ToyCore.eval ToyCore.Env.empty unpickledTerm
And the result is:
val result = INT 24 : ToyCore.Val.t
Voilà!
Now, you'll probably find the syntax of the above factorial program ugly.
I agree. However, I think that in some settings it could be improved upon
substantially --- perhaps to an acceptable level. In particular, this
technique would probably make most sense when you have lots of primitives
and your programs are simple --- rarely using new recursive functions.
Notice that the syntax for calling primitives, like isZero, *, and - is
actually quite acceptable. Also, instead of I (and B), we could use
symbolic combinators like ` for literals, which might feel less obtrusive.
-Vesa Karvonen
More information about the MLton-user
mailing list