[MLton] printf via fold

Tue, 30 Aug 2005 20:26:45 -0700

I've been thinking about how to build a reusable library that makes
implementing printf-like functions easier, as well as improving our
current implementation in ways that I earlier mentioned: avoiding
unnecessary list construction and conveniently sharing the format
specifiers and end-of-arguments symbol with other similar functions
like format and fprintf.  I came up with a reconstruction of printf in
terms of simple fold libraries.  Effectively, printf is built by a
fold right that generates a fold left.

One can also use the techniques in this note to build a nicer version
of functional record update and optional arguments requiring only a
single global "setter" instead of the per-function or
per-optional-argument approaches we discussed earlier.  I'll save that
for a separate note.

Here's the reconstruction.

Start with a generic variable-argument fold (left) library.

----------------------------------------------------------------------
fun $ (a, f) = f a

structure Fold =
   struct
      fun fold u g = g u
      fun step0 h (a1, f) = fold (h a1, f)
      fun step1 h u z = step0 (fn a1 => h (z, a1)) u
   end
----------------------------------------------------------------------

I defined "$" at the top level because it will be useful as the
end-of-arguments terminator in all situations, not just fold left.  By
plugging in the definitions above, one can see that the Fold library
satisfies the following equations.

  fold (a, f) (step0 h1) (step0 h2) ... (step0 hn) $
  === f (hn (... (h2 (h1 a))))

  fold (a, f) (step1 h1) b1 (step1 h2) b2 ... (step1 hn) bn $
  === f (hn (bn, ... (h2 (b2, h1 (b1, a)))))

Of course, step0 and step1 can me intermixed; the generalized equation
is clear.

So, fold/step is similar to a usual left fold with a base and a
folder, except that the folder is not supplied at the beginning.
Rather, the base (a) is supplied at the beginning and there is a
(potentially) different folder supplied at each step.  Furthermore,
the steps are responsible for continuing the fold.  Finally, there is
an additional aspect to the fold -- a finishing function (f) that is
applied to the final result of the fold.  With a usual fold this would
be superfluous, since one could simply call the finisher on the result
of the fold; but in this case, since fold takes a variable number
arguments, that isn't possible, so it is essential to build the
finisher into the process.

Now, for some examples.  I'll use utility functions to print integers
and lists of integers.

fun pi' i = print (Int.toString i)
fun pi i = (pi' i; print "\n")
fun pis is = 
   (print "["
    ; (case is of 
          [] => () 
        | i :: is => (pi' i; List.app (fn i => (print ", "; pi' i)) is))
    ; print "]\n")

In the simplest case, we can use the same folder at each step to count
the number of arguments passed, and then use the finisher to print
out the number at the very end.

val C = fn z => Fold.step0 (fn i => i + 1) z
val f = fn z => Fold.fold (0, pi) z
val () = f $
val () = f C C $
val () = f C C C C C $

This prints "0", "2", then "5".  The eta wrappers used to define C and
f are essential since each use is at a different type.

We can fold :: in the usual way to reverse a list.

val C = fn z => Fold.step1 (op ::) z
val f = fn z => Fold.fold ([], pis) z 
val () = f C 1 $
val () = f C 1 C 2 C 3 $

This prints "1" and then "[3, 2, 1]".

Working toward printf-like uses, we can construct a bigger and bigger
curried function as we fold, and then supply that function the correct
number of arguments.

val C = fn z => Fold.step0 (fn f => fn () => f) z
val f = fn z => Fold.fold ((), fn f => f) z
val () = f C $ ()
val () = f C C C $ () () ()

We can even define variable-argument steppers.

val L = fn z => Fold.fold ([], rev) z
fun X u = Fold.fold ([], Fold.step1 (op ::) u o rev)
val Y = fn z => Fold.step1 (op ::) z
val () = List.app pis (L X Y 1 Y 2 Y 3 $ 
                         X Y 4 Y 5 $ 
                         X Y 6 $ 
                       $)

Here, L is a fold that builds up a list, while X is a
variable-argument stepper that also builds up a list.  The example
prints

  [1, 2, 3]
  [4, 5]
  [6]

OK, enough fold examples.  Let's move on to fold right, which can be
implemented by folding left and building up a function that is
applied at the end by the finisher.

----------------------------------------------------------------------
structure Foldr =
   struct
      fun foldr (c, f) = Fold.fold (f, fn g => g c)
      fun step0 h = Fold.step0 (fn g => g o h)
      fun step1 h = Fold.step1 (fn (z, g) => g o (fn a1 => h (z, a1)))
   end
----------------------------------------------------------------------

By plugging in the definitions and using the fold equations, one can
see that the Foldr library satisfies the following equations, which
are like the fold-left equations but with the steppers reversed.

  foldr (a, f) (step0 h1) (step0 h2) ... (step0 hn) $
  === f (h1 (h2 (... (hn a))))

  foldr (a, f) (step1 h1) b1 (step1 h2) b2 ... (step1 hn) bn $
  === f (h1 (b1, h2 (b2, ... hn (bn, a))))

As with Fold, step0 and step1 can be mixed.  Also as with fold, we
supply a finisher function to be applied to the result of the fold.
Pre-supplying a finisher is necessary here for the same reason as with
fold: with a variable number of arguments, there is no way to post
compose finisher by hand.  Nicely, the same end-of-argument
terminator ($) that we used before still works.  This will be true
throughout this note.  It's probably worth exposing $ at the top-level
in the basis that exports all this stuff.

The examples for foldr are much the same as with fold.  To convince
yourself that you're right folding, you can do the list example.

  val C = fn z => Foldr.step1 (op ::) z
  val f = fn z => Foldr.foldr ([], pis) z 
  val () = f C 1 $
  val () = f C 1 C 2 C 3 $

This prints "1" and then "[1, 2, 3]".

Working toward printf, we can now use foldr to build a library that
lets one make (curried) fold-left functions.

----------------------------------------------------------------------
structure MakeFold =
   struct
      fun makeFold (a, g, f) = Foldr.foldr ((g, f), fn (_, f) => f a)
      fun step_0_1 h =
         Foldr.step0 (fn (g, r) => (g, fn a => fn d1 => r (g (h d1, a))))
      fun step_1_0 z =
         Foldr.step1 (fn (b, (g, r)) => (g, fn a => r (g (b, a)))) z
      fun step_1_1 h =
         Foldr.step1 (fn (z, (g, r)) =>
                      (g, fn a => fn d1 => r (g (h (z, d1), a))))
   end
----------------------------------------------------------------------

By plugging in the definitions and using the Foldr equations, one can
see that the MakeFold library satisfies the following equations.

  makeFold (a, g, f) 
           (step_1_0 h1) b1 (step_1_0 h2) b2 ... (step_1_0 hn) bn $ 
  === g (hn bn, ... g (h2 b2, g (h1 b1, a))

  makeFold (a, g, f) 
           (step_0_1 h1) (step_0_1 h2) ... (step_0_1 hn) $ 
           b1 b2 ... bn
  === g (hn bn, ... g (h2 b2, g (h1 b1, a))

  makeFold (a, g, f) 
           (step_1_1 h1) b1 (step_1_1 h2) b2 ... (step_1_1 hn) bn $ 
           c1 c2 ... cn
  === g (hn (bn, cn), ... g (h2 (b2, c2), g (h1 (b1, c1), a)))

Of course, the steppers can be mixed.  And one could define step_i_j
for any i and j, but with tupled arguments, the three steppers should
cover most uses.

The point of MakeFold is that it lets one build (curried) functions
that (left) fold over their arguments, and establish a connection
between each stepper and the corresponding argument.  This is exactly
what's needed for printf.

----------------------------------------------------------------------
structure Printf =
   struct
      fun format z = MakeFold.makeFold ([], op ::, concat o rev) z
      fun fprintf out =
         MakeFold.makeFold
         (fn () => (),
          fn (s, f) => fn () => (f (); TextIO.output (out, s)),
          fn f => f ())
      fun printf z = fprintf TextIO.stdOut z
      val ` = MakeFold.step_1_0
      val newSpec = MakeFold.step_0_1
      fun D z = newSpec Int.toString z
      fun G z = newSpec Real.toString z
   end
----------------------------------------------------------------------

The implementation of Printf uses makeFold in two different ways.
Format, which builds a string, folds over all its arguments,
converting them to strings and accumulating a list of strings,
finishing by reversing the list and concatenating the strings
together.  Fprintf, on the other hand, simply outputs the strings it
gets.

All of our usual printf examples work as expected, with the added
benefit that now we can use fprintf, format, or printf, all with the
same specifiers and with the same end-of-arguments terminator.

   val () = printf`"Hello.\n"$
   val () = printf`"An int "D`" and an int "D`".\n"$ 13 14
   val () = print (format`"An int "D`" and a real "G`".\n"$ 13 3.1415)
   val () = printf`"A real "G`" and a real "G`".\n"$ 13.1 3.1415
   val () = fprintf TextIO.stdErr`"A string"`" - followed by another.\n"$
   val () = printf $
   val () = printf G D`"\n"$ 1.0 13

(You can see from the above that I have been converted to Vesa's style
of format specifiers.  Their orthogonality has won me over.)

An added benefit of this new approach is that printf and fprintf no
longer build an intermediate list.  They directly call print.  And
MLton will almost certainly do exactly the right thing, eliminating
all the intermediate functions, leaving a sequence of calls to
conversions and print.

One could define fprintf in the following simpler way.

      fun fprintf out =
         MakeFold.makeFold ((),
                            fn (s, f) => (f (); TextIO.output (out, s)),
                            fn () => ())

However, I think the first way I gave is better because none of the
printing happens until fprintf is fully applied.  Thus partial
applications will work as expected and it will be easier to avoid
errors where fprintf isn't fully applied (since nothing displays until
it is).  In any case, MLton should still do the right thing.

That's it.  Less than forty lines of code to define printf along with
a few useful libraries for handling variable-number of arguments,
optional arguments, and other generic folders.

All that remains is the signatures.  Sadly, these are not so nice
because of the large number of type variables involved.  I'd love to
hear suggestions for improvements, but I'm not optimistic.  Anyways,
here is the same implementation code as above, augmented with type
definitions and signatures.

----------------------------------------------------------------------

signature FOLD =
   sig
      type ('a, 'b, 'c, 'd) t = 'a * ('b -> 'c) -> 'd

      val fold: 'a * ('b -> 'c) -> ('a, 'b, 'c, 'd) t -> 'd
      val step0: ('a1 -> 'a2) -> ('a1, 'b, 'c, ('a2, 'b, 'c, 'd) t -> 'd) t
      val step1:
         ('z * 'a1 -> 'a2) -> ('a1, 'b, 'c, 'z -> ('a2, 'b, 'c, 'd) t -> 'd) t
   end

structure Fold:> FOLD =
   struct
      type ('a, 'b, 'c, 'd) t = 'a * ('b -> 'c) -> 'd

      fun fold u g = g u
      fun step0 h (a1, f) = fold (h a1, f)
      fun step1 h u z = step0 (fn a1 => h (z, a1)) u
   end

signature FOLDR =
   sig
      type ('a, 'b, 'c, 'd, 'e, 'f) t = ('a -> 'b, 'c -> 'd, 'e, 'f) Fold.t

      val foldr: 'c * ('a -> 'b) -> ('a, 'b, 'c, 'd, 'd, 'e) t -> 'e
      val step0: ('a1 -> 'a2) -> ('a2, 'b, 'c, 'd, 'e,
                                  ('a1, 'b, 'c, 'd, 'e, 'f) t -> 'f) t
      val step1:
         ('z * 'a1 -> 'a2) -> ('a2, 'b, 'c, 'd, 'e,
                               'z -> ('a1, 'b, 'c, 'd, 'e, 'f) t -> 'f) t
   end

structure Foldr:> FOLDR =
   struct
      type ('a, 'b, 'c, 'd, 'e, 'f) t = ('a -> 'b, 'c -> 'd, 'e, 'f) Fold.t

      fun foldr (c, f) = Fold.fold (f, fn g => g c)
      fun step0 h = Fold.step0 (fn g => g o h)
      fun step1 h = Fold.step1 (fn (z, g) => g o (fn a1 => h (z, a1)))
   end

signature MAKE_FOLD =
   sig
      type ('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h) t =
         (('b * 'a -> 'a) * ('a -> 'd), 'e,
          ('b * 'a -> 'a) * ('a -> 'c), 'f, 'g, 'h) Foldr.t

      val makeFold:
         'a * ('b * 'a -> 'a) * ('a -> 'c)
         -> ('a, 'b, 'c, 'd, 'd, 'e, 'e, 'f) t -> 'f
      val step_0_1:
         ('d1 -> 'b) -> ('a, 'b, 'c, 'd1 -> 'd2, 'e, 'f, 'g,
                         ('a, 'b, 'c, 'd2, 'e, 'f, 'g, 'h) t -> 'h) t
      val step_1_0:
         ('a, 'b, 'c, 'd, 'e, 'f, 'g,
          'b -> ('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h) t -> 'h) t
      val step_1_1:
         ('z * 'd1 -> 'b)
         -> ('a, 'b, 'c, 'd1 -> 'd2, 'e, 'f, 'g,
             'z -> ('a, 'b, 'c, 'd2, 'e, 'f, 'g, 'h) t -> 'h) t
   end

structure MakeFold:> MAKE_FOLD =
   struct
      type ('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h) t =
         (('b * 'a -> 'a) * ('a -> 'd), 'e,
          ('b * 'a -> 'a) * ('a -> 'c), 'f, 'g, 'h) Foldr.t

      fun makeFold (a, g, f) = Foldr.foldr ((g, f), fn (_, f) => f a)
      fun step_0_1 h =
         Foldr.step0 (fn (g, r) => (g, fn a => fn d1 => r (g (h d1, a))))
      fun step_1_0 z =
         Foldr.step1 (fn (b, (g, r)) => (g, fn a => r (g (b, a)))) z
      fun step_1_1 h =
         Foldr.step1 (fn (z, (g, r)) =>
                      (g, fn a => fn d1 => r (g (h (z, d1), a))))
   end

signature PRINTF =
   sig
      type ('a, 'b, 'c, 'd, 'e, 'f, 'g) t =
         ('a, string, 'b, 'c, 'd, 'e, 'f, 'g) MakeFold.t
      type ('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h) spec =
         ('a, 'b, 'c -> 'd, 'e, 'f, 'g,
          ('a, 'b, 'd, 'e, 'f, 'g, 'h) t -> 'h) t

      val ` : ('a, 'b, 'c, 'd, 'e, 'f,
               string -> ('a, 'b, 'c, 'd, 'e, 'f, 'g) t -> 'g) t
      val D: ('a, 'b, int, 'd, 'e, 'f, 'g, 'h) spec
      val G: ('a, 'b, real, 'd, 'e, 'f, 'g, 'h) spec
      val format: (string list, string, 'c, 'c, 'd, 'd, 'e) t -> 'e
      val fprintf:
         TextIO.outstream -> (unit -> unit, unit, 'c, 'c, 'd, 'd, 'e) t -> 'e
      val newSpec: ('c -> string) -> ('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h) spec
      val printf: (unit -> unit, unit, 'c, 'c, 'd, 'd, 'e) t -> 'e
   end

structure Printf:> PRINTF =
   struct
      type ('a, 'b, 'c, 'd, 'e, 'f, 'g) t =
         ('a, string, 'b, 'c, 'd, 'e, 'f, 'g) MakeFold.t

      type ('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h) spec =
         ('a, 'b, 'c -> 'd, 'e, 'f, 'g,
          ('a, 'b, 'd, 'e, 'f, 'g, 'h) t -> 'h) t

      fun format z = MakeFold.makeFold ([], op ::, concat o rev) z
      fun fprintf out =
         MakeFold.makeFold
         (fn () => (),
          fn (s, f) => fn () => (f (); TextIO.output (out, s)),
          fn f => f ())
      fun printf z = fprintf TextIO.stdOut z
      val ` = MakeFold.step_1_0
      val newSpec = MakeFold.step_0_1
      fun D z = newSpec Int.toString z
      fun G z = newSpec Real.toString z
   end