[MLton-user] Re: RFC: Extended Basis Library

Thu Mar 1 07:30:20 PST 2007

Quoting Geoffrey Alan Washburn <geoffw at cis.upenn.edu>:
> Vesa Karvonen wrote:
[...]
> > So, one could use a name like
> >
> >   sortByCmp
> >
> > for the generally fastest comparison based sorting function and
> >
> >   stableSortByCmp
> >
> > for the sorting function that is as fast as possible while being stable
> > (O(n log n) and preserves relative order of equal elements).
> 
>     Given these two specifications, which algorithms do you think you 
> would select?

For sorting lists, I would use (an optimized) merge sort in stableSort.
For the non-stable sort there may be faster alternatives, but as a first
approximation I would probably also use merge sort.  If there is a lot of
free memory and the list is long, then copying to an array and sorting it
with intro sort (variation of quick sort) might be significantly faster.

Note that the misc-util library in mltonlib already has a merge sort implementation:

  http://mlton.org/cgi-bin/viewsvn.cgi/mltonlib/trunk/com/ssh/misc-util/unstable/sorted-list.sml?view=auto
  http://mlton.org/cgi-bin/viewsvn.cgi/mltonlib/trunk/com/ssh/misc-util/unstable/sorted-list-test.sml?view=auto

> > The main reason I assumed arbitrary precision is that the Word types
> > already work like unsigned integers with modular arithmetic.
> 
>     Hmm.  I hadn't really thought about using words, but that may be 
> practical enough for my purposes.  Though I imagine someone still might 
> like a natural number abstraction.

I think that it would be *nice to have* natural numbers, but I can't say
that I would have really needed them that often.  If you do need them,
then I think that it would make a fine addition to the Extended Basis.

> >   http://mlton.org/References#WangMurphy .

> [...]  Has anyone tried using something like the Fold library
> (http://mlton.org/Fold) to provide nth projections and injections with a
> single function?

That shouldn't be too hard.  Looking at page 4 of the article, I think
that something like this should suffice (uncompiled):

  fun prjF ? = Fold.fold (prj, id) ?
  fun S ? = Fold.step0 prj_succ ?

Instead of

  case prjn x of
    ...

you could then write
             __n__
            /     \
  case prjF S ... S $ x of
    ...

> > The Monad core signature:
> >
> >   signature MONAD_CORE = sig
> >      type 'a monad
> >      val return : 'a -> 'a monad
> >      val >>= : 'a monad * ('a -> 'b monad) -> 'b monad
> >   end
> >   
>     This is essentially the same as what I had come up with for what I 
> had called BASIC_MONAD. Though I had chosen to go with the signature
> 
>     signature BASIC_MONAD = sig
>        type 'a t
>        val return : 'a -> 'a t
>        val >>= : 'a t * ('a -> 'b t) -> 'b t
>     end
> 
> As using »t« as the type seems to be convention used most other places, 
> but with »where type« constraints I don't know that it really matters 
> that much.

I think that the t convention is not appropriate for abstract or concept
signatures that may be extended and implemented by many structures.  As
explained in

  http://mlton.org/pipermail/mlton/2006-December/029469.html

the point of not using t is to make it possible to combine signatures.
Signatures that have a type named t can not be included into a combined
signature:

  signature WONT_WORK =
     sig
        include SIG_WITH_T
        include ANOTHER_SIG_WITH_T
     end

IMO, using type t in a monad signature would be a mistake.

> >   signature MONADP_CORE = sig
> >      include MONAD_CORE
> >      val zero : 'a monad
> >      val plus : 'a monad BinOp.t
> >   end
> >
> > I'm not sure if the names zero and plus are the best.
> 
>     I called them »zero« and ++. 

Hmmm... The symbolic ++ and an infix declaration for ++ might be better.
What about associativity and precedence?  We should probably reconsider
the precedences of all monadic operators.

[...]
>     It might be worth either adding some additional operations to 
> MONADP_CORE, or perhaps make things even more elaborate.  I have three 
> other MONADP operations that cannot (usually) be defined in terms of 
> MONADP_CORE. 
> 
>     val fail : 'a monad
>     val once : 'a monad -> 'a monad
>     val ifte : 'a monad -> ('a -> 'b monad) -> 'b monad -> 'b monad
> 
> [...] (http://www.cs.rutgers.edu/~ccshan/logicprog/LogicT-icfp2005.pdf) [...]

In section 4.1 of the article, the authors specify a new type class called
LogicM that inherits from MonadPlus rather than change the MonadPlus
class.  I think that any extra primitives should go into another
(extended) signature.  The basic MonadPlus class (or MONADP_CORE) is
useful for many things without the extra primitives.  Also, from my quick
reading of the article, I get the impression that once and ifte (and
interleave and >>-) can be implemented in terms of msplit + MonadPlus, so
I'm not sure that having once and ifte in a CORE signature makes sense.

>     Another variation I use is to actually give »zero« and »fail« arguments
> 
>        type error
>        val zero : error -> 'a monad
>        val fail : error -> 'a monad
> 
> and then a »where type« constraint is used to specify the nature of 
> »error« values.  I'm not sure how to fit this in well.

AFAIU, the fail function in Haskell's Monad type class is there mainly for
practical reasons related to the Haskell language and the same reasons do
not apply to SML.  In particular, in SML, you can just raise an exception
if you detect an *error* during a computation.  OTOH, if you want to have
a monad with more control, e.g. over backtracking, than is offered by
MonadPlus (or MONADP_CORE), then you should use an extension of it,
e.g. LogicM (or, say, LOGICM_CORE).  So, I would rather see signatures
LOGICM_CORE, LOGICM_EX and LOGICM rather than have MONADP_CORE changed.

> > In addition to the above, I also have versions of the above signatures
> > with an extra type parameter:
[...]
>     It depends upon your application I expect, but while I used to use 
> an extra parameter for my state monad signature, I've since switched to 
> something that would look like
> 
>     signature STATE_MONAD = sig
>        include MONAD_CORE
>        include MONAD_EX where type 'a monad_ex = 'a monad
>        type state
>        val getState : state monad
>        val setState : state -> unit monad
>        val run : state -> 'a monad -> 'a monad
>     end

Is the type of run correct?

Something like that could also be provided in addition to other monad
signatures.  However, I would factor that into combinable signatures.
Something like this:

  signature MONAD_WS = sig (* WS = WITH_STATE *)
     type 'a monad_ws
     type monad_ws_state
     val getState : monad_ws_state monad_ws
     val setState : monad_ws_state -> Unit.t monad_ws
     val run : monad_ws_state -> 'a monad_ws UnOp.t
  end

  signature MONAD_STATE = sig
     include MONAD
     include MONAD_WS where type 'a monad_ws = 'a monad
  end

Doing it like this means that you can then combine MONAD_WS with
MONADP_CORE

  signature MONADP_STATE = sig
     include MONADP
     include MONAD_WS where type 'a monad_ws = 'a monad
  end

and LOGICM

  signature LOGICM_STATE = sig
     include LOGICM
     include MONAD_WS where type 'a monad_ws = 'a monad
  end

> > Well, I think that there are too few SML libraries.  I think that lazy
> > lists are something that people use often enough that it makes sense to
> > provide a reasonable implementation in the Extended Basis library.
> >
>     Sounds reasonable.  Given that it is probably worth considering some 
> additional container signatures.  Sets and maps at the very least.

I agree.

-Vesa Karvonen