[MLton] mlb support

Matthew Fluet fluet@cs.cornell.edu
Fri, 25 Jun 2004 21:32:29 -0400 (EDT)


> > The elaborator is very straight-forward.
>
> Good.  I was worried there might be some issues with the fact that
> environments are mutable or missing some scoping capabilities.

Essentially, I just extended the ElaborateEnv with

structure Basis =
   struct
      datatype t = T of {plist: PropertyList.t,
                         bass: (Ast.Basid.t, t) Info.t,
                         fcts: (Ast.Fctid.t, FunctorClosure.t) Info.t,
                         fixs: (Ast.Vid.t, Ast.Fixity.t) Info.t,
                         sigs: (Ast.Sigid.t, Interface.t) Info.t,
                         strs: (Ast.Strid.t, Structure.t) Info.t,
                         types: (Ast.Tycon.t, TypeStr.t) Info.t,
                         vals: (Ast.Vid.t, Vid.t * Scheme.t) Info.t}
      ...
   end
val extendBasid: t * Ast.Basid.t * Basis.t -> unit
val lookupBasid: t * Ast.Basid.t -> Basis.t option
val makeBasis: t * (unit -> 'a) -> 'a * Basis.t
val openBasis: t * Basis.t -> unit

and patterned it after Structure.t.  The elaboration function looks like:

val elaborateMLB:
  Ast.Basdec.t * {addPrim: Env.t -> Decs.t,
                  lookupConstant: string * ConstType.t -> CoreML.Const.t}
  -> Env.t * Decs.t vector

Note that elaborateMLB returns the environment it uses for elaboration;
this is necessary because elaborate wants to create an empty environment
and take a snapshot in order to elaborate .mlb files:

      val E = Env.empty ()
      val emptySnapshot = Env.snapshot E
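
A rough sketch of how makeBasis and openBasis could fit together, using a
toy environment rather than the real ElaborateEnv.  Everything here is an
assumption made for illustration: ToyEnv, its list-of-bindings
representation, and the choice to restore the environment after makeBasis
are hypothetical; only the shapes of the makeBasis and openBasis types
follow the signatures above.

```sml
(* Toy model: an environment is a mutable list of (name, value) bindings,
   newest first.  All names here are hypothetical stand-ins. *)
structure ToyEnv =
   struct
      type basis = (string * int) list
      type t = (string * int) list ref

      fun empty (): t = ref []
      fun extend (E: t, x, v) = E := (x, v) :: !E
      fun lookup (E: t, x) =
         case List.find (fn (y, _) => y = x) (!E) of
            NONE => NONE
          | SOME (_, v) => SOME v

      (* Run th against E, collect the bindings it added as a basis, and
         restore E to its prior state.  Assumes th only adds bindings. *)
      fun makeBasis (E: t, th: unit -> 'a): 'a * basis =
         let
            val saved = !E
            val res = th ()
            val now = !E
            val added = List.take (now, length now - length saved)
            val () = E := saved
         in
            (res, added)
         end

      (* Make every binding of the basis B visible in E. *)
      fun openBasis (E: t, B: basis) = E := B @ !E
   end

val E = ToyEnv.empty ()
val () = ToyEnv.extend (E, "x", 1)
val ((), B) = ToyEnv.makeBasis (E, fn () => ToyEnv.extend (E, "y", 2))
val hidden = ToyEnv.lookup (E, "y")   (* y is only in B at this point *)
val () = ToyEnv.openBasis (E, B)
val visible = ToyEnv.lookup (E, "y")  (* y is visible again *)
```

In this toy, a "basis b = bas ... end" would correspond to makeBasis followed
by binding the result to b, and "open b" to openBasis.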

> > 1) With symbolic links, one can have multiple paths to the same file.
> > Should an .mlb file that is included through different paths be treated as
> > the same .mlb (i.e., elaborated exactly once)?
>
> Yes.  It seems easy enough to check the file identity (inode number).

Fair enough.
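
One possible way to do the identity check with the SML Basis Library is
OS.FileSys.fileId together with OS.FileSys.compare, which compare files
by identity rather than by path.  The sketch below is only an
illustration; the names seen and firstVisit are hypothetical and not part
of MLton.

```sml
(* Hypothetical record of files already visited, keyed by file identity
   rather than by path. *)
val seen: (OS.FileSys.file_id * string) list ref = ref []

(* Returns true the first time a file is visited and false on later
   visits, even when a later visit arrives through a different path
   (for example, through a symbolic link). *)
fun firstVisit (path: string): bool =
   let
      val id = OS.FileSys.fileId path
      fun same (id', _) = OS.FileSys.compare (id, id') = EQUAL
   in
      if List.exists same (!seen)
         then false
      else (seen := (id, path) :: !seen; true)
   end
```

An elaborator could consult such a check before elaborating an .mlb file
and reuse the previously computed basis when it returns false.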


> To answer the other questions, let's consider in general the ways in
> which the basis library is currently special (i.e. different from user
> code).
>
>   1. MLton knows (and has hardwired in various places) that the
>      program is split into two pieces, the first piece being the basis
>      library and the second piece being the user code.  This split
>      affects the behavior of dead code elimination, and the many
>      def-use info flags (-show-basis-used, -show-def-use,
>      -warn-unused).

I don't agree that dead code should be lumped in with the other def-use
info.  Dead code is a completely separate pass, uses an entirely different
notion of "use", and can very easily be extended to handle arbitrary
interleaving of lib and user code.  Currently, the pass looks like:

      val deadCode:
         {basis: CoreML.Dec.t list,
          user: CoreML.Dec.t list} -> CoreML.Dec.t list (* basis *)

And I think it could trivially be made to work like:

      val deadCode:
         {prog: (CoreML.Dec.t list * bool) vector} ->
         {prog: CoreML.Dec.t list}

Note that elaborateMLB returns a Decs.t vector; in particular, there is
one Decs.t per .sml file, so we can support a much finer granularity.

Also, dead code is an optimization; the def-use info flags are
informative.
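
To make the interleaved deadCode shape concrete, here is a toy sketch.
It is not the actual pass: the dec type is a hypothetical stand-in for
CoreML.Dec.t, and the bool is read as "keep every declaration in this
piece", which is an assumed meaning.  Rebinding of names is ignored.

```sml
(* Toy declaration: binds one name and mentions some others. *)
type dec = {def: string, uses: string list}

(* Each piece is a list of declarations plus a flag; the flag is read
   here as "keep every declaration in this piece" (an assumption). *)
fun deadCode {prog: (dec list * bool) vector}: {prog: dec list} =
   let
      fun member (x, xs) = List.exists (fn y => y = x) xs
      (* Scan one piece from its last declaration to its first, keeping
         a declaration if its piece is flagged or its name is live. *)
      fun piece ((decs, keepAll), acc) =
         List.foldr
         (fn (d as {def, uses}, (live, kept)) =>
          if keepAll orelse member (def, live)
             then (uses @ live, d :: kept)
          else (live, kept))
         acc decs
      (* Pieces are also visited last to first, so uses are seen before
         the definitions they refer to. *)
      val (_, kept) = Vector.foldr piece ([], []) prog
   in
      {prog = kept}
   end

val lib = [{def = "a", uses = []},
           {def = "b", uses = ["a"]},
           {def = "c", uses = []}]
val user = [{def = "main", uses = ["b"]}]
val {prog = out} =
   deadCode {prog = Vector.fromList [(lib, false), (user, true)]}
val names = map #def out
```

Because the pieces are just vector elements, nothing in this shape requires
all library pieces to come before all user pieces.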

>   2. The basis library can use various language extensions not
>      available to user programs (rebinding of equals, _const
>      expressions)

Agreed.

>   3. Elaboration of the basis implicitly creates a primitive
>      environment with basic types (bool, int, ...).

Technically, the primitive environment is assumed by the basis and
provided by the compiler.  It's also assumed by later passes of the
compiler.  So, I think it is a separate issue from the Basis Library
itself; although, the Basis Library implementation needs a means to access
it.

>   4. The basis library sets some essential hooks that the compiler
>      internals depend on.  For example, it calls Exn.setInitExtra,
>      which is in turn used by the ImplementException pass.  Also, the
>      use of _basisDone MLtonFFI records the structure needed by the
>      elaborator to implement _import and _export declarations.

In the not-so-long run, I think it would make sense to factor as much of
this as possible out of the Basis Library into its own Critical Library.
Like the primitive environment, we might implicitly include this (with no
exposed environment) in every program.  But, in particular, it would be a
minimal implementation of these critical features.  For example, we might
have:

local
  val halt = fn s => fn _ => Primitive.halt s
in
  val _ = Primitive.TopLevel.setHandler (halt Primitive.Status.failure)
  val _ = Primitive.TopLevel.setSuffix (halt Primitive.Status.success)
end

which would be overridden by the Basis Library.  I don't know about the
Exn.setInitExtra or _basisDone, though.

> I think the key is that we need to separate the splitting of the
> program into two pieces from the other facets and provide a way for
> the user to specify the two pieces.  I propose that instead of viewing
> every program as one mlb, we view it as two: b.mlb u.mlb.  We use this
> split to treat b.mlb like we currently treat the basis and treat u.mlb
> like we currently treat the user program.

For some reason, I just don't like it.  It seems like it forces the user
to make an arbitrary division point; worse, it's a division point that
impacts the optimizations and what information is readily available.

> We can use the notion of split to solve
>
> > 2) Dead code pass.

See above; I don't think that split is the right notion here.

> > 5) -show-basis
> For 5, the split causes us to only display the basis produced by the
> u.mlb, and I think corresponds to what happens now as well as the
> encoding you gave.

I really don't think that split is the right notion here.  The basis that
results from a given .mlb file is a very well defined notion.
Furthermore, I think that -show-basis will be very useful for "debugging"
mlb files.  For example, if I have

a.mlb:
  z0.mlb
  a.sml

then I'm bothered by the fact that
  mlton -load-basis z0.basis -show-basis a.show-basis a.mlb
will implicitly hide the bindings from z0.mlb, but clearly if I include
a.mlb in some other project, then I'll get all of the z0.mlb bindings.

> > 6) -show-basis-used
> > 7) -warn-unused
> > 8) -show-def-use

> For 6, 7, 8, the call to Env.clearDefUses occurs
> after elaborating b.mlb.  This will cause the def-use information to
> be for u.mlb.

I don't feel like I can get all the information that I really want.  For
example, let's suppose that I have  basis-2002.mlb, util.mlb, and
proj.mlb.  util.mlb depends on basis-2002.mlb (but only exports Util
structures) and proj.mlb depends on both basis-2002.mlb and util.mlb.  So,
this will necessarily impose a linear order:
  basis-2002.mlb --> util.mlb --> proj.mlb.
Now, a perfectly reasonable thing to ask is how does proj.mlb depend upon
util.mlb _excluding_ its dependence on basis-2002.mlb.


> To make splitting available for more than just the basis library, we
> need to add the flags we discussed before: -{load,save}-basis.

What precisely is a .basis file?  A saved world?  Those will be really
big.  Otherwise I guess it would be some encoding of the .mlb file.  Then
the question is what to do about moving .basis files around in the path.

> The rest can be handled by annotations.  I don't know the right
> syntax.  Maybe something like
>
> <bdec> ::= ! <ann>* (<bdec>)

I was thinking:

<bdec> ::= !(<annlistP>) <abdec>
         | <abdec>

<abdec> ::= local <bdec> in <bdec> end
          | basis <bid> = bas <bexp> end
          | ...

<annlistP> ::= <ann> <annlist>

<annlist> ::=
            | COMMA <ann> <annlist>

<ann> ::= ...

> > Stephen's mantra of being able to do everything without extra/proxy
> > files.  I don't know that being able to annotate arbitrary basdecs
> > is necessarily better.
>
> Yeah, it still seems nice to me, as long as it doesn't cause problems,
> to have the annotations apply to <bdec> rather than <foo>.mlb.

I think that would be fine.  I would just implement the annotations as a
bunch of fluidLets as they are encountered.  The tricky bit is that
.mlb's should take the join of all their annotations (implicit or
explicit), but that will be handled at parsing time.
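
For reference, the fluidLet idea can be sketched as follows.  The
fluidLet defined here is a local definition written for this example, and
warnUnused and elaborateSml are hypothetical stand-ins for an annotation
flag and for the code that would consult it.

```sml
(* Temporarily set r to v while th runs, restoring the old value
   afterwards, including when th raises an exception. *)
fun fluidLet (r: 'a ref, v: 'a, th: unit -> 'b): 'b =
   let
      val old = !r
      val () = r := v
      val res = th () handle e => (r := old; raise e)
      val () = r := old
   in
      res
   end

(* Hypothetical annotation flag and a stand-in for elaboration that
   simply reports the flag's value for the given file. *)
val warnUnused: bool ref = ref false
fun elaborateSml (file: string) = (file, !warnUnused)

(* The annotation is in force only for the annotated declaration. *)
val inner = fluidLet (warnUnused, true, fn () => elaborateSml "a.sml")
val outer = elaborateSml "b.sml"
```

With this shape, an annotation on a <bdec> only affects the elaboration
that happens inside the thunk, and nested annotations restore in the
reverse order of entry.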

> Prefixing "local _prim in end"
> to all programs seems like the right fix to ensure that the primitive
> decs are always there.

Agreed.

> BTW, if you found any errors in the static semantics that I sent,
> please send a corrected version.  Hopefully it will make it into
> documentation someday.

I don't think I saw anything wrong.  In any case, I've TeX-ed up the
syntax & semantics of MLBs in the style of the Definition.  I'm planning
on including it as an appendix to the user's manual, with a less technical
presentation of how to use mlbs in the manual proper.