[MLton] mlb files and the ML Kit

Sat, 13 Mar 2004 01:11:25 +0100 (CET)

>> I don't see the practical need for basis-identifiers over using the
>> file-system to name bases.  (That is, I'm in favor of the former
>> proposal.)  What is the motivating example for basis-identifiers?

My motivation is that an mlb-language with support for basis bindings
lets me (as a programmer) specify exactly what a program unit should
depend on - to any degree I want! Moreover, say that I have written
the following mlb-file - more about the slightly modified syntax
below:

   my.mlb
     open A.sml B.sml C.sml

Also, say that the program units B.sml and C.sml depend on A.sml, and
that C.sml does not depend on B.sml. Then, a tool could analyse my.mlb
(including A.sml, B.sml, and C.sml) and generate the following
mlb-file:

   my2.mlb
     bas A = A.sml
     bas B = let open A in B.sml end
     bas C = let open A in C.sml end
     open A B C

I could also write the my2.mlb-file myself, if I want to be explicit
about dependencies.  Without basis bindings, new mlb-files are needed
to do such things.
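
As a rough illustration of what such a tool might emit, here is a small Python sketch. It is not the attached code: the function name `explicit_mlb` and the `deps` map are hypothetical, and the sketch assumes the dependencies between program units have already been inferred by some earlier analysis.

```python
from pathlib import PurePosixPath


def explicit_mlb(files, deps):
    """Render an mlb body with one basis binding per source file.

    files: source files in the order they appear in the original mlb-file.
    deps:  maps a file to the files it depends on (assumed already inferred).
    """
    def bid(f):
        # Hypothetical naming scheme: the basis identifier is the file stem.
        return PurePosixPath(f).stem

    lines = []
    for f in files:
        # Keep the original file order among the bases that are opened.
        needed = [bid(d) for d in files if d in deps.get(f, ())]
        if needed:
            lines.append("bas %s = let open %s in %s end"
                         % (bid(f), " ".join(needed), f))
        else:
            lines.append("bas %s = %s" % (bid(f), f))
    lines.append("open " + " ".join(bid(f) for f in files))
    return "\n".join(lines)


print(explicit_mlb(["A.sml", "B.sml", "C.sml"],
                   {"B.sml": ["A.sml"], "C.sml": ["A.sml"]}))
```

For the my.mlb example this prints the body of my2.mlb shown above.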

The tool could be useful, for example, to extract parts of a program
for use in another software project. More importantly ;) the language
allows me to limit the number of bases the ML Kit needs to deserialize
when forming the elaboration and compilation bases for elaborating and
compiling a program unit. We could also add macros (with call-by-name
semantics), although I think that would be overkill:

   my3.mlb
     bas A = A.sml
     bas F(X) = let open A in X end
     bas B = F(B.sml)
     bas C = F(C.sml)
     open A B C
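
To make the call-by-name reading concrete: applying a macro could simply substitute the unevaluated argument expression for the parameter in the macro body. The Python sketch below works on whitespace-separated tokens. The helper `expand` is hypothetical and only meant to show that my3.mlb would expand to my2.mlb.

```python
def expand(body, param, arg):
    """Call-by-name expansion: replace each occurrence of the parameter
    token in the macro body by the argument text, unevaluated."""
    return " ".join(arg if tok == param else tok for tok in body.split())


# Body of the macro F(X) from my3.mlb.
F_BODY = "let open A in X end"

print("bas B = " + expand(F_BODY, "X", "B.sml"))
print("bas C = " + expand(F_BODY, "X", "C.sml"))
```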

The syntax I'm using right now in the implementation is the following:

   <bexp> ::= <bexp> <bexp>
            | let <bdec> in <bexp> end
            | file.sml
            | file.sig
            | ( <bexp> )
            | ()
            | bid

   <bdec> ::= <bdec> <bdec>
            | <empty>
            | local <bdec> in <bdec> end
            | bas bid = <bexp>
            | open <bexp>
            | open file.mlb

I'm of course willing to modify the syntax in any way to end up having
only one mlb-syntax, but I found that a let-construct was
appropriate. It should be straightforward to throw in functor
renaming, clean, etc. The static semantics for the language is much
like the one suggested by Stephen earlier. Preliminary program code
for parsing and dependency inference for the above language is
attached below.
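
For readers who want a feel for the shape of the language, the grammar above could be modelled as a small abstract syntax. The following is a Python sketch with hypothetical class names, not a transcription of the attached SML code. Parenthesised expressions need no node of their own, and `sources` just lists the source files an mlb term mentions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Seq:       # <bexp> <bexp>  or  <bdec> <bdec>
    left: "Node"
    right: "Node"

@dataclass
class Let:       # let <bdec> in <bexp> end
    dec: "Node"
    body: "Node"

@dataclass
class Local:     # local <bdec> in <bdec> end
    dec: "Node"
    body: "Node"

@dataclass
class File:      # file.sml or file.sig
    path: str

@dataclass
class Empty:     # ()  or  <empty>
    pass

@dataclass
class Bid:       # bid
    name: str

@dataclass
class Bas:       # bas bid = <bexp>
    name: str
    exp: "Node"

@dataclass
class Open:      # open <bexp>
    exp: "Node"

@dataclass
class OpenMlb:   # open file.mlb
    path: str


Node = Union[Seq, Let, Local, File, Empty, Bid, Bas, Open, OpenMlb]


def sources(n):
    """List the .sml/.sig files mentioned in a term, in order of appearance."""
    if isinstance(n, File):
        return [n.path]
    if isinstance(n, Seq):
        return sources(n.left) + sources(n.right)
    if isinstance(n, (Let, Local)):
        return sources(n.dec) + sources(n.body)
    if isinstance(n, (Bas, Open)):
        return sources(n.exp)
    return []  # Empty, Bid, OpenMlb mention no source file directly


# my2.mlb from above, built by hand.
my2 = Seq(Bas("A", File("A.sml")),
      Seq(Bas("B", Let(Open(Bid("A")), File("B.sml"))),
      Seq(Bas("C", Let(Open(Bid("A")), File("C.sml"))),
          Open(Seq(Bid("A"), Seq(Bid("B"), Bid("C")))))))
print(sources(my2))
```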

Some derived forms could be useful:

  bexp1 | bexp2   =>   let bas bid1 = bexp1
                           bas bid2 = bexp2
                       in bid1 bid2
                       end

With this, we could write my.mlb as

   my4.mlb
     A.sml (B.sml | C.sml)
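
The derived form could be expanded mechanically. A Python sketch follows, working on the textual form of the two operands. The function `desugar_par` and the `bidN` naming scheme are hypothetical, and the sketch assumes the generated identifiers do not clash with identifiers the programmer has written.

```python
import itertools


def desugar_par(bexp1, bexp2, fresh):
    """Expand `bexp1 | bexp2` into the let-form given above.

    fresh: an iterator of integers used to build new basis identifiers
           (assumed not to clash with user-written bids).
    """
    b1 = "bid%d" % next(fresh)
    b2 = "bid%d" % next(fresh)
    return ("let bas %s = %s bas %s = %s in %s %s end"
            % (b1, bexp1, b2, bexp2, b1, b2))


print("A.sml (" + desugar_par("B.sml", "C.sml", itertools.count(1)) + ")")
```

Since neither binding is in scope while the other is elaborated, the expansion makes it explicit that the two operands do not depend on each other.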

> One argument that I see is by analogy to CM.  One of the features that
> I dislike about CM is that I can get extra expressiveness by dropping
> down into a new file, e.g. using an administrative group to add an
> import filter.  I like the fact that mlb files don't require this.
>
> The extra expressiveness that one gets in the case of mlb files is the
> ability to start evaluation in a clean basis and the ability to name a
> basis.  So, one could write
>
> basis Basis1997 = bas ... end
> basis Basis2002 = bas ... end
> local
>   open Basis1997
> in
>   ...
> end
> local
>   open Basis2002
> in
>   ...
> end
>
> Of course it is not a question of expressiveness, just of convenience,
> since one could put the definitions of Basis1997 and Basis2002 in
> separate files and refer to the files instead of opening the bids.
>
> As a more specific example, suppose one wanted to look at a single mlb
> file describing MLton or some other large program that was composed of
> a bunch of mlb files.  It would be nice to have a simple tool that
> took an mlb and followed all the indirections, producing a single mlb
> with no references to other mlbs (maybe this is what Martin was
> talking about).
>
> I could imagine that being useful, and it's something you can't do
> without the more expressive language.

Although this wasn't what (I thought) I was talking about, it seems
like a nice property that we should strive for. BTW: should it be
allowed to specify a source file twice in an mlb-file? Or in an entire
project? Also, would it be bad to require mlb-file names to be unique,
or should we require only absolute paths to mlb-files to be
unique? To obtain a simple mechanism for generating unique machine
code labels (and new type names, for that matter), it would be great
if the concatenation of an mlb-file name and a source file name
uniquely determines the source file.

[Attachment: MLB_PROJECT.sml]

[Attachment: MlbProject.sml]

Cheers,

Martin