[MLton] cvs commit: world no longer contains a preprocessed basis library

Matthew Fluet fluet@cs.cornell.edu
Fri, 12 Dec 2003 16:51:37 -0500 (EST)


> > In order to recover the "tighter" behavior of the old cmcat, we would need
> > to generate the graph for the necessary imports, not all the exports.  I
> > don't see a very easy way of doing it.  It is easy enough to scan a graph
> > to determine what symbols are imported from each imported .cm file.
>
> Could we use the import information to do a DFS and mark only the
> needed files?  Then, we could take the (in order, but too large) file
> list produced by CM.Graph.graph and filter out unneeded files.

That won't necessarily be minimal:

sources.cm:  Group
               structure Main
             in
               AB/sources.cm
               main.sml (* uses structure A *)

AB/sources.cm:  Group
                  structure A
                  structure B
                in
                  XYZ/sources.cm
                  AB/a.sml (* uses structure X *)
                  AB/b.sml (* uses structure Y *)

XYZ/sources.cm:  Group
                   structure X
                   structure Y
                   structure Z
                 in
                   XYZ/x.sml
                   XYZ/y.sml
                   XYZ/z.sml

If I'm understanding your suggestion correctly, from the graph of
"sources.cm", we determine that only structure A is necessary from
"AB/sources.cm", so we filter "AB/b.sml" from the files derived from the
graph of "AB/sources.cm".  But, from the full graph of "AB/sources.cm",
we'll have determined that we need structure X and structure Y from
"XYZ/sources.cm".  So, while we will then filter "XYZ/z.sml" from the
files derived from the graph of "XYZ/sources.cm", we'll be needlessly
including "XYZ/y.sml".
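
To make the example concrete, here is a rough Python sketch of the
per-group filtering as I understand the suggestion.  It is only a model of
the three groups above, not anything cmcat or CM actually does: the GROUPS
table, its "defines"/"uses" fields, and per_group_filter are all names made
up for illustration, and dependencies between files of the same group are
ignored since the example has none.

```python
# Hypothetical model of the three .cm groups above (not CM's real data structures).
# For each file we record the structures it defines and the structures it uses.
GROUPS = {
    "sources.cm": {
        "exports": {"Main"},
        "files": {"main.sml": {"defines": {"Main"}, "uses": {"A"}}},
    },
    "AB/sources.cm": {
        "exports": {"A", "B"},
        "files": {
            "AB/a.sml": {"defines": {"A"}, "uses": {"X"}},
            "AB/b.sml": {"defines": {"B"}, "uses": {"Y"}},
        },
    },
    "XYZ/sources.cm": {
        "exports": {"X", "Y", "Z"},
        "files": {
            "XYZ/x.sml": {"defines": {"X"}, "uses": set()},
            "XYZ/y.sml": {"defines": {"Y"}, "uses": set()},
            "XYZ/z.sml": {"defines": {"Z"}, "uses": set()},
        },
    },
}


def per_group_filter(groups, top):
    """Filter each group's file list against the symbols imported from it.

    The imports are read off the *full* graph of every other group, so a
    file that is itself filtered out (AB/b.sml) still contributes imports.
    """
    kept = set()
    for name, group in groups.items():
        if name == top:
            wanted = set(group["exports"])
        else:
            wanted = {
                sym
                for other, g in groups.items()
                if other != name
                for info in g["files"].values()
                for sym in info["uses"]
            }
        kept |= {f for f, info in group["files"].items() if info["defines"] & wanted}
    return kept


print(sorted(per_group_filter(GROUPS, "sources.cm")))
# ['AB/a.sml', 'XYZ/x.sml', 'XYZ/y.sml', 'main.sml']
```

In this model "AB/b.sml" and "XYZ/z.sml" are dropped, but "XYZ/y.sml"
survives even though its only user was filtered out.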

Actually, as I think about it, once you commit yourself to building up an
explicit graph for one CM.Graph.graph, you might as well build all of
them, add the necessary edges corresponding to imports, and just filter
against the exports of the top-level source.  You can't just
reproduce the portable graph, though, because you really need to interpret
filters and merges rather than treat them as nodes.  Note also that this will
only ever be "as good" as the underlying portable graph.  Nothing in Matthias'
spec says that the environment that feeds into a compile node need be
filtered to the necessary components (although that would be nice to
support cutoff recompilation).  For example, if a.sml depends on {x,y}.sml
and b.sml depends on {y,z}.sml, it would be perfectly o.k. to merge
the compile nodes of {x,y,z}.sml into a single environment and have
the compile nodes of a.sml and b.sml depend on the merge node.  In which
case (because we will never peek inside a .sml file), even if we decide
to keep a.sml but not b.sml, z.sml will still be included because we keep
everything from the merge node.
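
As a rough illustration of that last point, here is a small Python sketch
that filters by reachability from the files we decide to keep.  The two
dependency tables and the function names are hypothetical, and the merge is
modelled as a pass-through node named "merge", which for this purpose has
the same effect as interpreting it as the union of the three environments.
It is a sketch under those assumptions, not a description of how the
portable graph is actually represented.

```python
def reachable(deps, roots):
    """Return every node reachable from roots by following dependency edges."""
    seen, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps.get(node, ()))
    return seen


def kept_files(deps, roots):
    """Source files that survive filtering, ignoring non-file nodes."""
    return sorted(n for n in reachable(deps, roots) if n.endswith(".sml"))


# Precise graph: each compile node depends only on what it needs.
PRECISE = {
    "a.sml": ["x.sml", "y.sml"],
    "b.sml": ["y.sml", "z.sml"],
}

# Coarser graph: x, y and z are merged into one environment and both
# a.sml and b.sml depend on that merge.
MERGED = {
    "a.sml": ["merge"],
    "b.sml": ["merge"],
    "merge": ["x.sml", "y.sml", "z.sml"],
}

print(kept_files(PRECISE, ["a.sml"]))  # ['a.sml', 'x.sml', 'y.sml']
print(kept_files(MERGED, ["a.sml"]))   # ['a.sml', 'x.sml', 'y.sml', 'z.sml']
```

With the precise edges, keeping only a.sml drops z.sml; with the merged
environment, z.sml comes along regardless.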

> > Or, we just let MLton's removeUnused passes eliminate the dead code.  The
> > two downsides of that are that (1) we might end up with extra code, (2) as
> > we saw before, while CM.make "sources.cm" might succeed, the corresponding
> > cmcat "sources.cm" might produce a file list with errors (unbound
> > identifiers or type errors, corresponding to code that CM never compiled).
>
> If we can't come up with an easy solution to make the new cmcat as
> tight as the old, I don't think it's worth spending more time.  It
> would, however, be worth explaining the situation to Matthias and
> asking if he has an easy solution.  I'm happy enough with using the
> bigger file list and leaving it up to MLton, which seems better than
> asking people to use a really old SML/NJ.  We should try a self
> compile with the bigger file list and see if it hurts.  As to the
> potential errors, I think a note in the user guide is good enough.  It
> should be easy enough for people to comment out their broken code, as
> I did for MLton.

I'll start thinking about writing up something to that effect for the
user's guide.

> I think it's more valuable to spend our time thinking about the kind
> of dead-code analysis that we want for mlbs.

Will do.