[MLton] flattening of arrays and vectors

Stephen Weeks MLton@mlton.org
Fri, 13 Aug 2004 12:15:36 -0700


> Essentially, it's a matter of exposing the MLB->Ast.Basdec.t and
> MLB->Elaborate.Env.Basis.t maps, and having additions to them be
> preserved between invocations of the compiler.  Saving the world and
> replacing the mlton.world is one way of accomplishing it.

Yeah, I was hoping save world was a 90% solution.

> Right now, those maps are from OS.FileSys.file_id to their
> respective ranges.  As we've already noted, file_id makes good sense
> for establishing file identity during a compile, but is bad for
> moving directories around and certainly isn't valid for install
> packages.  Also, one reason we went to distributing the Basis
> Library sources was to make it easier for users to fix bugs without
> waiting for a new release.

Absolutely.  This is not at all intended to replace distributing
sources, or even to include a preprocessed library in packages.  For a
first cut, it is simply a way for a user to save some time by caching
the elaboration of an mlb, which they can then use to quickly type
check code that uses that mlb.

> So, we either need to expose how to build a new world when a user's
> Basis Library changes, or explicitly check all the dependencies.

I was definitely thinking that the caching would be exposed via
explicit command-line switches for dumping/loading.  That seems to
finesse the issues you mention with files changing or moving, since
the user would be responsible for rebuilding the cache if files
change.

Maybe not great, but hopefully easy to implement, and it makes the
simple, common case of working on a project with an unchanging basis
library fast.  One problem is that users, especially beginners, won't
see the speedup by default.

> There is the other aspect that the Elaborate.Env.t isn't very
> compositional.  In one sense the environment just keeps growing,
> although things keep falling in and out of scope.  There doesn't
> seem to be any way to remove/replace all the information that will
> be associated with a cached .mlb file when a user decides to install
> an updated .mlb entry.

I agree that Env.t isn't compositional, but I'm not sure that's a big
deal.  I wouldn't shoot for any kind of fine-grained
"re-preprocessing".  All-or-nothing is fine.  That is, use the cached
version as long as nothing has changed.  If anything has, then
re-elaborate everything.  It's the user's job to determine a reasonable
split of their program so that re-preprocessing doesn't happen too
often.
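
To make the all-or-nothing rule concrete, here is a minimal sketch in
Python rather than SML.  It is not MLton code: the names ElabCache and
fingerprint are hypothetical, the elaborate argument stands in for
whatever produces the environment, and detecting change by hashing file
contents is an assumption of the sketch (the explicit dump/load
switches discussed above would leave that check to the user).

```python
import hashlib


def fingerprint(sources):
    """Hash every (path, contents) pair, in order, into one digest."""
    h = hashlib.sha256()
    for path, text in sources:
        for part in (path, text):
            data = part.encode("utf-8")
            # Length-prefix each field so adjacent fields cannot run together.
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
    return h.hexdigest()


class ElabCache:
    """All-or-nothing cache: any change at all re-elaborates everything."""

    def __init__(self, elaborate):
        self.elaborate = elaborate  # hypothetical stand-in for elaboration
        self.key = None
        self.env = None
        self.rebuilds = 0

    def get(self, sources):
        key = fingerprint(sources)
        if key != self.key:
            # Something changed (or first use): throw the old result away.
            self.env = self.elaborate(sources)
            self.key = key
            self.rebuilds += 1
        return self.env
```

Nothing here tries to patch the cached environment, so the fact that
Env.t isn't compositional doesn't matter to it.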

Hmm, I wonder if determining the split could be automated (and not
exposed), with some kind of adaptive binary-search approach.  The idea
is to cache the "first half" of the program on the first compile.
Then, if subsequent changes and compiles often cause this to be
re-preprocessed, back off and only cache the first quarter (or
whatever).  On the other hand, if subsequent changes rarely cause the
cache to be re-preprocessed, then cache the first 3/4 (or whatever).
The hope would be that if one is working a lot on a single file,
this approach would automatically, after a few runs of the type
checker, cache the type checking of everything up to that file, so
that type checking becomes instantaneous.