[MLton] mlbs and soft and hard links

Stephen Weeks MLton@mlton.org
Sun, 15 Aug 2004 10:51:29 -0700

> Using path names as the `unit of caching' is, I find, often a big
> problem.  

I don't see the problem in our case.

> It means, for instance, that if I move a `world' (directory and all
> that is under it) then things will get confused.

If by "confused" you mean that we will get the wrong answer, I
disagree.  If you mean that we will rebuild the cache, I agree, but it
doesn't matter.  Moving directories around is not a common operation
relative to running the type checker, and rebuilding the cache is not
that costly.

> Also it means that if there are multiple paths to something (hard
> links) then it won't be viewed as being the same thing when reached
> along different paths.

True, but who cares?  As I understand it, here are the two ways of
modeling the file system that we are comparing.

1. a map from absolute path to file contents.

2. a pair of maps, one from absolute path to file id and another from
   file id to file contents.

Clearly (1) is a simpler and more platform-independent model.  Further,
they are incomparable in power with respect to handling changes to
filesystem structure.  As you mention, moving files around will not
invalidate a cache under (2) while it would under (1).  On the other
hand, creating new files with the same contents (as, for example, a
code-producing tool like mlyacc would do) will not invalidate a cache
under (1) while it would under (2).  Also, (1) generalizes nicely to
better forms of caching based on the abstract syntax tree (so we are
invariant to formatting and commenting changes).
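
To make the comparison concrete, here is a minimal sketch in Python
(not MLton code) of the cache key each model implies.  Everything in
it is hypothetical: the FakeFS class, the key_model1 and key_model2
helpers, and the paths are made up for illustration, and it assumes
that rewriting a file always produces a fresh file id.

```python
import hashlib


def digest(text):
    """Content hash used as a stand-in for 'file contents'."""
    return hashlib.sha256(text.encode()).hexdigest()


class FakeFS:
    """Toy file system: paths map to file ids, file ids map to contents."""

    def __init__(self):
        self.ids = {}        # absolute path -> file id
        self.contents = {}   # file id -> contents
        self._next_id = 0

    def write(self, path, text):
        # Assumption: every write creates a new file (new id), as a
        # code-producing tool that regenerates its output would.
        self._next_id += 1
        self.ids[path] = self._next_id
        self.contents[self._next_id] = text

    def move(self, old, new):
        # A move changes the path but keeps the file id.
        self.ids[new] = self.ids.pop(old)

    def read(self, path):
        return self.contents[self.ids[path]]


def key_model1(fs, path):
    # Model (1): absolute path -> contents.
    return (path, digest(fs.read(path)))


def key_model2(fs, path):
    # Model (2): absolute path -> file id -> contents; cache on the id.
    return fs.ids[path]


def run_scenarios():
    """Report whether each model's cache entry survives each change."""
    results = {}

    # Scenario A: move the file to a different directory.
    fs = FakeFS()
    fs.write("/proj/parser.sml", "val x = 1")
    k1, k2 = key_model1(fs, "/proj/parser.sml"), key_model2(fs, "/proj/parser.sml")
    fs.move("/proj/parser.sml", "/moved/parser.sml")
    results["move"] = {
        "model1": key_model1(fs, "/moved/parser.sml") == k1,
        "model2": key_model2(fs, "/moved/parser.sml") == k2,
    }

    # Scenario B: regenerate the file with identical contents.
    fs = FakeFS()
    fs.write("/proj/parser.sml", "val x = 1")
    k1, k2 = key_model1(fs, "/proj/parser.sml"), key_model2(fs, "/proj/parser.sml")
    fs.write("/proj/parser.sml", "val x = 1")
    results["regenerate"] = {
        "model1": key_model1(fs, "/proj/parser.sml") == k1,
        "model2": key_model2(fs, "/proj/parser.sml") == k2,
    }
    return results


if __name__ == "__main__":
    print(run_scenarios())
```

In this toy, the move misses under (1) and hits under (2), while the
regeneration hits under (1) and misses under (2), which is the sense
in which neither model subsumes the other.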