[MLton] caching elaboration

Stephen Weeks MLton@mlton.org
Tue, 17 Aug 2004 16:33:00 -0700

> The order of error reporting should be decoupled from the order of
> error detection.
> I think, it be nice to go to a buffered error message
> architecture. i.e. all the error messages for an invocation are
> collected into a data structure. There should be enough
> meta-information so you can post-process the error buffer into what
> every sensible reporting order makes sense.

This sounds nice in principle.  But in practice, for type checking SML
anyway, there are problems.

> Well, the right thing to do is to track which errors are "atomic"
> and which are the result of a "correction"

Unfortunately, the boundary between these two notions is not so clear,
especially with type inference as non-local as is SMLs.  Unification
in one place may affect errors in another, and it is impossible to say
which is the atomic cause and which is the correction.

> with this information you can decide to filter the error messages to
> include only the atomic errors and maybe the first N or so supect
> errors or always report the atomic errors first.

The dependencies are much more intertwined than one would like, so
that in truth, a single error is likely to affect many other errors,
and one must make many heuristic choices about when to "cut"
dependencies even though it may cause spurious errors.

> Norman Ramsey has a paper on error combinators that let you
> propagate this derived vs atomic error information relatively
> easily.

Right, this is a nice mechanism.  MLton's elaborator does this kind of
error propagation, albeit using only options, on an ad-hoc basis right
now.  Propagating the errors within the compiler in a type-safe way is
not the hard part.  The hard part is deciding what should depend on
what and when to do error recovery.  I guess I'm saying that Henry is
right -- you do have to do error correction, even if you don't want
to.  Unless you are willing to miss a lot of errors.

> I think the issue of error recovery is really unrelated to the
> detection vs reporting gripe I have.

It doesn't do any good to separate the error detection from reporting
unless unless it gives you some interesting axes to make choices on
how to order error reports.  I don't believe that the "atomic" vs
"result of correction" distinction is useful in SML, and I don't know
of any others for type checking.  I do agree with Matthew that the two
orders that we have for parse and type errors are useful in different

* reporting all parse errors and then all type errors.

* reporting all parse and type errors in one file before any errors in
  the next.

In any case, as a simple example of one of the tens (if not hundreds)
of heuristic choices that I had to make when writing MLton's
elaborator, consider the following two incorrect functors.

functor F (S: SIG) =
      val _ = S.x
functor G (S: sig
		 include SIG
	      end) =
      val _ = S.x

The question is, should an undefined variable error be reported for
S.x?  It turns out that MLton will not report an error for F but will
report an error for G.  This decision was not based on any principle,
but was based on trying lots of code and making guesses as to the
kinds of error messages that will help a user debug their program most
quickly.  In both cases, under a strict notion of dependence or a
strict application of the combinators in Norman's paper, the interface
corresponding to S would not be defined (as it depends on the
elaboration of the undefined signature SIG), and hence the S.x error
would be the "result of a correction".

The point is that MLton did have to do error recovery on G (in
treating SIG as the empty signature) in order to process G's body.
Without some amount of error recovery, there will be too few errors
reported per run.

I won't claim there are deep principles that went in to these
decisions.  I found it hard enough to get the correctness and
efficiency right :-).