[MLton] Showing types of variables
Matthew Fluet
fluet at tti-c.org
Thu Nov 1 12:27:13 PST 2007
On Sat, 13 Oct 2007, Vesa Karvonen wrote:
> Matthew Fluet mentioned after the ML workshop that one could perhaps
> output the types of variables with the def-use info, because the types
> have been computed at that point. The def-use mode could then be
> extended to parse the types and to show the type (as a message) when
> the cursor is at a variable. Below is an exploratory patch to the
> def-use mode and MLton that does that. It works (try it!),
I've been using the patch since it was posted, and it can be helpful. I
find the def-use mode most helpful for exploring unfamiliar code, and
getting the elaborated types of variables is great.
> but there are a number of issues:
My gut feeling is that there isn't much that can be done to improve the
situation given the current setup of the elaborator and the maintenance of
environments.
> - I couldn't figure out a way to just find the types of variables, so
> I modified the code to actually save the type in the newUses function
> as a part of the defUses field. I suspect there might be a better way
> to do this, but I don't know it.
I can't see an easy way of finding it either. It looks like the type
scheme of variables should be in the current field of the NameSpace.t
type (and possibly accessible via the lookup field of the NameSpace.t
type), but there are various imperative manipulations of the current
environment that may mean that that information isn't all there at the
time we want to emit def-use information. Your method seems fine.
> - Saving the types (and formatting them (layout)) seems to take
> considerable space (and time). There are probably several ways to
> improve performance (e.g. only print each type once as a separate
> table and just print an index into that table with each variable), but
> I'm not sure what would be the best approach (e.g. is there a hash
> function for Type.t / Scheme.t ?).
I didn't notice the space/time overhead. (But, then, I'm fairly
forgiving.)
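For reference, the "print each type once and emit an index" idea could look roughly like the sketch below. It is only an illustration under assumptions: the TypeTable structure and its functions are hypothetical names, types are keyed by their already laid-out strings rather than by Type.t / Scheme.t, and the linear search stands in for whatever hash table one would really use.

```sml
(* Hypothetical sketch: intern each distinct type string once and
 * refer to it by a small integer index afterwards. *)
structure TypeTable =
struct
  (* Entries are kept newest-first, each paired with its index. *)
  val entries : (string * int) list ref = ref []
  val next : int ref = ref 0

  (* Return the index of ty, adding it to the table if it is new. *)
  fun intern (ty : string) : int =
    case List.find (fn (s, _) => s = ty) (!entries) of
      SOME (_, i) => i
    | NONE =>
        let
          val i = !next
        in
          entries := (ty, i) :: !entries;
          next := i + 1;
          i
        end

  (* All interned types, in index order. *)
  fun toList () : string list =
    List.rev (List.map (fn (s, _) => s) (!entries))
end

(* Two variables with the same type share one table entry. *)
val a = TypeTable.intern "int -> t"
val b = TypeTable.intern "string list -> int"
val c = TypeTable.intern "int -> t"
```

With something like this, the def-use output would carry only the index per variable, and the table would be printed once.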
> - The type names aren't perhaps the best possible names one could get.
> They have an extra ?. prefix and sometimes a _N suffix. I don't know
> how to get better type names. I tried inserting a call to
> setTyconNames at the beginning of processDefUse. I think that just
> added the ?. prefixes.
It would be nice to improve the situation here. The "_N" suffix comes
from types that are declared in the body of a functor. The "?." prefix
usually corresponds to relativization to the top-level environment.
Ideally, the type displayed at the definition of a variable would be the
string representation of the type that would have been used were a type
error reported at the point of the definition. For example, something
like:
  structure S =
  struct
     datatype t = T of int
     fun new i = T i
     (* val _ : unit = new *)
  end
The type for "new" is displayed as "?.int32 -> S.t", though if you
uncomment the type ascription, you get the error message:
  Pattern and expression disagree.
    pattern: [unit]
    expression: [int -> t]
    in: (_): unit = new
with the nicer type "int -> t".
Of course, I have no idea how to achieve this. The pretty printing of
type schemes is always relative to the "current" environment, which works
well for error messages (since you care about the environment at the point
where the error is being reported), but not so well for retroactively
displaying the types after elaborating the whole program.
On the other hand, the displayed types seem no worse than what we get from
the -show-basis <file> mechanism. For the program above, I get:
  structure S:
     sig
        datatype t = T of ?.int32
        val new: ?.int32 -> S.t
     end
Though, the one thing I find very curious is that sometimes at an
application, the type for the function and the type for the argument are
displayed quite differently. For example, in
<src>/mlton/main/compile.fun:535, we have:
  thunk = fn () => Monomorphise.monomorphise xml,
The displayed type for "Monomorphise.monomorphise" is "?.XmlTree.Program.t
-> ?.XmlTree.Program.t", while the displayed type for "xml" is
"Program.t_11". It seems that the latter is due to the fact that the
type of "xml" is relative to the instantiation of the Monomorphise
functor, while "Monomorphise.monomorphise" is relative to the formal
argument of the Monomorphise functor. If I change the line to:
  thunk = fn () => (Monomorphise.monomorphise : unit) (xml : unit),
then the error messages are:
  Expression and constraint disagree.
    expects: [unit]
    but got: [Sxml.Program.t -> Sxml.Program.t]
    in: (Monomorphise.monomorphise): unit
  Expression and constraint disagree.
    expects: [unit]
    but got: [Sxml.Program.t]
    in: (xml): unit
which are somewhat nicer names.
Actually, now that I think about it, we could get the nicer names if we
did the "Scheme.layoutPretty" at the point where we invoke "newUses",
rather than doing it after the end of elaboration. But, this will almost
certainly eat up a lot of space, because the layout form of a type is much
larger than its internal representation. (In particular, we'll lose a lot
of sharing going to a Layout form.) Though, it might be worth trying at
some point.
Alternatively, if one were to completely rework the elaborator, we could
use a persistent data structure for the environment. Then at a variable
def, we could record both the variable scheme and the environment at the
def, and then have a scheme pretty printer that would take an environment
with which to print the type. But, maintaining all the intermediate
environments in a persistent data structure would likely also be very
costly in terms of space.
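To make the persistent-environment idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: the ty datatype, the env type, and the extend, nameOf, and layout functions are hypothetical and far simpler than MLton's elaborator, and the "?." fallback merely imitates the prefix discussed above.

```sml
(* Hypothetical, simplified types; not MLton's data structures. *)
datatype ty =
    Con of int            (* a type constructor, identified by a stamp *)
  | Arrow of ty * ty      (* a function type *)

(* A persistent environment: extending it leaves older versions intact. *)
type env = (int * string) list

fun extend (env : env, stamp, name) : env = (stamp, name) :: env

(* Name of a constructor relative to env; "?." marks one not in scope. *)
fun nameOf (env : env) stamp =
  case List.find (fn (s, _) => s = stamp) env of
    SOME (_, n) => n
  | NONE => "?." ^ Int.toString stamp

(* Pretty print a type relative to an explicitly supplied environment. *)
fun layout env (Con s) = nameOf env s
  | layout env (Arrow (a, b)) =
      let
        val left =
          case a of
            Arrow _ => "(" ^ layout env a ^ ")"
          | Con _ => layout env a
      in
        left ^ " -> " ^ layout env b
      end

val intStamp = 0 and tStamp = 1
val top = extend ([], intStamp, "int")
val insideS = extend (top, tStamp, "t")   (* environment at the def of new *)

(* What a variable def would record: its type and the environment there. *)
val newTy = Arrow (Con intStamp, Con tStamp)
val recorded = (newTy, insideS)

val shown = layout (#2 recorded) (#1 recorded)   (* "int -> t" *)
val shownTop = layout top newTy                  (* "int -> ?.1" *)
```

Because top is never mutated, it remains usable after insideS is built, which is the property the current imperative environment lacks; the cost is that every such intermediate environment stays live.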
Anyways, if I'm going to dream, it would be really cool to have the type
displayed at the use of a variable to correspond to the (instantiated)
type at that use site. For example, while "List.length" (at its
definition) has the type "'a list -> int", in the expression
"List.length ["a","b","c"]", it has the type "string list -> int". But, I
think that is a lot harder to achieve with the current setup, since we add
a use when we look up the variable in the environment, whereas for the
above, we would need to add a use with the instantiated scheme of the used
variable.
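A small illustration of the def-type versus use-type distinction, using only the Basis Library (the names n, m, and lengthAtString are just for this example):

```sml
(* At its definition, List.length has the scheme 'a list -> int. *)

(* Here it is used at the instance string list -> int. *)
val n = List.length ["a", "b", "c"]

(* Here the same variable is used at the instance int list -> int. *)
val m = List.length [1, 2]

(* A constraint makes the instantiated use-site type explicit. *)
val lengthAtString = (List.length : string list -> int)
```

Each of the three uses would ideally display a different type, even though all of them refer to the same definition.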
In any case, given that the def-type and the use-type of variables can
differ (not only that the use-type is an instantiation, but also that the
types in scope at the two points don't always agree), I wonder if it
wouldn't be better to only show the type of a variable at its def-site.
I still think that would be useful, since one often is interested in the
type of some unannotated local variable, and the def point probably isn't
far from the use point.
One final, very minor, comment about the def-use.el mode. I note that
when hovering over a variable, with the type displayed, the displayed type
will overwrite any other Emacs messages being sent to the mini-buffer.
For example, when I tried opening a large file while I was hovering over a
variable, I didn't realize that Emacs had raised a "File is large, really
open? (y or n)" prompt. It probably makes more sense for the type to be
displayed once, and then relinquish the mini-buffer.