[MLton] Showing types of variables

Thu Nov 1 12:27:13 PST 2007

On Sat, 13 Oct 2007, Vesa Karvonen wrote:
> Matthew Fluet mentioned after the ML workshop that one could perhaps
> output the types of variables with the def-use info, because the types
> have been computed at that point.  The def-use mode could then be
> extended to parse the types and to show the type (as a message) when
> the cursor is at a variable.  Below is a exploratory patch to the
> def-use mode and MLton that does that.  It works (try it!),

I've been using the patch since it was posted, and it can be helpful.  I 
find the def-use mode most helpful for exploring unfamiliar code, and 
getting the elaborated types of variables is great.

> but there are a number of issues:

My gut feeling is that there isn't much that can be done to improve the 
situation given the current setup of the elaborator and the maintenance of 
environments.

> - I couldn't figure out a way to just find the types of variables, so
> I modified the code to actually save the type in the newUses function
> as a part of the defUses field.  I suspect there might be a better way
> to do this, but I don't know it.

I can't see an easy way of finding it either.  It looks like the type 
scheme of variables should be in the current field of the NameSpace.t 
type (and possibly accessible via the lookup field of the NameSpace.t 
type), but there are various imperative manipulations of the current 
environment that may mean that that information isn't all there at the 
time we want to emit def-use information.  Your method seems fine.

> - Saving the types (and formatting them (layout)) seems to take
> considerable space (and time).  There are probably several ways to
> improve performance (e.g. only print each type once as a separate
> table and just print an index into that table with each variable), but
> I'm not sure what would be the best approach (e.g. is there a hash
> function for Type.t / Scheme.t ?).

I didn't notice the space/time overhead.  (But, then, I'm fairly 
forgiving.)
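
For what it's worth, the table idea could look roughly like the sketch 
below.  This is not MLton code: the TypeTable structure and its functions 
are hypothetical names, and it interns the already laid-out string with a 
linear search, which sidesteps the question of a hash function for 
Type.t / Scheme.t rather than answering it.

```sml
(* Hypothetical sketch: intern each laid-out type string once and refer
   to it by index.  None of these names exist in MLton. *)
structure TypeTable =
struct
   (* Table entries in reverse order of insertion, plus the next index. *)
   val entries : string list ref = ref []
   val next : int ref = ref 0

   (* Return the index of s, adding it to the table if it is new. *)
   fun intern (s : string) : int =
      let
         (* The head of the list holds the most recently assigned index. *)
         fun find (_, []) = NONE
           | find (i, x :: xs) = if x = s then SOME i else find (i - 1, xs)
      in
         case find (!next - 1, !entries) of
            SOME i => i
          | NONE =>
               let
                  val i = !next
               in
                  entries := s :: !entries
                ; next := i + 1
                ; i
               end
      end

   (* The table in index order, for printing once at the end. *)
   fun toList () : string list = List.rev (!entries)
end

val i1 = TypeTable.intern "int -> t"
val i2 = TypeTable.intern "string list -> int"
val i3 = TypeTable.intern "int -> t"   (* same index as i1 *)
```

Each variable in the def-use output would then carry only an index, and 
the table would be printed once.  Note that this only saves output size; 
every type still has to be laid out before it can be interned.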

> - The type names aren't perhaps the best possible names one could get.
> They have an extra ?. prefix and sometimes a _N suffix.  I don't know
> how to get better type names.  I tried inserting a call to
> setTyconNames at the beginning of processDefUse.  I think that just
> added the ?. prefixes.

It would be nice to improve the situation here.  The "_N" suffix comes 
from types that are declared in the body of a functor.  The "?." prefix 
usually corresponds to relativization to the top-level environment.

Ideally, the type displayed at the definition of a variable would be the 
string representation of the type that would have been used were a type 
error reported at the point of the definition.  For example, something 
like:

structure S =
struct
   datatype t = T of int
   fun new i = T i
   (* val _ : unit = new *)
end

The type for "new" is displayed as "?.int32 -> S.t", though if you 
uncomment the type ascription, you get the error message:
   Pattern and expression disagree.
     pattern:    [unit]
     expression: [int -> t]
     in: (_): unit = new
with the nicer type "int -> t".

Of course, I have no idea how to achieve this.  The pretty printing of 
type schemes is always relative to the "current" environment, which works 
well for error messages (since you care about the environment at the point 
where the error is being reported), but not so well for retroactively 
displaying the types after elaborating the whole program.

On the other hand, the displayed types seem no worse than what we get from 
the -show-basis <file> mechanism.  For the program above, I get:
   structure S:
      sig
         datatype t = T of ?.int32
         val new: ?.int32 -> S.t
      end

Though, the one thing I find very curious is that sometimes at an 
application, the type for the function and the type for the argument are 
displayed quite differently.  For example, in 
<src>/mlton/main/compile.fun:535, we have:
           thunk = fn () => Monomorphise.monomorphise xml,
The displayed type for "Monomorphise.monomorphise" is "?.XmlTree.Program.t 
-> ?.XmlTree.Program.t", while the displayed type for "xml" is 
"Program.t_11".  It seems that the latter is due to the fact that the 
type of "xml" is relative to the instantiation of the Monomorphise 
functor, while "Monomorphise.monomorphise" is relative to the formal 
argument of the Monomorphise functor.  If I change the line to:
           thunk = fn () => (Monomorphise.monomorphise : unit) (xml : unit),
then the error messages are:
   Expression and constraint disagree.
     expects: [unit]
     but got: [Sxml.Program.t -> Sxml.Program.t]
     in: (Monomorphise.monomorphise): unit
   Expression and constraint disagree.
     expects: [unit]
     but got: [Sxml.Program.t]
     in: (xml): unit
which are somewhat nicer names.

Actually, now that I think about it, we could get the nicer names if we 
did the "Scheme.layoutPretty" at the point where we invoke "newUses", 
rather than doing it after the end of elaboration.  But, this will almost 
certainly eat up a lot of space, because the layout form of a type is much 
larger than its internal representation.  (In particular, we'll lose a lot 
of sharing going to a Layout form.)  Though, it might be worth trying at 
some point.
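
To make the sharing concern concrete, here is a toy illustration.  The 
"ty" datatype and the "layout" function below are stand-ins I made up for 
the example, not MLton's Type.t or Layout.t.  A type built with shared 
subterms needs only a handful of heap nodes, but its flat textual form 
doubles in size with each level of nesting.

```sml
(* Toy type representation; not MLton's Type.t. *)
datatype ty = Con of string | Pair of ty * ty

(* Flat textual form, standing in for a laid-out type. *)
fun layout (Con s) = s
  | layout (Pair (a, b)) = "(" ^ layout a ^ " * " ^ layout b ^ ")"

(* Build a type n levels deep in which both components of each pair are
   the same heap object, so only n Pair nodes are allocated. *)
fun nest 0 = Con "t"
  | nest n = let val t = nest (n - 1) in Pair (t, t) end

val small = String.size (layout (nest 1))    (* "(t * t)": 7 characters *)
val large = String.size (layout (nest 10))   (* 6139 characters *)
```

Ten Pair nodes in memory turn into a 6139-character string, which is the 
kind of blow-up I would worry about if every use were laid out eagerly.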

Alternatively, if one were to completely rework the elaborator, we could 
use a persistent data structure for the environment.  Then at a variable 
def, we could record both the variable scheme and the environment at the 
def, and then have a scheme pretty printer that would take an environment 
with which to print the type.  But, maintaining all the intermediate 
environments in a persistent data structure would likely also be very 
costly in terms of space.
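
As a rough sketch of what I have in mind (all names here are hypothetical 
and the representation is deliberately simplistic, nothing like the real 
elaborator): type constructors are identified by stamps, an environment is 
an immutable map from stamps to the name in scope at some point, and the 
pretty printer takes the environment as an argument.

```sml
(* Hypothetical sketch; none of these names are MLton's. *)
(* A type constructor is identified by a unique stamp, not by a name. *)
datatype ty = Con of int | Arrow of ty * ty

(* A persistent environment: an immutable association list from stamps
   to the name by which the tycon is known at some program point. *)
type env = (int * string) list

fun nameOf (env : env, stamp : int) : string =
   case List.find (fn (s, _) => s = stamp) env of
      SOME (_, name) => name
    | NONE => "?." ^ Int.toString stamp   (* not in scope at this point *)

(* Print a type relative to a given environment. *)
fun layoutIn (env : env) (t : ty) : string =
   case t of
      Con s => nameOf (env, s)
    | Arrow (a, b) =>
         let
            val l = layoutIn env a
            (* Parenthesize a function type in argument position. *)
            val l = case a of Arrow _ => "(" ^ l ^ ")" | _ => l
         in
            l ^ " -> " ^ layoutIn env b
         end

(* Stamp 0 is int; stamp 1 is the datatype t declared inside S. *)
val insideS : env = [(0, "int"), (1, "t")]
val topLevel : env = [(0, "int"), (1, "S.t")]
val newTy = Arrow (Con 0, Con 1)

val atDef = layoutIn insideS newTy    (* "int -> t" *)
val atTop = layoutIn topLevel newTy   (* "int -> S.t" *)
```

Recording the pair (newTy, insideS) at the definition of "new" would be 
enough to print "int -> t" after elaboration has finished.  The cost is 
keeping every such environment alive.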

Anyways, if I'm going to dream, it would be really cool to have the type 
displayed at the use of a variable to correspond to the (instantiated) 
type at that use site.  For example, while "List.length" (at its 
definition) has the type "'a list -> int", in the expression
"List.length ["a","b","c"]", it has the type "string list -> int".  But, I 
think that is a lot harder to achieve with the current setup, since we add 
a use when we look up the variable in the environment, whereas for the 
above, we would need to add a use with the instantiated scheme of the used 
variable.
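
The difference between the two types can be written out with explicit 
constraints.  The names lengthAtDef and lengthAtUse are just for 
illustration:

```sml
(* At its definition, List.length has the scheme 'a list -> int. *)
val lengthAtDef : 'a list -> int = List.length

(* At this use, 'a is instantiated to string. *)
val lengthAtUse : string list -> int = List.length

val n = lengthAtUse ["a", "b", "c"]   (* 3 *)
```

The def-use output currently has only the first of these to offer, at both 
the def and the use.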

In any case, given that the def-type and the use-type of variables can 
differ (not only that the use-type is an instantiation, but also that the 
types in scope at the two points don't always agree), I wonder if it 
wouldn't be better to only show the type of a variable at its def-site.
I still think that would be useful, since one often is interested in the 
type of some unannotated local variable, and the def point probably isn't 
far from the use point.

One final, very minor, comment about the def-use.el mode.  I note that 
when hovering over a variable, with the type displayed, the displayed type 
will overwrite any other Emacs messages being sent to the mini-buffer. 
For example, when I tried opening a large file while I was hovering over a 
variable, I didn't realize that Emacs had raised a "File is large, really 
open? (y or n)" prompt.  It probably makes more sense to display the type 
once and then relinquish the mini-buffer.