[MLton] HOL's Moscow ML implementation, and pushing MLton to emulate it

Mon, 11 Apr 2005 16:23:05 -0700

> If we actually, only duplicate based on the underlying
> representation type and not source level type. i.e. All, objects
> with a boxed representation share the same code. Even if they are
> used at different types.

It's not that I was worried about multiple copies of a piece of code
-- I'm worried about a single copy of the entire program, at least of
any part that deals with the representations that you want to
abstract, especially monomorphic code.  That seems unavoidable in
order to get the representation abstraction.

> > One thing I would like to see in a solution is that if I define a list
> > and a sum function in the host world
> > 
> >   val xs: int list
> >   val sum: int list -> int
> > 
> > and export both of these to the hosted world, and then in the hosted
> > world call "sum xs", it should effectively call "sum xs" in the host
> > world. 
...
> Here,I'm a little lost. It seems like once you pass "sum" xs to the 
> interpter it escapes. So that there's less you can realy do wrt to 
> optimizations.

True, but it is still possible to arrange when you type "sum xs" into
the interpreter, it figures out that it has an "int list -> int" and
an "int list" in the host representation, and makes the host call,
with no more interpretive overhead.  My approach indeed does that, and
it seems like a big win.

> > The only real communication between the worlds is the delaying trick
> > that allows constant-time coercion of host values to hosted values.
> > Unfortunately, there is *no* communication the other direction.  The
> > delaying trick doesn't work, because there are no hooks in the host
> > types to introduce delaying, and I want to avoid changing host
> > representations because that would adversely affect host performance.
> 
> I could do without the delaying trick if I had existential types in the 
> source language. :)

It's all pointers underneath :-).  In any case, I'm not sure if you're
referring to using the delaying trick for host->hosted communication
or hosted->host communication.  For host->hosted, sure, although the
delaying trick doesn't really bother me.  Years of living without
existentials has surely warped my brain.  As to using existentials for
hosted->host, you could do it, but you'd have to change host
representation, which if you think about how the existentials would be
implemented, you'll see entails a performance hit on host code.

> I had thought about ref cells a bit too. The right way to deal with
> this is to abstract the interface of a ref cell as setters and
> getters.

I thought about this too, and I disagree.  At least not given my
constraint that the host representations must remain the same.
Abstracting to setters and getters will introduce a lot of extra
overhead into the host code, since it now will have to call a function
and possibly do some sort of dispatch/unwrapping when it normally
could just do a Ref.!.

> Someone should get ambitious and write this up for the upcomming ML
> workshop.

I am on the committee, so I am sure I can arrange for its acceptance
:-).  Perhaps somewhere else would be better.

> So here is a summary of the two approaches. ..
> 
> The basic gist of my idea is to recompile all the code that deal with both 
> universal and native values in as abstract way as possible. So it doesn't 
> care how things are represented.

Agreed, under the proviso that "recompile all" means, in effect,
"duplicate all".

> What you are suggesting is to recompile all the code so it all deals
> with one concrete representation.

No, for two reasons.

1. Look at my specializedLength function, which under monomorphisation
MLton will implement with three copies of List.length: one for int
list, one for real list, and one for universal lists.  These three
copies run with no interpretation overhead, and deal with different
representations.  There is no universal representation of lists in the
host world.  It is only when one goes to the hosted world that there
is one concrete representation, and that is only in the trivial sense
of the outermost variant tag injecting the lists into the same sum
type -- the inside lists have different representations.  I guess one
could say that this approach is Shao's flexible representations to the
extreme, with the "boxing" only at the outermost part of the list,
outside even the individual list elements.

2. All the code is not duplicated.  Only polymorphic code.  And then
only for as much as you want specialization (and speed) in the
interpreter.

> Both our approaches lead to different kinds of performance problems. My 
> approach allows you to use any representation you like, but you take a 
> performance hit in that code that deals with both hosted or host values has 
> to access that code through an abstract interface.

I'm not sure what you mean by "deals with both hosted or host values".
Under my approach, interpreted code takes a performance hit, and host
code does not.  Period.  And interpreted code takes the hit only until
the interpreter "bottoms out" to host values, which run at full speed.
If one uses specializedLength for embedding List.length, then one will
will not pay any per-list-element abstraction hit when computing the
length of an int list or real list.

> Your approach dictates a universal rep so potentially all code pays
> a performance hit.

No.  Hosted code does not use a universal rep and pays no performance
hit (at least due to representation stuff; it may due to flow
analysis, but let's continue to ignore that for now).  There is no
change to host representations or code. 

> On the face of it. It's not clear what is better.

As I see it, my approach is a strict improvement on yours, at least in
the context of MLton's compilation strategy.

* It supports references and arrays.
* It supports communication of values both directions in constant
  time. 
* It duplicates less code.
* It runs host code with no slowdown.
* It allows the interpreter to bottom out to host code, which then
  runs at full speed.

> I think, the real deciding factor is how values move across the
> host/hosted worlds. If they really aggressively share values then one
> big concrete type is probably the best way to go.. if there's little
> interaction paying the abstraction hit might be better for overall
> performance.

I don't understand in what situation your approach will provide better
performance.  I think we're not quite on the same page yet.