[MLton] implement _address and _symbol

Matthew Fluet fluet@cs.cornell.edu
Mon, 18 Jul 2005 10:12:03 -0400 (EDT)


> It seems there was consensus on these points:
>   FFI that uses MLton.Pointer.t should be pointer-type transparent

Yes.

> Ok. However, there seems to be a contradiction here wrt _import *.
> 
> '_import *: int -> int;' right now gives MLton.Pointer.t -> int -> int.

The above is not allowed by the current implementation.  You must make the
pointer type explicit in the annotation.

> '_import *: MLton.Pointer.t -> int -> int;' ?

This is the correct annotation for the current implementation.  And there 
is no need to change it.

> That would break compatibility.

There is no compatibility issue, since we currently implement the desired 
annotation.

> Ditto for _symbol *. It seems the right types are:
> 
> _symbol "x": int;	==>                    (unit -> int) * (int -> unit)
> _symbol *: int;	==> MLton.Pointer.t -> (unit -> int) * (int -> unit)
> 
> However, where does the pointer get specified?

We seem to have settled on

_symbol *: ptrTy, cbTy;  ==> (ptrTy -> cbTy) * (ptrTy * cbTy -> unit)
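
For a concrete reading of that form, here is a rough C-side sketch of what
the two resulting functions would amount to when cbTy is int.  The helper
names (sym_get_int, sym_set_int, sym_roundtrip) and the global x are made
up for illustration; they are not part of MLton or its runtime.

```c
/* Hypothetical C-side view of `_symbol *: ptrTy, int;`.
   These helper names are illustrative, not MLton runtime functions. */

/* getter, ML type  ptrTy -> int  */
static int sym_get_int(void *p) { return *(int *)p; }

/* setter, ML type  ptrTy * int -> unit  */
static void sym_set_int(void *p, int v) { *(int *)p = v; }

/* A C global standing in for the object the pointer refers to. */
static int x = 0;

/* Example use: store through the address, then read it back. */
static int sym_roundtrip(int v) {
    sym_set_int(&x, v);
    return sym_get_int(&x);
}
```

The pointer is an ordinary argument to both functions, which is where it
"gets specified".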

> In fact, all of the ': ....;' syntax seems bogus to me.
> Where's the point in specifying all of this? 

You are correct that it is not a proper ML type annotation, in the sense 
of an annotation that gives the type of the resulting expression.  Rather, 
it is a type annotation that conveys just enough to nail down the type of the 
expression.  As I said before, the FFI primitives are not polymorphic 
primitives, they are a family of primitives.  The annotation selects which 
member of the family.

This isn't a real suggestion, but one could imagine the following syntax:

  _symbol[cbTy] "symbol";
  _symbol[ptrTy,cbTy] *;

which makes it a little more clear that the type annotation is selecting a 
particular primitive, which contributes to the type of the resulting 
expression, but does not equal it.  Likewise:

  _address[ptrTy] "symbol";
  _import[cfTy] "symbol";  or  _import[argTy,resTy] "symbol";
  _import[ptrTy,cfTy] *;  or  _import[ptrTy,argTy,resTy] *;
  _export[cfTy] "symbol";  or  _export[ptrTy,argTy] "symbol";

You can see my bias peeking through: knowing the implementation, I know 
that the more explicit "or" alternatives are easier to implement.

Recalling that originally the only FFI primitive was _import of
C-functions, it becomes clear why adopting the ML style type annotation
made sense -- since in that (one) case, the annotation is the type of the
resulting expression.
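
The "family of primitives, selected by the annotation" idea has a loose
analogy in C11's _Generic, sketched below.  This is only an analogy and
says nothing about how MLton implements the FFI; fetch_int, fetch_double,
FETCH and fetch_demo are hypothetical names.

```c
/* Analogy only: a family of monomorphic fetch functions, one per C type. */
static int    fetch_int(const void *p)    { return *(const int *)p; }
static double fetch_double(const void *p) { return *(const double *)p; }

/* The first argument is used only for its type, much as the annotation
   is used only to pick the primitive.  Requires C11 for _Generic. */
#define FETCH(witness, p) \
    _Generic((witness), int: fetch_int, double: fetch_double)(p)

static int    an_int   = 7;
static double a_double = 2.5;

/* Returns 1 when each use of FETCH selected the matching family member. */
static int fetch_demo(void) {
    int    i = FETCH((int)0, &an_int);   /* selects fetch_int    */
    double d = FETCH(0.0, &a_double);    /* selects fetch_double */
    return i == 7 && d == 2.5;
}
```

As with the annotation, the type given to FETCH picks the function; it is
not itself the type of the whole expression.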

> Another frightening aspect no one has brought up: what about pointers?
> val set : int vector -> unit = _store "x"
> 
> This is extremely frightening (to me) since it seems the exported pointer 
> can never be assumed to contain valid information. For _import this works,
> because you don't use the GC during the C function call.

That's not actually true.  A C function can call an _export-ed ML function, 
and a GC may occur while that ML function runs, so any ML pointers the C 
function was holding are not necessarily valid when control returns to it.  
It is a (minor, as in relatively easily fixed) deficiency of the runtime 
system that there is no way to register ML pointers with the runtime to be 
treated as roots and updated at a GC.
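
To make the hazard concrete, here is a toy model in C of a copying
collector with a root table.  Everything in it is hypothetical: none of
these names exist in the MLton runtime, the "heap" is two small static
arrays, and it assumes every registered root points into the current
space.  It only shows why a registered pointer survives a GC during a
callback while an unregistered one goes stale.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names exist in the MLton runtime. */
#define HEAP_WORDS 8
#define MAX_ROOTS  4

static int   space_a[HEAP_WORDS], space_b[HEAP_WORDS];
static int  *from_space = space_a;  /* where live objects currently are */
static int  *to_space   = space_b;
static int **roots[MAX_ROOTS];      /* addresses of pointer variables   */
static size_t n_roots = 0;

/* Register the address of a C variable holding a pointer into the heap. */
static void toy_register_root(int **slot) {
    if (n_roots < MAX_ROOTS) roots[n_roots++] = slot;
}

/* A "collection": copy everything to the other space, fix up the
   registered roots, poison the old copy, then swap the spaces. */
static void toy_gc(void) {
    memcpy(to_space, from_space, sizeof space_a);
    for (size_t i = 0; i < n_roots; i++) {
        ptrdiff_t off = *roots[i] - from_space;
        *roots[i] = to_space + off;
    }
    for (size_t i = 0; i < HEAP_WORDS; i++) from_space[i] = -1;
    int *tmp = from_space; from_space = to_space; to_space = tmp;
}

/* Stands in for an _export-ed ML function during which a GC happens. */
static void toy_ml_callback(void) { toy_gc(); }

/* A C function holding two heap pointers across the callback.  Returns 1
   if the registered pointer still sees 42 and the unregistered one sees
   the poisoned old copy. */
static int toy_demo(void) {
    from_space[0] = 42;
    int *registered   = &from_space[0];
    int *unregistered = &from_space[0];
    toy_register_root(&registered);
    toy_ml_callback();
    int ok = (*registered == 42) && (*unregistered == -1);
    n_roots = 0;  /* drop the root before the local goes out of scope */
    return ok;
}
```

Without something like toy_register_root, the C function has no way to
learn that the object moved.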

> And what about 
> val get : unit -> int vector = _fetch "x"
> Where does the length information come from?

The supposition is that the pointer in the symbol "x" is a (pointer to a) 
ML vector.  As above, with GCs occurring, it may be difficult to ensure 
that the pointer is valid.

> I just compiled foo.sml:
> val ex = _export "test": int vector -> unit;
> fun out x = print (Int.toString x ^ "\n")
> fun app x = Vector.app out x
> val () = ex app
> ... this actually works, yikes.
> 
> I can only assume that the programmer is required to only pass back SML
> arrays to SML functions; never arrays coming from C. After the C call
> which set the symbol, on return to SML the GC might run. Thus, _fetch
> doesn't make sense either.
> 
> So, _fetch/_store of heap types should fail to compile, right?

Not necessarily, but possibly.

Bear in mind, this is an interface to *C*!  The programmer is leaving a 
type safe language, and so they had better know what is going on.

> (* These generate deprecated warnings (with suggested change): *)
> val somefnptr  : MLton.Pointer.t -> int -> int = _import *: int -> int;
> val somevalptr : MLton.Pointer.t -> int = _import *: int;

Neither somefnptr nor somevalptr is currently accepted by the compiler.

> Comments?

I still prefer _symbol over _fetch/_store.
I don't mind that a 'define'-ed _symbol is not initialized; this is *C* 
  and that behavior is allowed.  Furthermore, you might be defining a 
  symbol so that the C code can set it, and there is no need to initialize 
  it.
I don't think that type-inference is necessary; I think the current 
  annotations are fine.  Also, whatever decision is reached wrt 
  type-inference, it would certainly make more sense to first implement 
  the new FFI with annotation before tackling inference as well.