[MLton] cvs commit: Improved FFI.

Wesley W. Terpstra wesley@terpstra.ca
Tue, 26 Jul 2005 11:16:12 +0200


On Sun, Jul 24, 2005 at 11:42:28AM -0400, Matthew Fluet wrote:
> > There is now no way to take the address of a function.
> This isn't entirely true; as you note, you can fib in one form or another.
Great. :-P

> I don't seriously object to  _address "symbol": ptrTy, cTy;
> 
> where cTy includes both C base types and C function types (though, even
> then, you have additional complications, since a C function type should, I
> believe, carry its calling convention -- if a function pointer doesn't
> need to be the same size as an int pointer, why would a cdecl function
> pointer need to be the same size as a stdcall function pointer?).

Shoot. :-/ 
That's true.

In fact, if I remember right, back in the days of x86 yore, there were
_near and _far keywords attached to functions. Those definitely changed
the size of a function pointer.

> But, the reality is that MLton expects all C pointers to be of the same
> size (and equal to 32 bits).
> 
> And this representation works certainly for pointers to all C base types
> on all architectures where MLton runs and for pointers to C functions on
> architectures where people have been using the MLton FFI ...
>
> So, all that being said, adding a (simple) _address would be very easy 
> in the current setup, it would simply need to abide by the current 
> limitations.  

Then perhaps going back to _address "symbol": ptrTy; would be best.
So long as we store it in an 'intptr_t' (which, despite some worries
earlier on this list, is *required* by X/Open), all is well.
That is, intptr_t guarantees that converting a void* to it and back is safe.
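
To make the round trip concrete, here is a minimal sketch in C. The helper
names (address_to_word, word_to_address, round_trip_ok) are made up for
illustration and are not part of MLton; the sketch only assumes a platform
whose <stdint.h> provides intptr_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; the names are illustrative, not MLton's. */

/* Convert a data pointer to an integer word wide enough to hold it. */
static intptr_t address_to_word(void *p) {
    return (intptr_t)p;
}

/* Convert the word back to a pointer. */
static void *word_to_address(intptr_t w) {
    return (void *)w;
}

/* Returns 1 if the pointer compares equal after void* -> intptr_t -> void*. */
static int round_trip_ok(void *p) {
    void *q = word_to_address(address_to_word(p));
    return q == p;
}
```

Note that the guarantee covers void* (object pointers) only; whether a
function pointer fits in an intptr_t is still up to the platform.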

> "do whatever works on the platform(s) you are interested in (but don't be
> surprised if it doesn't work on other platforms)."

I completely agree; C is too low-level to do anything else.

> Don't get me wrong -- nobody objects to producing more standard conforming 
> C code, but I don't see the cost/benefit ratio being worth the effort.  

Here's a different proposal:

The bytecode has the potential for much better portability
(if the FFI is forbidden for new symbols/functions not in the basis).

Since the bytecode interpreter is *hand* written in C, it should be 
possible to write it in a standards conforming (no casting) way.

If the entire heap were simply an array of intptr_t, then the only
casts needed should be from intptr_t to intptr_t*, which is safe-ish.
For that matter, it would be especially great if the bytecode could
be made machine-independent a la Java .class. (hehe)
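
A rough sketch of what I mean, in C. Everything here (HEAP_WORDS, heap,
store_ref, load_through) is a toy invented for illustration, not the actual
MLton heap layout. The casts go through void* so that they stay inside what
intptr_t actually promises.

```c
#include <assert.h>
#include <stdint.h>

enum { HEAP_WORDS = 16 };  /* hypothetical toy heap size */

/* The whole heap is one array of pointer-sized integer words. */
static intptr_t heap[HEAP_WORDS];

/* Store a reference to heap[target] in heap[slot]. */
static void store_ref(int slot, int target) {
    assert(slot >= 0 && slot < HEAP_WORDS);
    assert(target >= 0 && target < HEAP_WORDS);
    heap[slot] = (intptr_t)(void *)&heap[target];
}

/* Follow the reference held in heap[slot] and read the word it points at. */
static intptr_t load_through(int slot) {
    assert(slot >= 0 && slot < HEAP_WORDS);
    intptr_t *p = (intptr_t *)(void *)heap[slot];
    return *p;
}
```

Every heap access reads or writes an intptr_t, so no object is ever viewed
through two incompatible types.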

> The cost is certainly high, and the benefit seems simply to be the 
> (dubious) claim that such C code would be less susceptible to being 
> treated in a different way by future versions of gcc.

I don't find this claim dubious; try using gcc 4 and turning on -O3.
My expectation is that MLton will be unable to correctly compile itself.

> Of course, the gazillion lines of existing C code is the best defence
> against future versions of gcc breaking backwards compatibility.

Those hand-written C programs are getting broken by gcc all the time.
One of the tasks of Debian maintainers is to eradicate these sorts of
aliasing and alignment bugs from packages and forward the fix upstream.
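
For a typical example of the aliasing kind: code that reads a float's bits
with *(uint32_t *)&f breaks the aliasing rules, and an optimizing gcc is free
to miscompile it. The usual fix is to copy the bytes instead. This is only a
sketch; float_bits is a made-up name, and it assumes float and uint32_t have
the same size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the object representation instead of casting between pointer types.
 * Assumes sizeof(float) == sizeof(uint32_t), which is checked at run time. */
static uint32_t float_bits(float f) {
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);
    return u;
}
```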

-- 
Wesley W. Terpstra