[MLton] Two questions about FFI types
Matthew Fluet
fluet@cs.cornell.edu
Tue, 10 May 2005 10:50:05 -0400 (EDT)
> 1. Some of the mysql functions take a number of C strings (character
> pointers), that behave differently if they are null. Unfortunately there
> doesn't appear to be a way to give a type to these C functions for
> _import; string doesn't work because I can't pass null, and
> MLton.Pointer.t doesn't work because I can't pass string. The only two
> solutions I have to this are (a) to import the function at multiple types,
> and then call the one that matches my dynamic set of nulls--there are then
> 2^n imports in general! or (b) write a C stub that takes, for each
> argument, a char* and a bool to indicate if it is supposed to "be null".
> Neither of these is very nice to me. I'd be the first to argue that null
> is an abomination, but it is very common in C libraries that mlton
> programs would want to interface with, so are there any prospects of being
> able to do this a cleaner way? (Or is there already a cleaner way?)
I don't know of any cleaner way given the way things are currently
implemented. I ran into a very small version of this issue with some of
the networking functionality of the Basis Library:
http://mlton.org/pipermail/mlton/2002-December/022923.html
Our conclusion there was to have two different C functions. But, that
doesn't really scale to your situation.
Be careful importing the same C function at two different ML types. The C
codegen emits a C-prototype for each imported function, so you need to
ensure that the ML type maps to the same C type. This isn't a problem in
this situation as the C type of a MLton.Pointer.t and the C type of an
array are both (char*).
To be honest, I don't think that importing the function at multiple types
and wrapping those imports in a single ML function whose arguments have
type string option is that bad an option.
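A minimal sketch of that approach, for illustration only. The C function
connect_db, its two char* arguments and its int result are hypothetical
(they are not part of the mysql API); the wrapper names are likewise made
up. Only _import and MLton.Pointer.null are taken from MLton itself.

```sml
(* Hypothetical C function: int connect_db (char *host, char *user);
 * either argument may be NULL.  It is imported once per combination
 * of string / pointer arguments; every import maps to the same C
 * prototype, since both ML types are passed as char*. *)
val connectSS = _import "connect_db": string * string -> int;
val connectSN = _import "connect_db": string * MLton.Pointer.t -> int;
val connectNS = _import "connect_db": MLton.Pointer.t * string -> int;
val connectNN =
   _import "connect_db": MLton.Pointer.t * MLton.Pointer.t -> int;

(* ML strings are not NUL-terminated, so add the terminator that C
 * expects before passing them across the FFI. *)
fun cstr (s: string): string = s ^ "\000"

(* Single ML entry point: NONE stands for a null pointer. *)
fun connect (host: string option, user: string option): int =
   case (host, user) of
      (SOME h, SOME u) => connectSS (cstr h, cstr u)
    | (SOME h, NONE) => connectSN (cstr h, MLton.Pointer.null)
    | (NONE, SOME u) => connectNS (MLton.Pointer.null, cstr u)
    | (NONE, NONE) => connectNN (MLton.Pointer.null, MLton.Pointer.null)
```

The 2^n imports are still there, but they are written once and hidden
behind one function, so callers only ever see string option.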
> 2. Since this interfaces with a database (and in fact my task will be very
> data-intensive), I want to avoid copying as much as possible. I need to
> make one copy to read data from rows returned from the server and generate
> the mlton representation, and I'd like to limit it to that--but I want to
> be manipulating strings in my program, not character arrays. According to
> the FFI documentation, it appears that one way to do this would be to pass
> a CharVector (=string) to the FFI, and have the C code modify it in place:
>
> let
> (* s would be allocated based on its target length *)
> val s = "_______"
> val f = _import "f" : string -> unit ;
> in
> f s;
> ... s ...
> end
>
> But is this safe? Will the mlton optimizer, knowing that strings are
> immutable, make them share space (hash consing?) or optimize subscripts on
> constant strings? I can't tell from the FFI docs.
It is not safe. The optimizer (and hash-consing gc) assumes that vectors
are in fact immutable.
> If it's not safe, is there some way to go from C->array->vector that
> doesn't do two copies? I'm already doing FFI, so I don't mind if it's not
> type-safe (but it obviously needs to be robust).
Yes, there is an unsafe array -> vector coercion, used internally, though
not exported by the Basis implementation. You would use it like:
let
val arr = CharArray.tabulate (n, fn _ => #" ")
val f = _import "f": CharArray.array -> unit ;
val () = f arr
val vec = Unsafe.CharVector.fromArray arr
in
... vec ...
end
If you want to play even faster and looser, you could do
let
val arr = Unsafe.CharArray.create n
val f = _import "f": CharArray.array -> unit ;
val () = f arr
val vec = Unsafe.CharVector.fromArray arr
in
... vec ...
end
which won't even bother initializing the array (which is fine for a
character array, as it has no internal pointers). If your C function
needs the array to be null terminated, you may need to use tabulate
instead, so that no stray \000 chars appear in the array.