[MLton] mlton bug [String.fromCString ?]
Matthew Fluet
fluet@cs.cornell.edu
Thu, 28 Jul 2005 09:04:03 -0400 (EDT)
> I strongly suspect the problem is with String.fromCString, unless
> I'm misusing ffi ?
You are misusing FFI.
  val int2string = _import * : DynLink.fptr -> int -> string;
  val get_name = int2string (DynLink.dlsym (hndl, "get_name"));
If you _import a function that accepts and/or returns "string", then the
function must accept/return an ML string on the ML heap. Among other
things, that means the object has a valid header word and a length word,
and it need not be null terminated. The "get_name" function in names.c
does not return an ML string on the ML heap; rather, it returns a C-string
in static data.
String.fromCString does not convert a null-terminated C string into an ML
string; it converts an ML (heap allocated) string to another ML (heap
allocated) string, obeying C-style escape sequences. However,
String.fromCString is required to convert only the maximal prefix of the
string that contains printable C-characters. Hence, it stops converting
upon encountering the null termination of the C-strings.
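To illustrate the kind of input String.fromCString is meant for, here is a
minimal sketch that uses only the Basis Library. The names "escaped" and
"converted" are made up for this example; neither comes from the original
program. Both values are ordinary ML strings, and no C pointer is involved.

```sml
(* The ML literal "a\\tb" holds four characters: a, backslash, t, b. *)
val escaped = "a\\tb"

(* fromCString reads the two characters backslash-t as the C escape
   sequence for TAB, so the result holds three characters: a, TAB, b. *)
val converted = String.fromCString escaped

val () =
   case converted of
      SOME s =>
         print ("length before: " ^ Int.toString (size escaped)
                ^ ", after: " ^ Int.toString (size s) ^ "\n")
    | NONE => print "no conversion possible\n"
```

The point is that the argument is already an ML string with its own length;
fromCString only rewrites escape sequences inside it.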
> The program retrieves a series of strings by calls to get_name in
> names.so, then prints them.
>
> The first two strings do not show up properly.
I am very surprised that the program terminates normally with only the two
strings not printing correctly. What appears to be happening is that the
first two strings are somehow interpreted as ML strings of length zero;
presumably, the padding in the static data section has put enough zeros in
that when interpreting the pointer as an ML heap allocated string, we find
zero where we expect to find the length. After the first two strings, the
bytes of the preceding strings sit where we expect to find the length.
Hence, the lengths are interpreted as "big enough" to convert the C-string
until encountering the null termination. This is certainly not robust,
and is quite sensitive to the fact that there appear to be no intervening
garbage collections -- which would be thoroughly confused by your
C-strings.
The right way to accomplish what you have in mind is the following:
  val int2ptr = _import * : DynLink.fptr -> int -> MLton.Pointer.t;
  val get_name = int2ptr (DynLink.dlsym (hndl, "get_name"));
  fun fetchCString ptr =
     let
        fun loop (i, accum) =
           let
              val w = MLton.Pointer.getWord8 (ptr, i)
           in
              (* Search for explicit null termination. *)
              if w = 0wx0
                 then String.implode (List.rev accum)
              else loop (i + 1, (Byte.byteToChar w) :: accum)
           end
     in
        loop (0, [])
     end
  val v = Vector.tabulate (c, fn i => fetchCString (get_name i));
> Also, funnyly, mlton rejects file names.sml on Solaris, in which
> it finds a syntax error ...
I don't have an explanation for that.