[rolsson@cs.chalmers.se: Interfacing with C]
Suresh Jagannathan
suresh@research.nj.nec.com
Tue, 6 Apr 1999 09:42:17 -0400
Date: Tue, 6 Apr 1999 15:01:23 +0200 (MET DST)
From: Roland Olsson <rolsson@cs.chalmers.se>
To: MLton@research.nj.nec.com
Subject: Interfacing with C
Most ML and Haskell compilers contain C interfaces, but I couldn't find
any instructions for calling C code in "MLton User's Guide".
Is there some documentation on how to add C code, for example by placing
it in src/runtime/mlton-lib.c?
Otherwise, this is a highly interesting and useful compiler.
Cheers,
Roland Olsson
Associate Prof.
Since MLton compiles to C, it is fairly easy to make C calls from
MLton, at least when passing and returning simple types like char,
int, and double. For example, the implementation of the posix basis
library modules is done this way (see
basis-library/posix/primitive.sml). There isn't yet any documentation
on how to call C, so I'll give a first try here. Please keep in mind
that what I am about to describe is not part of Standard ML and was
really just intended to be used to implement MLton's basis library.
It is quite possible that this interface will change, although it has
been stable for over 6 months.
Suppose you would like to call a C function with the following
prototype from ML:
int foo(double d, unsigned char c);
MLton extends the syntax of SML to allow expressions like the following:
_prim "foo": real * char -> int;
This expression returns a function of type real * char -> int whose
behavior is implemented by calling the C function whose name is foo.
What will actually happen in the C code is that there will be a C
variable d of type double, c of type unsigned char, and i of type int,
and the following C statement will be emitted.
i = foo(d, c);
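To make the calling convention concrete, here is a minimal C sketch. Only the prototype of foo comes from the text above; its body and the helper call_foo_like_mlton are hypothetical, and the helper only approximates what the generated code does at a call site.

```c
/* Hypothetical body for foo; only the prototype is given in the text. */
int foo(double d, unsigned char c) {
    /* An SML real arrives as a double, an SML char as an unsigned char. */
    return (int)d + (int)c;
}

/* Roughly the shape of the emitted code: C locals hold the arguments
 * and the result, and the call itself is one plain C statement. */
int call_foo_like_mlton(double d, unsigned char c) {
    int i;
    i = foo(d, c);
    return i;
}
```

Nothing is converted or boxed on the way in or out, which is why only simple types can be passed this way.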
The general form of a _prim declaration is
_prim "c function name": ty;
The semicolon is not optional. Here is a grammar for the types that
are currently allowed.
ty ::= u | t * ... * t -> u
t ::= u | u array | u ref | u vector
u ::= bool | char | int | real | string | unit | word | word8
Here is the mapping between SML types and C types.
SML type     C type
----------   ----------
bool         int (0 is false, nonzero is true)
char         unsigned char
int          int
real         double
string       char *
unit         void
word         unsigned int
word8        unsigned char
u array      char *
u ref        char *
u vector     char *
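As a sketch of how the table reads in practice, here are two C functions together with the _prim types they would be given on the SML side. The names is_vowel and low_byte are hypothetical and chosen only for this illustration.

```c
/* SML side (assumed): _prim "is_vowel": char -> bool;
 * An SML bool comes back as a C int: 0 is false, nonzero is true. */
int is_vowel(unsigned char c) {
    return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
}

/* SML side (assumed): _prim "low_byte": word -> word8;
 * word maps to unsigned int and word8 to unsigned char. */
unsigned char low_byte(unsigned int w) {
    return (unsigned char)(w & 0xFFu);
}
```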
Passing or returning tuples or datatypes is not allowed because the
representation of these is decided quite late in the compilation
process and because many optimizations may cause the representation to
change.
Arrays, refs, and vectors can only be passed as arguments because C
functions are not allowed to allocate in the SML heap. Although the C
type of an array, ref, or vector is always char*, the object is
actually laid out in the natural C representation. You are
responsible for doing the cast if you want to keep the C compiler from
complaining. I'll give an example below. The runtime system also
provides the macro GC_arrayNumElements which will return the number of
elements in an array or vector.
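A minimal sketch of the cast for a ref, assuming an SML declaration of type int ref -> unit. The function name bump is hypothetical.

```c
/* SML side (assumed): _prim "bump": int ref -> unit;
 * The ref arrives typed as char*, but it points at a plain C int. */
void bump(char *r) {
    int *p = (int *)r;  /* cast to the natural C representation */
    *p += 1;            /* the update is visible from SML as !r */
}
```

The C function only writes into storage that SML already allocated, which is consistent with the rule that C code may not allocate in the SML heap.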
Strings are just like char arrays, and are not null terminated, unless
you manually do so from the ML side.
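Because there is no terminating null, C code must not call strlen or similar on an SML string. The sketch below passes the length explicitly so that it compiles without mlton.h; that parameter is an assumption of the sketch, and in real glue code the length could instead come from GC_arrayNumElements. The name count_char is hypothetical.

```c
/* Count occurrences of c in the first len bytes of s.
 * s is NOT assumed to be null terminated, so the loop is bounded
 * by len and never looks for a terminator. */
int count_char(const char *s, int len, unsigned char c) {
    int i;
    int n = 0;
    for (i = 0; i < len; ++i) {
        if ((unsigned char)s[i] == c)
            ++n;
    }
    return n;
}
```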
One other use of the _prim facility is to use C constants. For
example, in basis-library/posix/primitive.sml, you will see the
following lines.
type syserror = int
val acces = _prim "EACCES": syserror;
This defines the SML variable acces to be an int whose value is the
value of the C variable (macro) EACCES. In the generated C code, all
uses of acces will be replaced by EACCES. You must be careful with
such a constant declaration, because the optimizer assumes that it
really is a constant. If you have a C macro whose value is
non-constant, then you should declare a C macro of no arguments as a
wrapper and declare the SML function as a nullary function.
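For instance, errno changes at run time, so it is a poor fit for a constant declaration. A sketch of the wrapper approach follows; the macro name GET_ERRNO and the SML binding in the comment are hypothetical.

```c
#include <errno.h>

/* errno is not a constant, so wrap it in a macro of no arguments.
 * SML side (assumed): val getErrno = _prim "GET_ERRNO": unit -> int;
 * Each call from SML then re-reads the current value. */
#define GET_ERRNO() (errno)
```

Binding errno directly as a constant would let the optimizer treat one observed value as fixed, which is exactly the hazard described above.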
Now for the messiness, which arises because MLton doesn't know what
header files to include and what to link with in order to handle the
new C functions. Right now, this cannot be done automatically.
Instead, you must use the -C option to cause MLton to generate the C
file and then you must explicitly modify the C to include the headers.
You must also call gcc yourself and pass along the right libraries to
link with. You can use mlton -v to find out how MLton calls gcc.
By adding just a couple of command-line args to MLton, it should be
quite easy to fix this messiness. One command-line arg would specify
additional include files, and the other would specify additional files
for linking.
Hopefully, the following example makes the use of the FFI clear. Suppose
you have the following files.
--------------------------------------------------------------------------------
/* ffi.h */
/* This is the wrapper around ffi that is to be called from MLton. It
* does the casts to make MLton arrays and refs look like C arrays and
* pointers.
* The SML type of FFI is real array * int ref * int -> char.
*/
#define FFI(ds,r,i) ffi((double*)ds, (int*)r, i)
/* ffi is a silly function. It sums the elements of ds, stores the
* result in p, adds i to each element of ds, and returns 'c'.
*/
unsigned char ffi(double *ds, int *p, int i);
--------------------------------------------------------------------------------
/* ffi.c */
#include "mlton.h"
#include "ffi.h"
unsigned char ffi(double *ds, int *p, int n) {
int i;
double sum;
sum = 0.0;
for (i = 0; i < GC_arrayNumElements(ds); ++i) {
sum += ds[i];
ds[i] += n;
}
*p = (int)sum;
return 'c';
}
--------------------------------------------------------------------------------
(* ffi.sml *)
(* Declare ffi to be implemented by calling the C macro (or function) FFI. *)
val ffi = _prim "FFI": real array * int ref * int -> char;
open Array
val size = 10
val a = tabulate(size, fn i => real i)
val r = ref 0
val n = 17
(* Call the C function *)
val c = ffi(a, r, n)
(* The C code summed 0 + 1 + ... + 9 = 45 into r before adding n to each element. *)
val _ =
print(if c = #"c" andalso !r = 45
then "success\n"
else "fail\n")
--------------------------------------------------------------------------------
You can now compile the files as follows.
% gcc -c ffi.c # you may also need a -I line to specify the location of mlton.h
% mlton -o ffi.mlton.c -C ffi.sml
You must now edit ffi.mlton.c to add the line #include "ffi.h" after
the line #include "mlton.h".
You can now compile the C file. First, to find out how MLton compiles
a C file, do the following.
% mlton -v ffi.mlton.c
This will fail with an undefined reference to ffi. However, you can
now cut and paste the gcc line to do what you want.
% gcc -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -I/home/sweeks/mlton/include -o /home/sweeks/mlton/c-tests/ffi.mlton ffi.mlton.c /home/sweeks/mlton/lib/libmlton.a /home/sweeks/mlton/lib/libgmp.a -lm ffi.o
You can now test the resulting executable.
% ffi.mlton
success
As I said, this cutting and pasting of C files and gcc calls is all
quite messy and doesn't really need to be, since a couple of extra
command-line arguments to MLton could fix it. I'll put them in if
there's interest, but this approach is at least usable as a stopgap.