[rolsson@cs.chalmers.se: Interfacing with C]

Suresh Jagannathan suresh@research.nj.nec.com
Tue, 6 Apr 1999 09:42:17 -0400


     Date: Tue, 6 Apr 1999 15:01:23 +0200 (MET DST)
     From: Roland Olsson <rolsson@cs.chalmers.se>
     To: MLton@research.nj.nec.com
     Subject: Interfacing with C


     Most ML and Haskell compilers contain C interfaces, but I couldn't find
     any instructions for calling C code in the "MLton User's Guide".

     Is there some documentation on how to add C code, for example by placing
     it in src/runtime/mlton-lib.c?

     Otherwise, this is a highly interesting and useful compiler.

     Cheers,

     Roland Olsson
     Associate Prof.


Since MLton compiles to C, it is fairly easy to make C calls from
MLton, at least when passing and returning simple types like char,
int, and double.  For example, MLton's implementation of the POSIX
modules of the basis library works this way (see
basis-library/posix/primitive.sml).  There isn't yet any documentation
on how to call C, so I'll give a first try here.  Please keep in mind
that what I am about to describe is not part of Standard ML and was
really just intended to be used to implement MLton's basis library.
It is quite possible that this interface will change, although it has
been stable for over 6 months.

Suppose you would like to call a C function with the following
prototype from ML:

int foo(double d, unsigned char c);

MLton extends the syntax of SML to allow expressions like the following:

_prim "foo": real * char -> int;

This expression returns a function of type real * char -> int that is
implemented by calling the C function foo.  Concretely, the generated
C code will contain a variable d of type double, a variable c of type
unsigned char, and a variable i of type int, and it will contain the
following C statement.

i = foo(d, c);
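A self-contained C sketch of both sides may help here.  The body of
foo below is invented purely for illustration, and emitted_call is a
hypothetical stand-in for the code MLton generates at a call site.  The
real generated code will look different, but the variable types and
the call itself follow the description above.

```c
/* A hypothetical C function with the prototype from the text.
 * The body is made up for this illustration: it truncates d to an
 * int and adds the character code of c. */
int foo(double d, unsigned char c) {
    return (int)d + c;
}

/* Hypothetical stand-in for the code emitted when an SML program
 * applies  _prim "foo": real * char -> int  to (2.5, #"A").
 * Each ML value lives in a C variable of the mapped C type, and the
 * call is an ordinary C call. */
int emitted_call(void) {
    double d = 2.5;          /* SML real -> C double        */
    unsigned char c = 'A';   /* SML char -> C unsigned char */
    int i;                   /* SML int  -> C int           */

    i = foo(d, c);
    return i;
}
```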

The general form of a _prim declaration is

_prim "c function name": ty;

The semicolon is not optional.  Here is a grammar for the types that
are currently allowed.
	ty ::= u | t * ... * t -> u
	t ::= u | u array | u ref | u vector
	u ::= bool | char | int | real | string | unit | word | word8

Here is the mapping between SML types and C types.

SML type	C type
----------	----------
bool		int (0 is false, nonzero is true)
char		unsigned char
int		int
real		double
string		char *
unit		void
word		unsigned int
word8		unsigned char
u array		char *
u ref		char *
u vector	char *
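As a sketch of this mapping, here are a few hypothetical C functions,
each annotated with the _prim declaration that would import it.  The
names and bodies are made up; only the argument and result types are
taken from the table.

```c
/*  _prim "is_positive": int -> bool;
 *  A bool result is a C int: 0 means false, nonzero means true. */
int is_positive(int x) {
    return x > 0;
}

/*  _prim "scale": real * word -> real;
 *  real is a C double and word is a C unsigned int. */
double scale(double x, unsigned int k) {
    return x * k;
}

/*  _prim "low_byte": word -> word8;
 *  A word8 result is a C unsigned char. */
unsigned char low_byte(unsigned int w) {
    return (unsigned char)(w & 0xFFu);
}

/*  _prim "to_upper_ascii": char -> char;
 *  char is a C unsigned char in both directions. */
unsigned char to_upper_ascii(unsigned char c) {
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
}
```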

Passing or returning tuples or datatypes is not allowed because the
representation of these is decided quite late in the compilation
process and because many optimizations may cause the representation to
change.
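If a C function needs to hand back more than one result, one
workaround is to return a simple value and write the other results
through ref arguments.  Refs are described in the next paragraph.  The
divmod function below is a hypothetical example.  On the SML side, the
caller would create two int refs, pass them in, and read them back
with !.

```c
/* Intended to be imported as (hypothetical declaration)
 *   _prim "divmod": int * int * int ref * int ref -> bool;
 * The int refs arrive as char *, so they are cast to int * before use.
 * Returns 0 (false) and leaves *q and *r untouched if b is zero;
 * otherwise stores the quotient and remainder and returns 1 (true). */
int divmod(int a, int b, char *q, char *r) {
    if (b == 0)
        return 0;
    *(int *)q = a / b;
    *(int *)r = a % b;
    return 1;
}
```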

Arrays, refs, and vectors can only be passed as arguments because C
functions are not allowed to allocate in the SML heap.  Although the C
type of an array, ref, or vector is always char *, the object is in
reality laid out in the natural C representation (for example, a real
array is a contiguous sequence of doubles).  You are
complaining.  I'll give an example below.  The runtime system also
provides the macro GC_arrayNumElements which will return the number of
elements in an array or vector.
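The following sketch shows the casts for an array and a vector.
GC_arrayNumElements is provided by MLton's runtime and so is not
available in a standalone C build.  For that reason, these hypothetical
functions take the element count as an explicit int argument instead.
On the SML side, the count would be supplied with Array.length or
Vector.length.

```c
/* Intended to be imported as (hypothetical declaration)
 *   _prim "checksum": word8 vector * int -> word;
 * The vector arrives as char * and is read as a plain C array of
 * unsigned char. */
unsigned int checksum(char *v, int n) {
    const unsigned char *bytes = (const unsigned char *)v;
    unsigned int sum = 0;
    int i;

    for (i = 0; i < n; ++i)
        sum = sum * 31u + bytes[i];
    return sum;
}

/* Intended to be imported as (hypothetical declaration)
 *   _prim "scale_all": real array * int * real -> unit;
 * The array arrives as char * and is cast to double *; its elements
 * are updated in place. */
void scale_all(char *a, int n, double k) {
    double *ds = (double *)a;
    int i;

    for (i = 0; i < n; ++i)
        ds[i] *= k;
}
```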

Strings are represented just like char arrays and are not
null-terminated unless you terminate them yourself on the ML side,
for example by appending "\000" with ^.
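Because there is no terminating NUL, a C function that receives a
string needs to get the string's length in some other way.  The sketch
below assumes that the SML caller passes String.size s as an extra int
argument.  Here count_char works directly on the raw bytes.  The
to_c_string function is a C-side helper only and is not meant to be
imported, because its malloc'd result is not an SML string.  It makes
a NUL-terminated copy for C library functions that need one.  Both
names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Intended to be imported as (hypothetical declaration)
 *   _prim "count_char": string * int * char -> int;
 * len must be the string's length, since s is not NUL-terminated. */
int count_char(char *s, int len, unsigned char c) {
    int i, count = 0;

    for (i = 0; i < len; ++i)
        if ((unsigned char)s[i] == c)
            ++count;
    return count;
}

/* C-side helper: copy len bytes into a fresh buffer and terminate it.
 * The caller must free the result.  Returns NULL if malloc fails. */
char *to_c_string(const char *s, int len) {
    char *buf = malloc((size_t)len + 1);

    if (buf == NULL)
        return NULL;
    memcpy(buf, s, (size_t)len);
    buf[len] = '\0';
    return buf;
}
```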

One other use of the _prim facility is to use C constants.  For
example, in basis-library/posix/primitive.sml, you will see the
following lines.

type syserror = int
val acces = _prim "EACCES": syserror;

This defines the SML variable acces to be an int whose value is the
value of the C macro EACCES.  In the generated C code, all uses of
acces will be replaced by EACCES.  You must be careful with such a
constant declaration, because the optimizer assumes that it really is
a constant.  If you have a C macro whose value is not constant, then
you should define a C macro of no arguments as a wrapper and declare
the SML side as a nullary function, that is, a function whose argument
type is unit.
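Here is a sketch of both cases.  It uses errno as the example of a
value that changes at run time.  Errno_get and is_access_error are
hypothetical names.  The exact C code that MLton emits for a call with
a unit argument is an assumption here and is not spelled out above.

```c
#include <errno.h>

/* EACCES is a true constant from <errno.h>, so binding it with
 *   val acces = _prim "EACCES": syserror;
 * is safe.  This helper just compares against it. */
int is_access_error(int e) {
    return e == EACCES;
}

/* errno can change at run time, so it must NOT be bound as a
 * constant.  Instead, wrap it in a macro of no arguments and import
 * it as a function, e.g. (assumed form)
 *   val getErrno = _prim "Errno_get": unit -> int;
 * so that each call rereads the current value. */
#define Errno_get() (errno)
```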

Now for the messiness, which arises because MLton doesn't know what
header files to include and what to link with in order to handle the
new C functions.  Right now, this cannot be done automatically.
Instead, you must use the -C option to make MLton generate the C
file, and then you must edit that file by hand to include the headers.
You must also call gcc yourself and pass along the right libraries to
link with.  You can use mlton -v to find out how MLton calls gcc.

Adding just a couple of command-line arguments to MLton should make it
quite easy to fix this messiness.  One argument would specify
additional include files, and the other would specify additional files
for linking.

Hopefully, the following example makes the use of the FFI clear.
Suppose you have the following files.

--------------------------------------------------------------------------------
/* ffi.h */
/* This is the wrapper around ffi that is to be called from MLton.  It
 * does the casts to make MLton arrays and refs look like C arrays and
 * pointers.
 * The SML type of FFI is real array * int ref * int -> char.
 */
#define FFI(ds,r,n) ffi((double *)(ds), (int *)(r), (n))

/* ffi is a silly function.  It sums the elements of ds, stores the
 * sum (truncated to an int) in *p, adds n to each element of ds, and
 * returns 'c'.
 */
unsigned char ffi(double *ds, int *p, int n);
--------------------------------------------------------------------------------
/* ffi.c */
#include "mlton.h"
#include "ffi.h"

unsigned char ffi(double *ds, int *p, int n) {
	int i;
	double sum;

	sum = 0.0;
	for (i = 0; i < GC_arrayNumElements(ds); ++i) {
		sum += ds[i];
		ds[i] += n;
	}
	*p = (int)sum;
	return 'c';
}
--------------------------------------------------------------------------------
(* ffi.sml *)

(* Declare ffi to be implemented by calling the C macro (or function) FFI. *)
val ffi = _prim "FFI": real array * int ref * int -> char;
open Array

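val size = 10
(* a holds 0.0, 1.0, ..., 9.0 *)
val a = tabulate(size, fn i => real i)
val r = ref 0
val n = 17

(* Call the C function *)
val c = ffi(a, r, n)

(* 0.0 + 1.0 + ... + 9.0 = 45.0, so C should have stored 45 in r. *)
val _ =
   print(if c = #"c" andalso !r = 45
            then "success\n"
         else "fail\n")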
--------------------------------------------------------------------------------
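Before involving MLton at all, it may be worth checking the C logic
with a plain C build.  The sketch below is a hypothetical standalone
copy of ffi, named ffi_n.  It takes the element count as an explicit
argument because GC_arrayNumElements comes from MLton's runtime.  The
check_ffi function mirrors what ffi.sml does and reports whether the
result matches.

```c
/* Standalone copy of the ffi logic for testing outside MLton.
 * count replaces GC_arrayNumElements(ds). */
unsigned char ffi_n(double *ds, int count, int *p, int n) {
    int i;
    double sum = 0.0;

    for (i = 0; i < count; ++i) {
        sum += ds[i];
        ds[i] += n;
    }
    *p = (int)sum;
    return 'c';
}

/* Mirrors ffi.sml: a = [0.0, 1.0, ..., 9.0], r = 0, n = 17.
 * Returns 1 if the call behaves as the SML test expects, else 0. */
int check_ffi(void) {
    double a[10];
    int r = 0, i;

    for (i = 0; i < 10; ++i)
        a[i] = (double)i;
    if (ffi_n(a, 10, &r, 17) != 'c' || r != 45)
        return 0;
    /* Each element should now have been increased by 17. */
    return a[0] == 17.0 && a[9] == 26.0;
}
```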

You can now compile the files as follows.

% gcc -c ffi.c   # you may also need a -I line to specify the location of mlton.h
% mlton -o ffi.mlton.c -C ffi.sml

You must now edit ffi.mlton.c to add the line #include "ffi.h" after
the line #include "mlton.h".

You can now compile the C file.  First, to find out how MLton compiles
a C file, do the following.

% mlton -v ffi.mlton.c

This will fail with an undefined reference to ffi.  However, it shows
the gcc command that MLton uses, so you can cut and paste that command
and add ffi.o to the end of it, as follows.

% gcc   -O1 -w -fomit-frame-pointer -fno-strength-reduce -m486 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -I/home/sweeks/mlton/include -o /home/sweeks/mlton/c-tests/ffi.mlton ffi.mlton.c /home/sweeks/mlton/lib/libmlton.a /home/sweeks/mlton/lib/libgmp.a -lm ffi.o

You can now test the resulting executable:

% ffi.mlton
success

As I said, this cutting and pasting of C files and gcc calls is all
quite messy and doesn't really need to be, since a couple of extra
command-line arguments to MLton could fix it.  I'll put them in if
there's interest, but this approach is at least usable as a stopgap.