[MLton-user] Calling into SML from C
Matthew Fluet
fluet at tti-c.org
Sat Apr 14 12:26:01 PDT 2007
>> What constitutes "lots and lots of memory" and "lots of time to be
>> spent in GC"?
>
> Several hundred MB for a program that does no explicit heap allocation,
> and, like, 60% time in the GC.
I see approximately the same behavior (with 50% GC time):
fenrir:~/tmp/export fluet$ cat export.sml
val e = _export "f": (int * real * char -> char) -> unit;
val _ = e (fn (i, r, _) => #"g")
val g = _import "g": unit -> unit;
val _ = g ()
val _ = print "success\n"
fenrir:~/tmp/export fluet$ cat ffi-export.c
#include <stdio.h>
#include "export.h"
void g () {
  Char8 c;
  fprintf (stderr, "g starting\n");
  for (int i = 0; i < 1000000; i++)
    c = f (i, 17.15, 'a');
  fprintf (stderr, "g done char = %c\n", c);
}
fenrir:~/tmp/export fluet$ mlton -export-header export.h -default-ann 'allowFFI true' export.sml ffi-export.c
fenrir:~/tmp/export fluet$ ./export @MLton gc-summary --
g starting
g done char = g
success
GC type        time ms   number            bytes        bytes/sec
-------------  -------  -------  ---------------  ---------------
copying            226   10,417      107,211,864      474,388,778
mark-compact         0        0                0                -
minor                0        0                0                -
total GC time: 3,729 ms (47.7%)
max pause: 0 ms
total allocated: 867,845,084 bytes
max live: 10,392 bytes
max semispace: 94,208 bytes
max stack size: 360 bytes
marked cards: 0
minor scanned: 0 bytes
While there is a lot of allocation and garbage collection going on, the
most relevant statistic is the max live, which shows that there was
never more than about 10K of live data. This means that there is no
space leak; the data that is being allocated is short-lived.
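As a rough back-of-the-envelope figure (assuming the C loop accounts for
essentially all of the allocation): 867,845,084 bytes over 1,000,000
calls is on the order of 870 bytes allocated per exported call.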
>> And there will almost certainly be some allocation done in calling any
>> exported ML function. So, I would expect that if you write a loop in
>> C that does nothing but repeatedly call an exported ML function, then
>> you will see memory being allocated (and subsequently GCed). Do you
>> have example code that demonstrates the situation you are seeing?
>
> This is probably what I'm seeing. The program is essentially a loop
> around an exported function that takes a bunch of integers and returns
> unit. Is there any way to reduce this effect?
There doesn't appear to be a simple solution. When an exported ML
function is called from C, it is run in its own ML thread. The
allocation overhead you are seeing arises from some of the
thread-switching code.
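That said, one way to reduce the effect is to do more work per crossing
of the C/ML boundary, for example by exporting a function that takes an
iteration count and runs the loop on the SML side. A rough, untested
sketch along the lines of the example above (the name "f_many" and the
batched interface are my own invention, not anything MLton provides):
val e = _export "f_many": (int -> char) -> unit;
val _ = e (fn n =>
   let
      (* do the trivial per-iteration work here, n times *)
      fun loop (i, c) = if i >= n then c else loop (i + 1, #"g")
   in
      loop (0, #"g")
   end)
val g = _import "g": unit -> unit;
val _ = g ()
val _ = print "success\n"
On the C side, the loop in g then collapses to a single call,
c = f_many (1000000); each exported call still pays the thread-switch
and allocation cost, but now there is only one such call.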
I will also point out that 50% GC time for a program that is doing
absolutely no computation isn't something to be particularly worried
about. A real program will spend much more time in computation than in GC.