[MLton] _export and signal handlers

Tue Sep 11 20:49:27 PDT 2007

I've worked out the problem with Sean's segfaulting program.  It is due to 
an interaction between calling an _export-ed function and signal handlers; 
in Sean's case, it is the GC signal handler installed by the Finalize 
module.

To understand the bug, recall the MLton_callFromC function 
(<src>/include/{amd64,c,bytecode,x86}-main.h):

void MLton_callFromC () {                                               \
         pointer jump;                                                   \
         GC_state s;                                                     \
                                                                         \
         if (DEBUG_AMD64CODEGEN)                                         \
                 fprintf (stderr, "MLton_callFromC() starting\n");       \
         s = &gcState;                                                   \
         s->savedThread = s->currentThread;                              \
         s->atomicState += 3;                                            \
         /* Return to the C Handler thread. */                           \
         GC_switchToThread (s, s->callFromCHandlerThread, 0);            \
         jump = *(pointer*)(s->stackTop - GC_RETURNADDRESS_SIZE);        \
         MLton_jumpToSML(jump);                                          \
         GC_switchToThread (s, s->savedThread, 0);                       \
         s->savedThread = BOGUS_OBJPTR;                                  \
         if (DEBUG_AMD64CODEGEN)                                         \
                 fprintf (stderr, "MLton_callFromC() done\n");           \
         return;                                                         \
}                                                                       \

The C-stub of an _export-ed function uses MLton_callFromC to call back 
into ML.  <src>/basis-library/mlton/thread.sml installs a special thread 
(the 'callFromCHandlerThread') that is the entry point for all _export-ed 
functions; it then dispatches to the appropriate ML function, which is 
wrapped by the elaborator to handle fetching arguments and storing 
results.

When the _export-ed function finishes, the wrapper increments the global 
atomicState (i.e., enters a critical section), stores the function result 
in a place where the C-stub can grab it, and control transfers back to 
MLton_callFromC after the 'MLton_jumpToSML(jump);' line.  We now perform 
the following:
         GC_switchToThread (s, s->savedThread, 0);                       \
         s->savedThread = BOGUS_OBJPTR;                                  \
Normally, this has the effect of switching to the savedThread (which 
decrements the global atomicState (i.e., leaves the critical section)), 
but stays in C, returns to the C code that called the _export-ed function, 
which eventually returns to ML (by returning from an _import-ed function). 
The ML code resumes execution on the ML stack/thread installed by the 
GC_switchToThread.

The 'bug' is that we try to run a signal handler as soon as possible.  In 
particular, GC_switchToThread checks whether the switch is leaving a 
critical section and whether a signal is pending; if so, then 
GC_switchToThread actually switches to the signal handler thread.  In 
Sean's code, a garbage collection can be triggered just after the 
_export-ed function enters the critical section, but before it 
returns to MLton_callFromC.  The use of the Finalizer module has installed 
a signal handler for the GC signal, so the garbage collection leaves a 
pending signal.  When MLton_callFromC executes
   GC_switchToThread (s, s->savedThread, 0);
we actually end up switching to the signal handler thread.  (There is a 
second bug here.  Switching to the signal handler thread stores the 
interrupted thread in savedThread.  However, MLton_callFromC executes
   s->savedThread = BOGUS_OBJPTR;
which has the effect of discarding the interrupted thread.)

Now, we have the signal handler thread/stack set as the current 
thread/stack when we return to the C code that invoked the _export-ed 
function.  We eventually return to ML code, by returning from an 
_import-ed function.  However, an _import-ed function is treated as
   modifiesFrontier = true
   mayGC = true
   maySwitchThreads = false
   readsStackTop = true
   writesStackTop = true
The  {modifiesFrontier,mayGC,{reads,writes}StackTop} = true  attributes 
are for _import-ed functions that can call _export-ed functions; 
essentially, the attributes force the codegens not to cache various 
runtime/gc related values across the _import-ed function call.  However, 
the  maySwitchThreads = false  attribute means that upon returning to ML, 
the codegen won't make an indirect jump to the return address on the top 
of the ML stack; instead, it assumes that the return address on the top of 
the ML stack is the 'natural' return point of the _import-ed function. 
Only a few primitives get turned into FFI calls with  maySwitchThreads = 
true.  So, we end up executing the ML code at the 'natural' return point 
of the _import-ed function with the ML stack of the signal handler thread. 
Chaos (and segfaults) ensue.

While we could be even more conservative about _import-ed functions, and 
give them the maySwitchThreads = true attribute, that isn't really the 
right thing to do.  In particular, an _import-ed function might make two 
calls to _export-ed functions.  If a signal arrives as the first 
_export-ed function is returning, the (2nd) GC_switchToThread in 
MLton_callFromC will switch to the signal handler thread (and copy the 
original ML thread of control (call it Thread1) into the savedThread 
field).  But, then control will return to the _import-ed function C code, 
which will call MLton_callFromC for the second _export-ed function call. 
The (1st) GC_switchToThread in MLton_callFromC will switch to the 
callFromCThread (and copy the signal handler thread of control into the 
savedThread field).  This is problematic because we will overwrite the 
savedThread field, discarding the Thread1 thread of control.  If we had 
executed the ML side of the signal handler code immediately after 
switching to the signal handler thread, then we would have copied out the 
savedThread field (holding Thread1) into a ML value for the duration of 
the signal handler, and we would eventually swith back to Thread1 at the 
end of the signal handler.

The solution, then, seems to be to *not* allow a switch to the signal 
handler thread when switching threads at the 2nd GC_switchToThread in 
MLton_callFromC.  This is easily accomplished by bumping the atomicState 
for the duration of the GC_switchToThread call.  When we decrement the 
atomicState after the call, we need to make sure that we don't forget 
about a possible pending signal.  All in all, I think this means that we 
can replace
         GC_switchToThread (s, s->savedThread, 0);                       \
         s->savedThread = BOGUS_OBJPTR;                                  \
with
         s->atomicState += 1;                                            \
         GC_switchToThread (s, GC_getSavedThread (s), 0);                \
         s->atomicState -= 1;                                            \
         if (0 == s->atomicState && s->signalsInfo.signalIsPending)      \
                 s->limit = 0;                                           \

We should also properly adjust the limit when incrementing the atomicState 
at the beginning of MLton_callFromC if there is a pending signal.  That 
is, replace
         s->atomicState += 3;                                            \
with
         s->atomicState += 3;                                            \
         if (s->signalsInfo.signalIsPending)                             \
                 s->limit = s->limitPlusSlop - GC_HEAP_LIMIT_SLOP;       \

Interestingly, since this change only affects the include files, one 
doesn't need to rebuild the compiler to incorporate the fix.  Just apply 
the above patches (or the corresponding one to soon appear in the SVN 
repository) to the  /usr/lib/mlton/include/{amd64,bytecode,c,x86}-main.h 
files.