[MLton] _export and signal handlers
Matthew Fluet
fluet at tti-c.org
Tue Sep 11 20:49:27 PDT 2007
I've worked out the problem with Sean's segfaulting program. It is due to
an interaction between calling an _export-ed function and signal handlers;
in Sean's case, it is the GC signal handler installed by the Finalize
module.
To understand the bug, recall the MLton_callFromC function
(<src>/include/{amd64,c,bytecode,x86}-main.h):
void MLton_callFromC () { \
pointer jump; \
GC_state s; \
\
if (DEBUG_AMD64CODEGEN) \
fprintf (stderr, "MLton_callFromC() starting\n"); \
s = &gcState; \
s->savedThread = s->currentThread; \
s->atomicState += 3; \
/* Return to the C Handler thread. */ \
GC_switchToThread (s, s->callFromCHandlerThread, 0); \
jump = *(pointer*)(s->stackTop - GC_RETURNADDRESS_SIZE); \
MLton_jumpToSML(jump); \
GC_switchToThread (s, s->savedThread, 0); \
s->savedThread = BOGUS_OBJPTR; \
if (DEBUG_AMD64CODEGEN) \
fprintf (stderr, "MLton_callFromC() done\n"); \
return; \
} \
The C-stub of an _export-ed function uses MLton_callFromC to call back
into ML. <src>/basis-library/mlton/thread.sml installs a special thread
(the 'callFromCHandlerThread') that is the entry point for all _export-ed
functions; it then dispatches to the appropriate ML function, which is
wrapped by the elaborator to handle fetching arguments and storing
results.
When the _export-ed function finishes, the wrapper increments the global
atomicState (i.e., enters a critical section), stores the function result
in a place where the C-stub can grab it, and control transfers back to
MLton_callFromC after the 'MLton_jumpToSML(jump);' line. We now perform
the following:
GC_switchToThread (s, s->savedThread, 0); \
s->savedThread = BOGUS_OBJPTR; \
Normally, this has the effect of switching to the savedThread (which
decrements the global atomicState (i.e., leaves the critical section)),
but stays in C, returns to the C code that called the _export-ed function,
which eventually returns to ML (by returning from an _import-ed function).
The ML code resumes execution on the ML stack/thread installed by the
GC_switchToThread.
The 'bug' is that we try to run a signal handler as soon as possible. In
particular, GC_switchToThread checks whether the switch is leaving a
critical section and whether a signal is pending; if so, then
GC_switchToThread actually switches to the signal handler thread. In
Sean's code, a garbage collection can be triggered just after the
_export-ed function enters the critical section, but before it
returns to MLton_callFromC. The use of the Finalizer module has installed
a signal handler for the GC signal, so the garbage collection leaves a
pending signal. When MLton_callFromC executes
GC_switchToThread (s, s->savedThread, 0);
we actually end up switching to the signal handler thread. (There is a
second bug here. Switching to the signal handler thread stores the
interrupted thread in savedThread. However, MLton_callFromC executes
s->savedThread = BOGUS_OBJPTR;
which has the effect of discarding the interrupted thread.)
Now, we have the signal handler thread/stack set as the current
thread/stack when we return to the C code that invoked the _export-ed
function. We eventually return to ML code, by returning from an
_import-ed function. However, an _import-ed function is treated as
modifiesFrontier = true
mayGC = true
maySwitchThreads = false
readsStackTop = true
writesStackTop = true
The {modifiesFrontier,mayGC,{reads,writes}StackTop} = true attributes
are for _import-ed functions that can call _export-ed functions;
essentially, the attributes force the codegens not to cache various
runtime/gc related values across the _import-ed function call. However,
the maySwitchThreads = false attribute means that upon returning to ML,
the codegen won't make an indirect jump to the return address on the top
of the ML stack; instead, it assumes that the return address on the top of
the ML stack is the 'natural' return point of the _import-ed function.
Only a few primitives get turned into FFI calls with maySwitchThreads =
true. So, we end up executing the ML code at the 'natural' return point
of the _import-ed function with the ML stack of the signal handler thread.
Chaos (and segfaults) ensue.
While we could be even more conservative about _import-ed functions, and
give them the maySwitchThreads = true attribute, that isn't really the
right thing to do. In particular, an _import-ed function might make two
calls to _export-ed functions. If a signal arrives as the first
_export-ed function is returning, the (2nd) GC_switchToThread in
MLton_callFromC will switch to the signal handler thread (and copy the
original ML thread of control (call it Thread1) into the savedThread
field). But, then control will return to the _import-ed function C code,
which will call MLton_callFromC for the second _export-ed function call.
The (1st) GC_switchToThread in MLton_callFromC will switch to the
callFromCThread (and copy the signal handler thread of control into the
savedThread field). This is problematic because we will overwrite the
savedThread field, discarding the Thread1 thread of control. If we had
executed the ML side of the signal handler code immediately after
switching to the signal handler thread, then we would have copied out the
savedThread field (holding Thread1) into a ML value for the duration of
the signal handler, and we would eventually swith back to Thread1 at the
end of the signal handler.
The solution, then, seems to be to *not* allow a switch to the signal
handler thread when switching threads at the 2nd GC_switchToThread in
MLton_callFromC. This is easily accomplished by bumping the atomicState
for the duration of the GC_switchToThread call. When we decrement the
atomicState after the call, we need to make sure that we don't forget
about a possible pending signal. All in all, I think this means that we
can replace
GC_switchToThread (s, s->savedThread, 0); \
s->savedThread = BOGUS_OBJPTR; \
with
s->atomicState += 1; \
GC_switchToThread (s, GC_getSavedThread (s), 0); \
s->atomicState -= 1; \
if (0 == s->atomicState && s->signalsInfo.signalIsPending) \
s->limit = 0; \
We should also properly adjust the limit when incrementing the atomicState
at the beginning of MLton_callFromC if there is a pending signal. That
is, replace
s->atomicState += 3; \
with
s->atomicState += 3; \
if (s->signalsInfo.signalIsPending) \
s->limit = s->limitPlusSlop - GC_HEAP_LIMIT_SLOP; \
Interestingly, since this change only affects the include files, one
doesn't need to rebuild the compiler to incorporate the fix. Just apply
the above patches (or the corresponding one to soon appear in the SVN
repository) to the /usr/lib/mlton/include/{amd64,bytecode,c,x86}-main.h
files.
More information about the MLton
mailing list