[MLton] interrupted system call

Stephen Weeks MLton@mlton.org
Fri, 26 Mar 2004 10:47:18 -0800

> So, I think the right thing to do is
> val Posix.Error.syscall: (unit -> int) -> int =
> Essentially, we want to make progress.  The system call returning without
> error is progress.  Raising the SysError exception is progress.  If we are
> outside a critical region, then handling the signal is progress, and we
> don't want to restart the system call with signals blocked, because we
> might return to this thread long after the system call was interrupted.
> If we are inside a critical region, then we need the system call to
> complete to make progress, so we block signals and restart the system
> call.  Since we are inside a critical region (and not executing any user
> code), the set of signals blocked can't be changed until the dynamic wind
> returns, restoring the signals.

Excellent analysis.  I now see what you mean about the problems that
can happen if we're not in a critical section.  The code looks great
too.  I thought about rewriting it to something like

val Posix.Error.syscall: (unit -> int) -> int =
   fn f =>
      fun call (): int =
	    val n = f ()
	    if n <> ~1
	       then n
		  val e = getErrno ()
		  if !Signal.restart andalso (e = intr orelse e = restart)
			if 0 = canHandle ()
			   then call ()
			      val m = Signal.Mask.getBlocked ()
			      val () = Signal.Mask.block Signal.Mask.all
			      dynamicWind (call,
					   fn () => Signal.Mask.setBlocked m)
		  else raiseSys e
      call ()

But I'm not so sure this is clearer.  We know that the call in the
dynamic wind can not recur since signals are blocked.  But your code
makes this clearer.  I'd leave it alone.

> One more revision -- we shouldn't block signals that aren't being handled
> by the ML signal handler.  For example, if the programmer installs a
> signal handler for SIGALRM but not for SIGNINT, then I still want C-C to
> kill the process.

Sure.  Instead of using Signal.Mask.all, we can use a new function
Signal.Mask.handled (), which is easily computable from within
signal.sml without any additional primitives.

A couple of side questions -- I'm not sure how to resolve your code
and analysis with a couple of your other statements.

> It also doesn't seem to capture the property that I proposed, which was
> that the signals were blocked until the signal handler got a chance to
> run.

I guess you've given up on this property by using ML to block the
signals instead of using GC_handler?

> I'd actually suggest that if the system call returns an error, then the
> critical section is left, so that the alrm signal gets a chance to run
> before retarting the system call. 

This seems contrary to the notion of critical section.  But I guess
that's what we're allowing with the Signal.restart flag.

And finally, your point about limit check insertion.

> Of course, this assumes a few things about when the limit checks get
> inserted.  Namely, that if I have a CFG like:
> L1: ...
>     atomicBegin ()
>     ...
>     IF _ THEN L2 ELSE L3
> L2: ...
>     atomicEnd ()
>     ...
>     GOTO L1
> Then I'm assuming that the limit check gets inserted at L1, and not L2
> (even though L2 breaks the loop).

(This is mostly to remind myself what's going on)

atomicEnd checks if a signal is pending and canHandle is zero, and if
so sets limit to zero.  It does not invoke the signal handler, but
relies on the next signal check to do so.  So, if the signal check is
at L1, then we are fine.  On the other hand, if the signal check is at
L2, we are in trouble, because we know that limit is not zero there.
Why, because we maintain the invariant that if we are in a critical
section, then limit is not zero.  This is accomplished by atomicBegin
which sets limit to nonzero in the rare situation that the C signal
handler is invoked after atomicBegin increments canHandle.

Signal check insertion does not pay attention to critical sections.
It simply inserts a signal check in all blocks that are loop headers.

So, it looks like we have a bug, assuming we can concoct a scenario
wher the loop header is inside a critical section but other parts of
the loop aren't.  Rather than worry about such a complex property, it
seems easier to assume it could happen.  One fix would be to have
signal check insertion pay attention to critical regions, which seems
hard.  Another fix would be your suggestion

> add a primitive that enters the runtime (and may switch threads)
> which invokes the signal handler (respecting canHandle).  I can
> imagine situations where I'd like an atomicEnd that not only left
> the critical region, but immediately invoked the signal handler if a
> signal were pending.

It seems fine to me to always have atomicEnd do this.  That is,
instead of

if (gcState.signalIsPending and 0 == gcState.canHandle)
	gcState.limit = 0;

Why not 

if (gcState.signalIsPending and 0 == gcState.canHandle)
	GC_gc (&gcState, ...);

If we do this, then we could tweak signal check insertion to know that
blocks with atomicEnd automatically break loops.