[MLton-user] simple, portable, asynchronous programming in SML

Fri, 21 Jul 2006 13:47:32 +0300

On Wed, Jul 19, 2006 at 03:08:14PM -0700, Stephen Weeks wrote:
> One could add sync like this, and program with non-preemptive threads
> without changing the model much.  The main problem with sync is that
> whether a function blocks or not is no longer reflected in the type
> system.  Without sync, the only equivalent of blocking is for a
> function to return an event, which is then reflected in its type.  The
> problem with blocking is that it allows other threads/handlers to run,
> and so complicates reasoning about a piece of code.  So, either one
> must trust the documentation for a function when it says it doesn't
> block, or one must assume that it does block and take into account all
> possible interactions with other threads/handlers.  Thus, I think it
> is much better to program without sync if possible.

I have some experience in designing and debugging systems which do
asynchronous programming with events, non-preemptive (cooperative)
threads, or both.  (From this point on, you're supposed to trust my
judgement about these things ;-)  Through these experiences, I've come
to believe that a good programming environment should support both
styles of programming.  I prefer non-preemptive threading for most
purposes, for reasons outlined below.

My recent experiences with these things have been in Scheme, so I
haven't given the type system area much consideration.

The kind of cooperative threading system I mean is such that programs
can be written in straightforward direct style.  Functions which start
asynchronous operations just block the calling thread (i.e. drop
control back to the event loop) and let the event loop decide what
happens next.  Exceptions, scoped resource management abstractions,
and other things used in "normal" programs can be used just like in,
well, normal programs.  Things like reading/writing files and TCP
sockets, accepting connections, timeouts, etc., are all asynchronous
and use the event loop.

In an event based system the control flow for even quite simple things
can become very difficult to follow.  Basically the code needs to be
manually chopped up into pieces from each asynchronous operation.  If
the code does a lot of asynchronous operations it rapidly becomes hard
to follow.

Since both approaches schedule the atomic chunks of code in the same
way (in some unspecified order), the programmer needs to go through
the same thought processes when figuring out if there is a possibility
for bad interleaving of execution.  The difference is that in event
based programming the synchronous chuncks are clearly defined, whereas
in non-preemptively threaded programs the chunks form implicitly
between asynchronous operations.

The cases where extra synchronization is needed are such that two or
more chunks need to be executed in sequence as a transaction so that
multiple threads do not execute the same chunks interleaved.  A good
way to handle this in both threaded and event based systems is to
enter the operations in a queue and consume and run them from the
queue one at a time.  In threaded systems a popular alternative is to
protect the transaction span with a mutex or similar; it's kind of
like an implicit/automatic operation queue.  I prefer queues, they
seems to inspire more careful thinking about how the concurrency of
the program is designed.

I recognize that this is a contentious issue, and has been discussed
over and over again without converging to a single technique which
everyone agrees is good enough for all purposes.  Therefore, I think a
programming environment should provide both ways of programming, even
within the same program.

As a sidenote, there was an entry on Lambda the Ultimate recently
related to this.  The link to the article is now broken, but here is a
working link:

  http://www.seas.upenn.edu/~lipeng/homepage/unify.html

--
Ville