[MLton] Multicore CPU's and MLton

John Skaller skaller@users.sourceforge.net
Tue, 05 Jul 2005 01:51:42 +1000

Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

On Mon, 2005-07-04 at 12:35 +0200, Wesley W. Terpstra wrote:

> The best way to implement scalable web services is not with threads.
> What you will find much more scaleable is to use an EDSM inside
> process-level concurrency. I know Java doesn't do this, but threaded
> servers are slapped down by well implemented process-level concurrency.
> See http://state-threads.sf.net for more information.

Try Felix (http://felix.sf.net)

State Threads is fundamentally flawed, as you can see by these lines
from the description:

"The execution state is saved on the thread's stack (a contiguous chunk
of memory) which is transparently managed by the C environment."

"The context switch overhead is a cost of doing _setjmp()/_longjmp() (no
system calls are involved)."

It is well known this doesn't work.=20

For a start it cannot work in C++ due to exceptions,
and it is very unlikely to work in conjunction with any system
using an exact garbage collector, however the main problem is the=20
brain dead model of a linear stack used on Unix systems.=20
There simply isn't enough address space to allocate enough stacks,
even on a 64 bit machine! (And there are other problems,
for example there is no ISO Standard way to allocate=20
address space --you certainly can't use malloc, that
is required to actually allocate memory)

Felix uses a linked list of heap objects to represent
the procedure stack, a context switch is an order=20
of magnitude faster than setjmp/longjmp because it
can be achieved by a single pointer assignment.
It was designed for a telephony switch environment,
to handle 300-600 new threads per second, average life
three minutes, which works out to around 10-100K threads.

The Felix compiler automatically control inverts
blocking reads to yields.. unlike some other systems
there is no generic scheduler .. you have to write
your own application dependent one. What the compiler
does, fundamentally, is the control inversion.

Any bytecode interpreter can do this easily if it=20
is careful to avoid using the machine stack: the machine
stack is the enemy. See 'Stackless Python' which made a
trivial tweak to Python so it worked by continuation passing.
[Messing up Python callbacks from C, of course, because
once you enter the C realm, all is lost]

Similarly, functional languages based on continuation
passing are in a good position to provide=20
user space micro-threading.

In any case, if you want to use this kind of model
for ultra-fast context switching for enormous
numbers of threads, but still have a reasonable
functional programming system available, Felix is
worth considering -- it is an ML family language
designed to inter-operate smoothly with C and C++
both above (architecture) and below (user types).

One of my interests in MLton is that, like Felix,
it is a whole program analyser, which means they
can both work around the dirty programming model
used by ML: Felix is Algol-like (has first class
variables and lvalues) but functions can't
have side effects, ML is ISWIM (uses references)
and allows functions to have side effects -- in
both cases there is far too much laxity to make
optimisation possible without extensive analysis.

John Skaller <skaller at users dot sourceforge dot net>
Download Felix: http://felix.sf.net

Content-Type: application/pgp-signature; name=signature.asc
Content-Description: This is a digitally signed message part

Version: GnuPG v1.2.5 (GNU/Linux)