[MLton] Multicore CPU's and MLton

Tue, 5 Jul 2005 09:26:57 -0700

I think that multi-core is a logical (even necessary) direction for
chips to go.  But I think that process-level parallelism is usually
the way to take advantage of it.  After seeing the whole discussion I
am unconvinced of the urgent need for multi-core support for MLton
(beyond what is already there :-).  As I see it, threads can be used
for expressiveness or parallelization.  MLton already has support for
the former.  Multi-core can be useful for parallelization, but there
is a big tradeoff between inter-process parallelism and intra-process
parallelism.  It is easy to get inter-process parallelism with MLton.
Supporting intra-process parallelism entails a lot of costs in several
dimensions:

 * MLton developer time to add support
 * complexity of MLton users' programming model
 * run-time costs of executables due to contention

The complexity increase is because with our current
model/implementation, we know at which points threads switch (safe
points at the end of blocks) so we can guarantee certain invariants
and can provide a very clean high-level semantics.  Once this is gone, it
is much harder to prevent the low-level memory model from leaking
through to the high level.  The performance issues are also very hard,
both for MLton developers (who must do appropriate locking in the GC
and get the memory model right) and users (who must do appropriate
locking in their programs and understand the memory model).
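
To make the point about known switch points concrete, here is a
minimal sketch.  It is in Python rather than SML and is only an
analogy, not MLton's implementation: generators stand in for threads,
and the names worker and run_round_robin are hypothetical.  Because a
switch can happen only at the explicit yield, the read-modify-write on
the shared counter needs no lock.

```python
from collections import deque

def worker(state, n):
    """Increment state["count"] n times, yielding only between updates."""
    for _ in range(n):
        current = state["count"]       # read the shared value
        state["count"] = current + 1   # write it back; no switch can occur in between
        yield                          # the only point where another thread may run

def run_round_robin(threads):
    """Hypothetical scheduler: resume each generator in turn until all finish."""
    queue = deque(threads)
    while queue:
        t = queue.popleft()
        try:
            next(t)          # run the thread up to its next switch point
            queue.append(t)  # not finished yet, so schedule it again
        except StopIteration:
            pass             # thread finished; drop it

state = {"count": 0}
run_round_robin([worker(state, 1000) for _ in range(4)])
print(state["count"])
```

With preemptive threads running on several cores, the same read and
write could interleave between threads, and the invariant would hold
only with locking or atomic operations.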

The trouble might be worth it, and I wouldn't mind seeing work in that
direction.  But none of the applications that I've seen posted so far
(compiles, web applications, Monte Carlo) make much of an argument for
intra-process over inter-process.  And it seems like it will often be
a difficult argument.  One would need an application with a lot of
shared data that is unchanging (or slowly changing) to outweigh the
cost of contention.  However, once this is the case, the same argument
will often justify a multi-process solution as well.  I don't think
I'm saying anything that Wesley didn't -- just that I think his
arguments will apply to applications beyond network-driven ones.
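
As an illustration of how little machinery the inter-process route
needs for something like Monte Carlo, here is a rough sketch, again in
Python rather than SML.  The worker program, the function estimate_pi,
and the sample counts are all made up for the example.  Each worker is
a separate OS process with its own seed, shares no memory with the
others, and reports a single number on stdout.

```python
import subprocess
import sys

# Worker program: count random points that fall inside the unit
# quarter-circle and print the count.  It shares nothing with its siblings.
WORKER = """
import random, sys
seed, samples = int(sys.argv[1]), int(sys.argv[2])
rng = random.Random(seed)
hits = sum(1 for _ in range(samples)
           if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
print(hits)
"""

def estimate_pi(num_procs=4, samples_per_proc=20000):
    """Start num_procs independent processes and combine their hit counts."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER, str(seed), str(samples_per_proc)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for seed in range(num_procs)
    ]
    # All workers are already running; collect each result in turn.
    total_hits = sum(int(p.communicate()[0]) for p in procs)
    return 4.0 * total_hits / (num_procs * samples_per_proc)

pi_estimate = estimate_pi()
print(pi_estimate)
```

The only coordination is summing the results at the end, so there is
no contention and no shared memory model to reason about.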

It seems difficult to find cases where the performance hit due to
intra-process parallelism is significantly less than the performance
hit due to inter-process parallelism.  And simplicity clearly argues
in favor of inter-process.

I also thank everyone for the discussion.  It's great to have such a
variety of experience on the list.