[MLton] Multicore CPU's and MLton

Tue, 19 Jul 2005 13:35:12 -0700 (PDT)

Hi Stephen,
sorry for taking so long to reply to this, but I don't read the list every day.
There are just a couple of points I would like to make.

I don't disagree with your points.  The inter-process programming model
is certainly easier than the intra-process one, both for you guys (the
compiler developers) and for us users (the problems of understanding data
dependencies and synchronization are eliminated).

But there are applications out there for which intra-process parallelism makes
a lot of sense, and let me describe some from my current domain.  I already
mentioned Monte Carlo, which is really trivial to do intra-process.  The slave
threads accumulate results under random paths, and when they are done, the
parent averages them by digging into their per-thread data.  There is no
synchronization to speak of: every thread writes to its own private data and
they all read common state.  None of the problems that you mention exist in
this application.
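
To make the shape concrete, here is a minimal sketch of that pattern.  It is
in Python rather than SML, the names (estimate_pi, worker, shared) are made up
for illustration, and estimating pi stands in for a real integrand.  CPython's
global interpreter lock means these threads would not actually run in
parallel; the point is only the data layout: shared state is read-only, and
each thread writes to its own slot.

```python
import random
import threading


def estimate_pi(num_threads=4, paths_per_thread=50_000, seed=12345):
    # Common state: every worker reads it, nobody writes it.
    shared = {"radius": 1.0}
    # Per-thread data: slot i is written only by worker i, so no locking.
    results = [None] * num_threads

    def worker(i):
        rng = random.Random(seed + i)  # private generator for this thread
        r = shared["radius"]
        hits = 0
        for _ in range(paths_per_thread):
            x, y = rng.uniform(-r, r), rng.uniform(-r, r)
            if x * x + y * y <= r * r:
                hits += 1
        results[i] = 4.0 * hits / paths_per_thread

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the only synchronization point: wait for completion

    # The parent digs into the per-thread results and averages them.
    return sum(results) / num_threads
```

The join at the end is the only coordination between parent and workers;
nothing is serialized or copied.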

It is of course possible to do Monte Carlo inter-process style (and in fact
we do do that at my workplace).  But that comes with its own set of costs.
First of all, all the processes have to be initialized with the same state.  On
Unix this is easy to do (fork -- with its own costs) but on Windows it's an
issue -- without fork you have to somehow transmit enough state to each child
process so that it can initialize itself.  This is non-trivial since the
state that the process must have for this application is large (I'll say a bit
more about this later).  And how do you transmit this state?  Maybe you have to
open pipes and generate some sort of XML and pass it along.  This has to be
parsed and analyzed, the data structures constructed, and so on...  And when
the slave processes are finished they must build XML and send it back to the
parent.  And to do all this communication you have to involve the OS and make
system calls and so on... and by now a large chunk of the advantage of
parallelism is lost.  Also there is the cost of writing and debugging all this
code... code that I never have to write in the intra-process world.
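
For comparison, here is a rough sketch of the fork-less, pipe-based version.
Again it is Python, not MLton; JSON stands in for the XML mentioned above, and
CHILD_SOURCE and run_children are hypothetical names.  Each child is a fresh
interpreter that receives its state as text on stdin, rebuilds it, simulates,
and writes its result back as text on stdout.

```python
import json
import subprocess
import sys

# Program run by each child: parse the transmitted state, simulate, reply.
CHILD_SOURCE = r"""
import json, random, sys
state = json.loads(sys.stdin.read())   # rebuild state from the pipe
rng = random.Random(state["seed"])
r = state["radius"]
hits = 0
for _ in range(state["paths"]):
    x, y = rng.uniform(-r, r), rng.uniform(-r, r)
    if x * x + y * y <= r * r:
        hits += 1
json.dump({"estimate": 4.0 * hits / state["paths"]}, sys.stdout)
"""


def run_children(num_children=2, paths=20_000, seed=12345, radius=1.0):
    procs = []
    for i in range(num_children):
        p = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
        )
        # Serialize the state and push it through the pipe to the child.
        state = {"radius": radius, "paths": paths, "seed": seed + i}
        p.stdin.write(json.dumps(state))
        p.stdin.close()
        procs.append(p)

    estimates = []
    for p in procs:
        # Read the child's reply, parse it, and check that it exited cleanly.
        reply = json.loads(p.stdout.read())
        p.stdout.close()
        if p.wait() != 0:
            raise RuntimeError("child process failed")
        estimates.append(reply["estimate"])
    return sum(estimates) / len(estimates)
```

The state here is three numbers; with a large model and a compiled payoff
description, the serialize/parse code on both sides grows accordingly.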

Another application has to do with building up the shared state I talk about
above.  This typically consists of two pieces:

First, an abstract syntax tree or some intermediate language that basically
describes the function you are trying to integrate.  This is more or less a
program in some simple programming language.  This "program" has to be parsed,
semantically analysed, optimized and then "compiled".  In the intra-process
model, we do this once and just share it.  In the inter-process model,
either each process has to do it again, or we have to somehow communicate the
"bytecode" to each of them... more code to write...

Second is the density function under which the integral is taken (in finance
terms, the model).  Models need to be calibrated to match market prices.  To
perform such calibration people typically use a minimizer like simplex or
Levenberg–Marquardt, which will repeatedly call an error function that returns
roughly the distance of the observed prices to the prices your model currently
produces.  Evaluating this function can in some cases be easily parallelized.
Threads are forked before minimization begins, and when the minimizer asks for
a function evaluation the threads do the work and then sleep.  Doing this in
the inter-process model would involve so much overhead passing data back and
forth between processes that it wouldn't buy you anything.
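
A small sketch of that calibration loop, under loud assumptions: Python
instead of SML, a toy one-parameter model (model_price), and a golden-section
search standing in for simplex or Levenberg–Marquardt.  All names are
hypothetical.  The pool's worker threads are created on first use and then sit
idle between calls; each call to error() wakes them, splits the instruments
among them, and sums the partial results.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def model_price(sigma, maturity):
    # Toy model (an assumption): price scales with sigma * sqrt(maturity).
    return sigma * math.sqrt(maturity)


def calibrate(maturities, observed, lo=0.0, hi=1.0, workers=4, tol=1e-6):
    # Fixed assignment of instruments to workers, decided once up front.
    chunks = [list(range(i, len(maturities), workers))
              for i in range(workers)]

    def partial_error(sigma, idxs):
        # Reads the shared market data; writes nothing shared.
        return sum((model_price(sigma, maturities[j]) - observed[j]) ** 2
                   for j in idxs)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        def error(sigma):
            # Fan the evaluation out to the pool, then sum the pieces.
            return sum(pool.map(lambda idxs: partial_error(sigma, idxs),
                                chunks))

        # Golden-section search on [lo, hi]; assumes error() is unimodal.
        inv_phi = (math.sqrt(5) - 1) / 2
        a, b = lo, hi
        while b - a > tol:
            c = b - inv_phi * (b - a)
            d = a + inv_phi * (b - a)
            if error(c) < error(d):
                b = d
            else:
                a = c
        return (a + b) / 2
```

The minimizer only ever sees error(); whether that call runs on one thread or
several is invisible to it, and no data is copied between evaluations.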

What I hope the above demonstrates is that there are apps where the
intra-process model does make sense.  A programming environment in my view
should not take a position as to what model is "best" (I am not referring to
your views here but to statements made by others in this discussion).  There is
no such thing.  A programming environment should try to be as general as
possible... there are way too many applications out there and each has its own
unique needs and requirements.

Anyway, I am glad that at least you don't object to seeing such functionality
being added to MLton some day.

Neophytos

--- Stephen Weeks <sweeks@sweeks.com> wrote:

> 
> I think that multi-core is a logical (even necessary) direction for
> chips to go.  But I think that process-level parallelism is usually
> the way to take advantage of it.  After seeing the whole discussion I
> am unconvinced of the urgent need for multi-core support for MLton
> (beyond what is already there :-).  As I see it, threads can be used
> for expressiveness or parallelization.  MLton already has support for
> the former.  Multi-core can be useful for parallelization, but, there
> is a big tradeoff between inter-process parallelism and intra-process
> parallelism.  It is easy to get inter-process parallelism with MLton.
> Supporting intra-process parallelism entails a lot of costs in several
> dimensions:
> 
>  * MLton developer time to add support
>  * complexity of MLton users' programming model
>  * run-time costs of executables due to contention
> 
> The complexity increase is because with our current
> model/implementation, we know at which points threads switch (safe
> points at the end of blocks) so we can guarantee certain invariants
> and can provide a very clean high-level semantics.  Once this is gone, it
> is much harder to prevent the low-level memory model from leaking
> through to the high level.  The performance issues are also very hard,
> both for MLton developers (who must do appropriate locking in the GC
> and get the memory model right) and users (who must do appropriate
> locking in their programs and understand the memory model).
> 
> The trouble might be worth it, and I wouldn't mind to see work in that
> direction.  But none of the applications that I've seen posted so far
> (compiles, web applications, Monte Carlo) make much of an argument for
> intra-process over inter-process.  And it seems like it will often be
> a difficult argument.  One would need an application with a lot of
> shared data that is unchanging (or slowly changing) to outweigh the
> cost of contention.  However, once this is the case, the same argument
> will often justify a multi-process solution as well.  I don't think
> I'm saying anything that Wesley didn't -- just that I think his
> arguments will apply to applications beyond network-driven ones.
> 
> It seems like a difficult spot to hit where the performance hit due to
> intra-process is significantly less than the performance hit due to
> inter-process.  And simplicity clearly argues in favor of
> inter-process.
> 
> I also thank everyone for the discussion.  It's great to have such a
> variety of experience on the list.
> 
> _______________________________________________
> MLton mailing list
> MLton@mlton.org
> http://mlton.org/mailman/listinfo/mlton
> 
