[MLton] Parallel compilation?
Matthew Fluet
fluet at tti-c.org
Thu Sep 20 20:33:16 PDT 2007
On Wed, 19 Sep 2007, Ryan Newton wrote:
> I haven't personally come across any multi-threaded compilers that
> parallelize at the granularity of the abstract syntax tree itself (not at a
> module or file granularity).
>
> For whole-program compilers, in the age of the multi-core menace, this seems
> relevant. I myself work on a whole program compiler that uses MLton as one
> of its backends, and have been trying to effectively parallelize it. I'd be
> happy to share more about what I'm trying to do. But first I wanted to ask
> if anyone had any pointers to work on multithreading compilers, or if any of
> the MLton developers have considered it.
I don't know of any work on multithreading compilers. There was a post a
couple of years ago that asked about multithreading MLton:
http://mlton.org/pipermail/mlton/2005-July/027229.html
The thread wanders quite a bit, and the short answer was that maybe the
native codegen could be effectively (and easily) parallelized, but little
else.
I'm not quite sure I believe that conclusion anymore, though. There are
quite a few optimization passes in MLton that map over the set of
functions (in the IL program). Some of these maps are really independent,
and could be performed in parallel. Some of these maps are not
independent, but technology like Software Transactional Memory gives me
some hope that the dependencies could be handled without trying to
revise things so they are completely independent.
I think there would still be some serious engineering concerns. When
working with a whole-program compiler, you've got the whole-program in
memory. A pass that maps over IL functions and transforms them one at a
time has nice memory usage -- at any time, only one IL function has an
extra copy kept live. If you map over the IL functions in parallel,
you've increased the memory usage -- you've essentially got two copies of
the program live at one time. The question is whether the extra GC time
(whether or not it is a concurrent gc) implied by the extra memory usage
negatively impacts whatever gain one might get from the parallelism.
In any case, it is certainly an interesting question, without any
immediate, clear answers.
More information about the MLton
mailing list