[MLton] Project Proposal: an MLton "port" to LLVM
Tue, 15 Nov 2005 23:17:33 -0500 (EST)
> I thought you might like to know about the LLVM
> (see http://www.llvm.org)
I, at least, have been aware of LLVM for a while, though I haven't studied
it in much detail. I note that it has just recently had a new public
release.
It is certainly an appealing idea, but one that is shared by a number of
other projects (MLRISC, C--, etc.). I'd be happy to see MLton get out of
the native-codegen game and be able to focus on higher-level
optimizations. In fact, a few months ago, I undertook an experiment
writing a C-- backend for MLton. You can read about it here:
Another choice thread from the archive is the following, in which the
original poster asked about native vs. C codegens:
The big issue is that there is a lot of semantic meaning that we can't
convey to gcc through the C language. The question for any new target is
whether or not that semantic meaning can be conveyed through the language
> - A sophisticated optimizer, including all the usual suspects plus
> a suite of very sophisticated global optimizations including some not found
> anywhere else, e.g. pointer compression on 64-bit systems. LLVM's pool
> allocation (driven by the "data structure analysis" pass described in this
> recent PhD thesis:
> http://llvm.cs.uiuc.edu/pubs/2005-05-04-LattnerPHDThesis.html ) can
> perform minor miracles on the performance of pointer-intensive code, even
> that coming from weakly typed source languages (C or C++).
I looked all over the LLVM website and couldn't find benchmark results
anywhere. A major theme of MLton is producing high-performance code, so
I'd want to know that LLVM can really do better than gcc.
> - You'll get common-subexpression elimination, loop-invariant
> code motion and a host of other optimizations for free
Not necessarily. If you look at the MLton/C-- experiment thread, I
describe the "abstract machine" that feeds into all MLton codegens. The
major issue is that to C or C-- (and presumably to LLVM) most of the code
looks to be manipulating heap data, since MLton allocates ML stacks on the
heap to support very deep call stacks, to support multiple ML stacks for
light-weight concurrency, and to support accurate garbage collection. So,
most of the optimizations you mention above won't be effective at this
level, since it is generally unsound to reorder what appear, to LLVM, to be
arbitrary heap reads and writes.
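To make the aliasing problem concrete, here is a minimal C sketch. It is not
MLton's generated code: the frame layout and the names heap_frame, local_var,
and SLOT_X are invented for illustration. It only shows why a value kept in a
heap-allocated frame must be reloaded after a store through another pointer,
while a true local need not be.

```c
#include <stdint.h>

/* Hypothetical frame layout (an assumption, not MLton's actual layout):
   slot 1 of the heap-allocated ML frame holds the "local variable" x. */
enum { SLOT_X = 1 };

/* x lives in a heap-allocated frame, so every use of x is a memory load. */
int64_t heap_frame(int64_t *frame, int64_t *cell) {
    int64_t a = frame[SLOT_X] * 2;  /* load x, compute x * 2 */
    *cell = 0;                      /* this store may alias frame[SLOT_X] */
    int64_t b = frame[SLOT_X] * 2;  /* must reload x; cannot simply reuse a */
    return a + b;
}

/* Same computation with x as a true local: the store cannot touch x. */
int64_t local_var(int64_t x, int64_t *cell) {
    int64_t a = x * 2;
    *cell = 0;
    int64_t b = x * 2;              /* the compiler is free to reuse a here */
    return a + b;
}
```

If cell happens to point at the frame slot itself, heap_frame returns a
different value than it does for a disjoint cell, so the compiler has to keep
both loads unless it can prove the two pointers never alias.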
> Best of all, I don't think it would be a lot of work to get MLton
> emitting LLVM. From a quick glance at the code, it seems you could write a
> functor "ToLLVM", like "ToMachine", that takes RSSA and spits out
> the appropriate LLVM.
You really need to write a new codegen pass, translating from Machine.
If you don't go through Machine, then you would need to rewrite much of
the runtime system (i.e., garbage collector), since the Machine IL is
computing all the info needed there. It probably does make sense to tweak
some of the translation into Machine depending on the final codegen. For
example, we could keep things looking more SSA-ish, which would probably
be a benefit for something like LLVM that wants SSA form.
> Would anyone be interested in this? I'm more than happy to do it,
> but other work on my plate at the moment means that I might not be able to
> give it a proper shot until early next year. I do use LLVM daily though as
> part of my work (I'm an MSc student, and MLton counts as "fun", alas!) so
> would be delighted (and able!) to work with any interested MLton hackers in
> bringing it up on LLVM at any time, presumably fixing any bugs/missing
> features in LLVM found along the way.
We're always interested in getting more people involved with MLton
development. We'd certainly encourage the experiment, though (speaking
for myself), I'm not sure how much effort we could devote. We're very
happy to answer questions.