[MLton] Project Proposal: an MLton "port" to LLVM

Matthew Fluet fluet@cs.cornell.edu
Tue, 15 Nov 2005 23:17:33 -0500 (EST)

> 	I thought you might like to know about the LLVM
> language/project/compiler-construction-toolkit/kitchen-sink.
> (see http://www.llvm.org)

I, at least, have been aware of LLVM for a while, though I haven't studied 
it in much detail.  I note that it has just recently had a new public release.

It is certainly an appealing idea, but one that is shared by a number of 
other projects (MLRISC, C--, etc.).  I'd be happy to see MLton get out of 
the native-codegen game and be able to focus on higher-level 
optimizations.  In fact, a few months ago, I undertook an experiment in 
writing a C-- backend for MLton.  You can read about it here:


Another choice thread from the archive is the following, in which the 
original poster asked about native vs. C codegens:


The big issue is that there is a lot of semantic meaning that we can't 
convey to gcc through the C language.  The question for any new target is 
whether or not that semantic meaning can be conveyed through the language 

> 	- A sophisticated optimizer, including all the usual suspects plus
> a suite of very sophisticated global optimizations including some not found
> anywhere else, e.g. pointer compression on 64-bit systems. LLVM's pool
> allocation (driven by the "data structure analysis" pass described in this
> recent PhD thesis:
> http://llvm.cs.uiuc.edu/pubs/2005-05-04-LattnerPHDThesis.html ) can
> perform minor miracles on the performance of pointer-intensive code, even
> that coming from weakly typed source languages (C or C++).

I looked all over the LLVM website and couldn't find benchmark results 
anywhere.  A major theme of MLton is producing high-performance code, so 
I'd want to know that LLVM can really do better than gcc.

> 	- You'll get common-subexpression elimination, loop-invariant
> code motion and a host of other optimizations for free

Not necessarily.  In the MLton/C-- experiment thread, I describe the 
"abstract machine" that feeds into all MLton codegens.  The major issue 
is that, to C or C-- (and presumably to LLVM), most of the code appears 
to manipulate heap data, since MLton allocates ML stacks on the heap to 
support very deep call stacks, to support multiple ML stacks for 
lightweight concurrency, and to support accurate garbage collection.  So, 
most of the optimizations you mention above won't be effective at this 
level, since it is generally unsound to rearrange what would appear to 
LLVM to be arbitrary heap reads and writes.

> 	Best of all, I don't think it would be a lot of work to get MLton
> emitting LLVM. From a quick glance at the code, it seems you could write a
> functor "ToLLVM", like "ToMachine", that takes RSSA and spits out
> the appropriate LLVM.

You really need to write a new codegen pass, translating from Machine. 
If you don't go through Machine, then you would need to rewrite much of 
the runtime system (i.e., the garbage collector), since the Machine IL 
computes all the information needed there.  It probably does make sense to tweak 
some of the translation into Machine depending on the final codegen.  For 
example, we could keep things looking more SSA-ish, which would probably 
be a benefit for something like LLVM that wants SSA form.

> 	Would anyone be interested in this? I'm more than happy to do it,
> but other work on my plate at the moment means that I might not be able to
> give it a proper shot until early next year. I do use LLVM daily though as
> part of my work (I'm an MSc student, and MLton counts as "fun", alas!) so
> would be delighted (and able!) to work with any interested MLton hackers in
> bringing it up on LLVM at any time, presumably fixing any bugs/missing
> features in LLVM found along the way.

We're always interested in getting more people involved with MLton 
development.  We'd certainly encourage the experiment, though (speaking 
for myself) I'm not sure how much effort we could devote.  We're very 
happy to answer questions.