[MLton] Project Proposal: an MLton "port" to LLVM

Tue, 15 Nov 2005 17:08:13 +0900

Dear MLton developers,

	I thought you might like to know about the LLVM
language/project/compiler-construction-toolkit/kitchen-sink.
(see http://www.llvm.org)

LLVM comprises the following:

	- A virtual machine architecture, LLVA. This is more or less like
any RISC CPU - loads/stores, arithmetic, comparisons and branching, and
though you do have an infinite number of registers, the code must be in SSA
form. Importantly, LLVA is quite strongly typed: there are the usual
signed/unsigned 8, 16, 32 and 64 bit integers, 32 and 64 bit floating point
values, a boolean type, a void type, and a branch label type. Beyond that
are pointers to these, an n-dimensional array type, function types,
structures and packed (SIMD) types. You can take a look at the instruction
set here:

	http://llvm.cs.uiuc.edu/docs/LangRef.html

	Again, it's mostly what you'd expect, though there is support for
tail calls.

	- A sophisticated optimizer, including all the usual suspects plus
a suite of global optimizations, some of them not found anywhere else, e.g.
pointer compression on 64-bit systems. LLVM's pool allocation, driven by
the "data structure analysis" pass, can perform minor miracles on the
performance of pointer-intensive code, even that coming from weakly typed
source languages (C or C++). Both are described in this recent PhD thesis:

	http://llvm.cs.uiuc.edu/pubs/2005-05-04-LattnerPHDThesis.html

	- Code generators for x86, PowerPC, IA-64, Alpha and SPARC (as well
as a C emitter). x86, PPC and SPARC also have JIT code generators; IA-64 and
Alpha will gain them soon. x86-64 is not yet supported, but that will change
soon. SSE and Altivec support are well on their way - a number of LLVM
developers work at Apple, who have an interest in seeing LLVM support x86
and PPC chips completely.

	- The usual tools you would expect: an assembler/disassembler,
offline optimizer, linker, static compiler, interpreter (JIT accelerated on
those platforms that support it), and also a C/C++-to-LLVM compiler (a
separate, mutant version of GCC that compiles to LLVM bytecode).
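
	To make the SSA requirement mentioned in the first item concrete,
here is a minimal sketch in Python. It is not LLVM code: the function name
to_ssa and the tuple format are made up for illustration, and it only
handles straight-line code, so no phi nodes are needed. It renames every
assignment so that each name is defined exactly once.

```python
def to_ssa(instrs):
    """Rename straight-line three-address code into SSA form.

    instrs: list of (dest, op, arg1, arg2) tuples (a hypothetical toy format);
    args are variable names (str) or integer constants.
    Straight-line code only: no branches, hence no phi nodes.
    """
    version = {}  # variable name -> latest version number assigned so far
    out = []

    def use(arg):
        # Constants pass through unchanged.
        if isinstance(arg, int):
            return arg
        # A name never assigned here is treated as an input and left as-is.
        if arg not in version:
            return arg
        # Otherwise read the most recent version of the variable.
        return f"{arg}.{version[arg]}"

    for dest, op, a, b in instrs:
        # Resolve the operands first, so "x = x * 2" reads the old x.
        a2, b2 = use(a), use(b)
        version[dest] = version.get(dest, -1) + 1
        out.append((f"{dest}.{version[dest]}", op, a2, b2))
    return out


program = [
    ("x", "add", "a", 1),    # x = a + 1
    ("x", "mul", "x", 2),    # x = x * 2
    ("y", "add", "x", "x"),  # y = x + x
]
ssa = to_ssa(program)
for instr in ssa:
    print(instr)
```

	The second assignment to x becomes a fresh name (x.1) that reads
the first one (x.0), which is the property LLVA insists on.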

	LLVM is highly portable and open source, under a very liberal BSD-like
"do what you like with it" license, similar to MLton's.

	In short, LLVM is the best thing since sliced bread. So why am I
ranting about that here?

	Well, having seen MLton's "Projects" page, it occurred to me that
a few birds could be killed with the one stone. In particular:

	- No need to use MLRISC (you won't get the same quality code, and
you already have substantial amounts of C)
	- You'll get common-subexpression elimination, loop-invariant
code motion and a host of other optimizations for free
	- You'll get auto-vectorization in the near future
	- You'll instantly get support for new platforms, e.g. native code
for PowerPC, Itanium, Alpha and x86-64 (the last of these is "coming soon").

	Best of all, I don't think it would be a lot of work to get MLton
emitting LLVM. From a quick glance at the code, it seems you could write a
functor "ToLLVM", like "ToMachine", that takes RSSA and spits out
the appropriate LLVM.
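
	As a rough illustration of the shape of such a translation, here
is a toy sketch in Python rather than SML. Everything in it is an
assumption: the input is a made-up list of SSA tuples, not MLton's RSSA,
and emit_llvm is a hypothetical name. The output uses the textual syntax
of present-day LLVM assembly as I understand it, which differs from what
the 2005 tools accepted. A real ToLLVM would also have to deal with
control flow, the heap, the runtime and so on.

```python
def emit_llvm(name, params, body, result):
    """Emit textual LLVM assembly for a function over 32-bit integers.

    name:   function name
    params: list of parameter names
    body:   list of (dest, op, lhs, rhs) tuples, already in SSA form
            (a hypothetical toy format); op is 'add', 'sub' or 'mul'
    result: name or integer constant to return
    """
    def operand(v):
        # Integer constants are written bare; names become %-prefixed locals.
        return str(v) if isinstance(v, int) else f"%{v}"

    args = ", ".join(f"i32 %{p}" for p in params)
    lines = [f"define i32 @{name}({args}) {{", "entry:"]
    for dest, op, lhs, rhs in body:
        if op not in ("add", "sub", "mul"):
            raise ValueError(f"unsupported op: {op}")
        lines.append(f"  %{dest} = {op} i32 {operand(lhs)}, {operand(rhs)}")
    lines.append(f"  ret i32 {operand(result)}")
    lines.append("}")
    return "\n".join(lines)


# f(a, b) = (a + b) * 2
ir = emit_llvm(
    "f",
    ["a", "b"],
    [("t0", "add", "a", "b"), ("t1", "mul", "t0", 2)],
    "t1",
)
print(ir)
```

	Since the input is already in SSA form, each tuple maps onto one
LLVM instruction with no renaming step, which is why starting from RSSA
looks attractive.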

	Would anyone be interested in this? I'm more than happy to do it,
but other work on my plate at the moment means that I might not be able to
give it a proper shot until early next year. I do use LLVM daily though as
part of my work (I'm an MSc student, and MLton counts as "fun", alas!) so
would be delighted (and able!) to work with any interested MLton hackers in
bringing it up on LLVM at any time, presumably fixing any bugs/missing
features in LLVM found along the way.

	Well, that's my rant - I'd be happy to hear your thoughts on this
proposal!

	All the best,
	Duraid