[MLton] MLton calling convention and closure conversion
Wesley W. Terpstra
terpstra at gkec.informatik.tu-darmstadt.de
Mon Jan 22 16:08:15 PST 2007
I wanted to make sure I understand these issues before I spread
misinformation:
As I understand it, gcc doesn't do tail recursion very well, because
it is the caller's responsibility to pop the stack; the callee does
not know the number of arguments passed to the function. This is
different in the code output by MLton since tail recursion is
required by SML. I am fairly certain that the MLton stack for ML
threads is heap allocated from when I looked at that code last.
From what I've seen of the C codegen, MLton uses goto (or switch)
within some sort of 'chunk' files (I still don't understand what
these are) and when you need to branch between these files, you
return the function pointer of the destination from the current
function. This then gets processed by a 'trampoline' that invokes the
returned pointer. So, the ML stack lives on the heap and the C stack
is never more than one-level deep due to the trampoline. When a C
call is needed, this is simply invoked on the C stack.
However, recent progress in gcc now allows functions with the same
number and type of arguments to be invoked tail-recursively.
Therefore, isn't this whole goto/switch/trampoline infrastructure
unnecessary and possibly slow? Since MLton is tied to gcc for its
assembler anyways, why not use these extensions?
I am guessing that the x86 codegen does something smarter.
Presumably, the x86 codegen uses its own calling convention (what is
it?) that supports tail recursion. This is easy(?) since MLton knows
the number of arguments to a function. The C stack doesn't even need
to be kept in a register within SML code. When the x86 codegen needs
to call C, it uses an alternate stack (probably the original stack
that the program was started with to appease glibc's obsession with
thread-specific values at the bottom of the stack?) and manually
pushes the necessary arguments, calls the method, and then pops those
arguments.
Finally, I tried to read your paper on closure conversion in MLton. I
didn't understand any of the type equations or transformation rules
as it uses notation I've never seen (I've never studied compiler
theory). However, the example SML code with the freshly introduced
datatypes made sense to me. I was wondering, however, how deeply the
flow analysis goes: can it always eliminate the need for a function
pointer? (for example if I had an event heap with a callback function
for each event, where many different events are registered all over
the program)
Finally, it seems to me that the functor and polymorphism elimination
via code replication and MLton's closure conversion are probably the
main reasons that MLton is so fast(?). F# and SML.NET can not
benefit from these tricks, because they must output CLR object code
that is linked by microsoft's linker. That CLR includes polymorphism
and closures in unconverted form. These optimizations would have to
be performed at link time when all the flow&use information is
available. As MS's linker does not do this, F# and SML.NET are slow.
In conclusion: I recently had to write a large piece of software in C+
+ for the first time since I've learned SML. It was horrible! I've
become addicted to the expectation that I can write in a high-level
language and expect good performance. I only did it as I needed 64GB
of RAM. Please hurry with 64 bit support and multi-threading. :-)
More information about the MLton
mailing list