[MLton] MLton calling convention and closure conversion
Wesley W. Terpstra
wesley at terpstra.ca
Tue Jan 23 02:52:04 PST 2007
On Jan 23, 2007, at 3:53 AM, Stephen Weeks wrote:
> MLton does a good job of putting basic blocks with control-flow
> edges between them in the same chunk so that interchunk jumps are
> relatively rare.
Part of the reason why I thought it would be an improvement is that I
still don't understand what a chunk is. How can MLton possibly know
the branch frequency of the code before runtime?
Also, users (including me) complain about the C codegen's compile
time and gcc memory usage. I believe this stems from the large
functions? If you didn't need the switch-and-trampoline
infrastructure, you could have smaller functions, and compilation
should at least be faster.
Furthermore, I don't know how smart gcc is with that switch
statement. At every trampoline call, doesn't the switch need to be
evaluated? With so many possible entry points it can't be quick.
Finally, calling through a function pointer blows away all the branch
prediction of the CPU; the trampoline is an opaque wall the CPU can't
see beyond. Certainly if interchunk jumps are very rare, then I'll
agree this is all no big deal.
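
To make the switch-and-trampoline question concrete, here is a minimal C
sketch of the general technique. It is not actual MLton output: the
labels, the state struct, the chunk numbering, and run_sum are all
hypothetical, and it assumes that jumps inside a chunk compile to plain
gotos while only jumps between chunks return to the trampoline.

```c
#include <assert.h>

#define CHUNK_DONE (-1)

/* Hypothetical labels: one per basic block that can be entered from outside. */
enum label { L_LOOP, L_BODY, L_EXIT };

/* "Registers" that must survive a return to the trampoline. */
struct state {
    enum label next;   /* label to resume at in the next chunk */
    long i, n, sum;
};

/* Chunk 0 holds the loop header and body. The switch is evaluated once
   per entry from the trampoline, not once per basic block. */
int chunk0(struct state *s) {
    switch (s->next) {
    case L_LOOP: goto loop;
    case L_BODY: goto body;
    default: assert(0 && "label not owned by chunk 0"); return CHUNK_DONE;
    }
loop:
    if (s->i > s->n) {
        s->next = L_EXIT;
        return 1;            /* interchunk jump: hand control to chunk 1 */
    }
    goto body;               /* intrachunk jump: a direct goto */
body:
    s->sum += s->i;
    s->i += 1;
    goto loop;               /* intrachunk jump: a direct goto */
}

/* Chunk 1 holds only the exit block. */
int chunk1(struct state *s) {
    switch (s->next) {
    case L_EXIT: goto done;
    default: assert(0 && "label not owned by chunk 1"); return CHUNK_DONE;
    }
done:
    return CHUNK_DONE;
}

static int (*const chunks[])(struct state *) = { chunk0, chunk1 };

/* The trampoline: call through a function pointer until a chunk says stop.
   *bounces counts how many indirect calls were made. */
long run_sum(long n, int *bounces) {
    struct state s = { L_LOOP, 1, n, 0 };
    int chunk = 0;
    *bounces = 0;
    while (chunk != CHUNK_DONE) {
        chunk = chunks[chunk](&s);
        *bounces += 1;
    }
    return s.sum;
}
```

In this sketch run_sum(10, &b) returns 55 with b == 2: the ten loop
iterations stay inside chunk 0, and only the final exit goes through the
trampoline. The cost I am worried about is therefore proportional to the
number of interchunk jumps, not to the number of basic blocks executed.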
>> I was wondering, however, how deeply the flow analysis goes: can it
>> always eliminate the need for a function pointer?
>
> That depends on what you mean by "function pointer". The output of
> the closure conversion pass in MLton does not have any function
> pointers in the sense of "address of code". However, it does have
> variant tags on environment records that allow it to distinguish
> between different functions and branch to the right code. The variant
> tags are analogous to function pointers, and the case expressions that
> do the branching implement the jump.
>
> In the paper, figure 6, which I think is the example that made sense
> to you, still has "case" expressions on such sum types in the target
> program. The tags "C5", "C6", "C7", and "C8" are variant tags
> analogous to function pointers.
Yes, I understood that it expanded each of the datatype's possible
alternatives into function calls. However, I didn't believe it would
take this to the extreme that Henry mentioned (all lambdas) since
that would make a huge number of call-sites which would need to be
chosen among (heavy branching, large code size). It seemed obvious to
me (wrongly, as it turns out) that MLton would keep the function
pointer in this case.
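
A minimal C sketch of the tag-plus-case scheme described in the quoted
text, for illustration only. The two lambdas, the tag names ADD_K and
MUL_K, and the function names are hypothetical; they are not the
C5..C8 tags from the paper's figure 6, and this is not MLton's output.

```c
#include <assert.h>

/* Hypothetical source program (SML):
     fun adder k  = fn x => x + k
     fun scaler k = fn x => x * k
   Both lambdas can flow to the same call site, so after closure
   conversion they become two variants of one sum type. */
enum tag { ADD_K, MUL_K };   /* variant tags stand in for code addresses */

struct closure {
    enum tag tag;            /* which lambda this environment belongs to */
    int k;                   /* the captured free variable */
};

struct closure make_adder(int k) {
    struct closure c = { ADD_K, k };
    return c;
}

struct closure make_scaler(int k) {
    struct closure c = { MUL_K, k };
    return c;
}

/* The call site: a case on the tag replaces a call through a pointer.
   Each arm is a direct (and inlinable) use of one known lambda body. */
int apply(struct closure f, int x) {
    switch (f.tag) {
    case ADD_K: return x + f.k;
    case MUL_K: return x * f.k;
    }
    assert(0 && "unknown closure tag");
    return 0;
}
```

Here apply(make_adder(3), 4) gives 7 and apply(make_scaler(3), 4) gives
12. The switch only needs arms for the lambdas that the flow analysis
says can reach that particular call site, which is what keeps the
branching smaller than "all lambdas in the program".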
I'm still wondering about the MLton calling convention, particularly
after Matthew said alloca() would be difficult. In the C codegen, I
would've thought that alloca() is easy. I am guessing that the x86
codegen doesn't have a frame pointer? Otherwise it should be easy
there too.
He did mention that the GC needs to walk the stack, but since you
have local variables there already, I presume there is a way to mark
some words as pointers.
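
One common way to mark stack words as pointers is a table of per-frame
layouts. The C sketch below shows that general idea under stated
assumptions: the struct, the layout table, the "layout index in the top
word of each frame" convention, and find_roots are all hypothetical, and
I am not claiming this is how MLton's runtime is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical static description of one kind of stack frame. */
struct frame_layout {
    size_t size;        /* words in the frame, including the index word */
    uint32_t ptr_mask;  /* bit i set => slot i holds a heap pointer */
};

/* Hypothetical table, one entry per distinct frame shape. */
static const struct frame_layout layouts[] = {
    { 3, 0x1 },  /* layout 0: two slots, slot 0 is a pointer */
    { 4, 0x6 },  /* layout 1: three slots, slots 1 and 2 are pointers */
};

/* Walk the stack from the top frame down to the bottom. The top word of
   each frame is assumed to hold that frame's index into layouts[].
   Records the stack index of every pointer slot in roots[] and returns
   how many were found. */
size_t find_roots(const uintptr_t *stack, size_t top,
                  size_t *roots, size_t max_roots) {
    size_t n = 0;
    while (top > 0) {
        const struct frame_layout *fl = &layouts[stack[top - 1]];
        size_t base = top - fl->size;
        for (size_t i = 0; i + 1 < fl->size; i++) {
            if ((fl->ptr_mask >> i) & 1u) {
                assert(n < max_roots);
                roots[n++] = base + i;
            }
        }
        top = base;   /* step down to the caller's frame */
    }
    return n;
}
```

With a stack of {100, 7, 0, 5, 200, 300, 1} and top == 7, this finds
three roots at stack indices 4, 5 and 0. Note that the walker depends on
every frame having a size known from the table; in a scheme like this, a
frame grown at runtime by alloca() would need extra bookkeeping, which
may be part of why it is harder than I assumed.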
Thanks for the answers! It really helps my understanding of what
happens between my SML and assembly.