[MLton] MLton calling convention and closure conversion

Tue Jan 23 02:52:04 PST 2007

On Jan 23, 2007, at 3:53 AM, Stephen Weeks wrote:
> MLton does a good job of putting basic blocks with control-flow  
> edges between them in the same chunk so that interchunk jumps are  
> relatively rare.

Part of the reason why I thought it would be an improvement is that I  
still don't understand what a chunk is. How can MLton possibly know  
the branch frequency of the code before runtime?

Also, users (including me) complain about the C codegen's compile
time and gcc memory usage. I believe this stems from large functions?
If you didn't need the switch & trampoline infrastructure, you could
emit smaller functions, which should at least compile faster.

Furthermore, I don't know how smart gcc is with that switch  
statement. At every trampoline call, doesn't the switch need to be  
evaluated? With so many possible entry points it can't be quick.  
Finally, calling through a function pointer blows away all the branch  
prediction of the CPU; the trampoline is an opaque wall the CPU can't  
see beyond. Certainly if interchunk jumps are very rare, then I'll  
agree this is all no big deal.
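
To make the question concrete, here is my guess at the general shape of
the switch & trampoline scheme, written as a small C sketch. Every name
in it (chunk0, chunk1, chunk_of, run, the labels) is invented for
illustration and is not taken from MLton's actual output; the point is
only to show where the switch and the function-pointer call happen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical basic-block labels; a real compiler would generate these. */
enum label { L_LOOP, L_FINISH, L_HALT };

/* Machine state shared by all chunks (stands in for registers and stack). */
struct state { long n; long acc; };

/* A chunk is one C function holding several basic blocks. It returns the
   label of the next block to run whenever control leaves the chunk. */
typedef enum label (*chunk_fn)(enum label entry, struct state *s);

static enum label chunk0(enum label entry, struct state *s) {
    switch (entry) {            /* evaluated once per entry into the chunk */
    case L_LOOP:
    loop:
        if (s->n == 0)
            return L_FINISH;    /* interchunk: hand the label to the trampoline */
        s->acc += s->n;
        s->n -= 1;
        goto loop;              /* intrachunk: an ordinary goto, no dispatch */
    default:
        assert(0 && "label not in chunk0");
        return L_HALT;
    }
}

static enum label chunk1(enum label entry, struct state *s) {
    switch (entry) {
    case L_FINISH:
        s->acc *= 2;
        return L_HALT;
    default:
        assert(0 && "label not in chunk1");
        return L_HALT;
    }
}

/* Which chunk owns each label (L_HALT is never looked up). */
static const chunk_fn chunk_of[] = { [L_LOOP] = chunk0, [L_FINISH] = chunk1 };

/* The trampoline: one indirect call per interchunk transfer.
   If transfers is non-NULL it receives the number of such calls. */
long run(long n, long *transfers) {
    struct state s = { n, 0 };
    enum label next = L_LOOP;
    long count = 0;
    while (next != L_HALT) {
        next = chunk_of[next](next, &s);
        count++;
    }
    if (transfers != NULL)
        *transfers = count;
    return s.acc;
}
```

In this sketch run(4, &t) returns 20 with t == 2: the whole loop runs
inside chunk0 on plain gotos, and the trampoline is only involved twice.
If real programs look like that, I agree the overhead is small; my worry
is about programs where the transfers are not so rare.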

>> I was wondering, however, how deeply the flow analysis goes: can it
>> always eliminate the need for a function pointer?
>
> That depends on what you mean by "function pointer".  The output of
> the closure conversion pass in MLton does not have any function
> pointers in the sense of "address of code".  However, it does have
> variant tags on environment records that allow it to distinguish
> between different functions and branch to the right code.  The variant
> tags are analogous to function pointers, and the case expressions that
> do the branching implement the jump.
>
> In the paper, figure 6, which I think is the example that made sense
> to you, still has "case" expressions on such sum types in the target
> program.  The tags "C5", "C6", "C7", and "C8" are variant tags
> analogous to function pointers.

Yes, I understood that it expanded each of the datatype's possible
alternatives into function calls. However, I didn't believe it would
take this to the extreme that Henry mentioned (all lambdas), since
that would create a huge number of call targets to choose among
(heavy branching, large code size). It seemed obvious to me, though
wrongly, that MLton would keep the function pointer in this case.
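
For the record, this is how I now picture the tag-based dispatch, as a
minimal C sketch. The tags ADD_K, MUL_K and NEGATE and the functions
apply and apply_twice are hypothetical names of mine, standing in for
tags like "C5" .. "C8" in the paper's figure; nothing here is MLton's
real representation.

```c
#include <assert.h>

/* One tag per lambda that can reach a given call site (hypothetical). */
enum fn_tag { ADD_K, MUL_K, NEGATE };

/* Environment record: a variant tag plus the lambda's free variables. */
struct closure {
    enum fn_tag tag;
    int k;          /* free variable of ADD_K and MUL_K; unused by NEGATE */
};

/* The "case expression": branch on the tag to reach the right code.
   No address of code is stored anywhere in the closure. */
int apply(const struct closure *f, int x) {
    switch (f->tag) {
    case ADD_K:  return x + f->k;
    case MUL_K:  return x * f->k;
    case NEGATE: return -x;
    }
    assert(0 && "unknown tag");
    return 0;
}

/* A higher-order function: it takes a closure record, not a pointer. */
int apply_twice(const struct closure *f, int x) {
    return apply(f, apply(f, x));
}
```

The switch only needs one arm per lambda that the flow analysis says can
reach that call site, which I assume is what keeps the branching and code
size manageable in practice.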

I'm still wondering about the MLton calling convention, in
particular after Matthew said alloca() would be difficult. In the C
codegen, I would have thought alloca() is easy. I am guessing the
x86 codegen doesn't keep a frame pointer; otherwise it should be easy
there too?

He did mention that the GC needs to walk the stack, but since you  
have local variables there already, I presume there is a way to mark  
some words as pointers.
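
What I have in mind is something like the following C sketch, where each
frame starts with a word pointing at a static layout record whose bitmask
says which slots hold pointers. This is purely my assumption about how
such marking could work; struct frame_layout, pointer_mask and
collect_roots are names I made up, not MLton's runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical static description of one kind of stack frame. */
struct frame_layout {
    size_t   nslots;        /* words in the frame, not counting the header */
    uint32_t pointer_mask;  /* bit i set => slot i holds a heap pointer */
};

/* Walk a stack laid out as [layout ptr][slot 0]...[slot n-1] per frame.
   Stores the address of each pointer slot in out (up to cap entries)
   and returns how many pointer slots were found in total. */
size_t collect_roots(uintptr_t *stack, size_t nwords,
                     uintptr_t **out, size_t cap) {
    size_t found = 0;
    size_t i = 0;
    while (i < nwords) {
        const struct frame_layout *fl =
            (const struct frame_layout *)stack[i];
        uintptr_t *slots = &stack[i + 1];
        assert(fl->nslots <= 32);           /* mask is only 32 bits wide */
        for (size_t s = 0; s < fl->nslots; s++) {
            if (fl->pointer_mask & (UINT32_C(1) << s)) {
                if (found < cap)
                    out[found] = &slots[s];
                found++;
            }
        }
        i += 1 + fl->nslots;                /* skip header and slots */
    }
    return found;
}
```

If the real scheme is anything like this, I can see one way alloca()
might get awkward: the walker relies on every frame having a size known
at compile time. But that is a guess on my part.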

Thanks for the answers! It really helps my understanding of what  
happens between my SML and assembly.