[MLton] Speedup from unrolling and tail-recursion

Fri, 21 Jan 2005 22:52:55 -0800

> I understand that automatic tail-recursion is hard to do, but I
> think a scheme compiler, Rabbit (?) was able to do it with only a
> few simple rules.

I'm not sure what you mean by automatic tail recursion.  Tail
recursion, as implemented in many compilers, including MLton, isn't
too hard.  You just refrain from pushing a frame on the stack when
making a tail call, which is determined syntactically.  If you mean
some transformation that can turn a non-tail-recursive function into a
tail-recursive one, then yes, that is hard.  And no, Rabbit didn't do
it.  Rabbit simply converted the program into CPS.

> Recursion/loop unrolling isn't very hard though, and MLton already
> has inlining, so why is this not done?

MLton's SSA IL should be able to express the optimization easily.  I
don't think anyone has looked closely at doing it, but I would bet
that the difficult part is coming up with heuristics to decide where
to apply it and how much to unroll, giving speedup without too much
code bloat.  In your tail-recursive example, you picked just the right
spot, unrolled eight times, and got a factor of two speedup.  That's a
pretty tough tradeoff for a compiler to make.  The fruit is simply not
as low hanging as with inlining.  But, we would of course be happy to
see someone try it.