[MLton-user] Why is recursion slower than C?

Wed Apr 25 08:44:40 PDT 2007

Intel hardware (probably AMD as well) has a return address stack as
part of the branch prediction logic.  This stack stores the most
recent n (for some n) addresses pushed by call instructions.  At a ret
instruction the top of this stack is used to predict where the return
will return to (and the stack is popped).  Not using these
instructions means not using this part of the branch prediction logic
and generally results in very poor branch prediction for returns, and
thus much stalling.  I suspect that this accounts for most of the
performance difference you are experiencing.  It is possible that
stack limit checks and overflow checks are part of it, but the code
looks pretty tight to me.

Neal