[MLton] callcc and space use

Wed Jan 23 07:21:10 PST 2008

On Wed, 23 Jan 2008, Vesa Karvonen wrote:
> My statement above about the way continuations are implemented is
> somewhat inaccurate.  Both threads and continuations are implemented
> on top of primitive threads.

Correct.

> Both callcc and throw copy the primitive
> thread (time proportional to stack size) and then perform a primitive
> thread switch (I assume constant time).

Correct.  Both copy current (primitive) thread and copy (primitive) thread 
take time proportional to the stack size of the thread.  A (primitive) 
thread switch is constant time.

> Thread switch does not
> require copying primitive threads, except in the special case of a
> newly created thread.

A primitive thread switch does not require copying primitive threads.
In the MLton.Thread structure, the non-primitive thread switch does copy 
a primitive thread when switching to a new non-primitive.  And, as noted 
by Florian and Vesa, that copy is of a small stack captured at the 
beginning of the program, so that operation can reasonably be considered 
constant time.

> OTOH, both callcc and throw always copy the stack.  So, if you perform
> suspend+resume using callcc+throw, you'll copy the stack twice.

That is true.  Because continuations are not one-shot, you need to make a 
copy for each throw.

>> However, the function to call is passed to the nascent thread in a
>> global variable (with suitable locking), and this results in unnecessary
>> contention in a parallel environment.
>
> True.  OTOH, note that callcc/throw also use the equivalent of a
> global variable.  The call to Thread.copyCurrent saves the copy in a
> field in GC state, which is then retrieved by calling Thread.savedPre.
> Both callcc and throw also perform locking (atomicBegin - atomicEnd).

Note that atomicBegin/atomicEnd is very lightweight locking -- they 
currently simply inc/dec a global variable.  They are only suitable to 
prevent switching between ML threads within a single OS thread of control.

> In summary, assuming my understanding of the implementation is
> correct, using callcc+throw to spawn a new thread and perform N
> suspends and resumes takes O((N+1)*S) time, where S is the average
> stack size, while the same with threads takes O(N+1) time.

That sounds about right.