[MLton-user] Tracking down allocations

Tue Jul 24 11:50:42 PDT 2007

On Fri, 20 Jul 2007, Joe Hurd wrote:
> Sorry for the delay in replying: I was at a conference at the
> beginning of the week.
>
>>  I'd still be interested in seeing the source program, to see if there is a
>>  clear reason why the tuple is being eliminated in one version but not the
>>  other.
>
> I've uploaded the two versions of the program to the TemporaryUpload
> page of the MLton website, together with the results of profiling
> allocations. There are two functions that produce different results
> when manually lambda lifted: they can be found in the source by
> searching for the string MLTON DEBUGGING.

I took a look at the nested-functions.sml program.  In addition to the 
allocation Joe noted in the nested-functions.ssa file:

     Enter MonteCarlo.placeStone.<branch> src/MonteCarlo.sml: 773
     tuple_322 = (x_5086, x_6058, x_24097)

there are a bunch more which get charged to the 
MonteCarlo.placeStone.<branch> source position:

     tuple_348 = (x_6058, x_25012, x_24112, x_6060)
     tuple_347 = (x_25012, x_6060, x_6058)
     tuple_346 = (x_6058, x_6060, tuple_347)
     tuple_345 = (tuple_346, x_6060, x_6058)
     tuple_344 = (x_5085, tuple_348)
     tuple_343 = (x_6058, x_5085, tuple_322, tuple_348)
     tuple_342 = (x_6058, x_5085, tuple_322, tuple_348, x_6060)
     tuple_340 = (tuple_347, x_5085, x_6058)
     tuple_339 = (tuple_347, tuple_322, x_5085, x_6058)
     tuple_338 = (x_6058, x_5085, tuple_322, tuple_347)
     tuple_337 = (x_6058, x_5085, tuple_346)
     tuple_336 = (x_6058, x_5085, tuple_322, tuple_346)
     tuple_335 = (tuple_345, x_5085, x_6058)
     tuple_334 = (tuple_343, tuple_342)
     tuple_333 = (tuple_342, tuple_342, tuple_343)
     tuple_332 = (tuple_338, tuple_339)
     tuple_330 = (tuple_340, tuple_337)
     tuple_329 = (tuple_336, tuple_339)
     tuple_327 = (tuple_335, tuple_337, tuple_340)

These are all of the environments for the nested functions.  I believe 
that you are being hurt by MLton's flat-closure representation.  In 
particular, all of those nested functions, which make various calls to one 
another, share a very common environment -- the one that you manually 
created by lambda lifting.  You can observe this in the above by noting 
that many of those tuples share common values (the x_* variables); indeed, 
expanding tuple_345, it looks like:

   tuple_345 = ((x_6058, x_6060, (x_25012, x_6060, x_6058)), x_6060, x_6058)

Note that this duplicates x_6058 and x_6060 numerous times.

In the lambda-lifted.sml case, the sharing seems to be exposed enough for 
MLton to eliminate it.  What's interesting is that you lambda-lifted in a 
curried way.  While MLton's closure conversion would introduce tuples for 
the environments of the implicit functions in the lambda-lifted ones, it 
seems that the packing and unpacking of the environments exposes things 
enough for MLton to eliminate it.

There is an old idea of Stephen's for an improved closure representation, 
whose e-mail thread was duplicated a few years ago:
   http://mlton.org/pipermail/mlton/2003-October/024570.html
I think it is trying to pick up exactly this case, where some sharing of 
closures would be a win.

There is also the Shao/Appel closure representation analysis:
   http://mlton.org/References#ShaoAppel94

In any case, I'm afraid that there isn't an immediate fix (though 
Stephen's improved closure representation shouldn't be hard to implement). 
I like the example, though, because I observe a 1.10X improvement in moves 
per second using the lambda-lifted version, suggesting that there is some 
real improvement to be had from an improved closure representation.